Testing DALL-E 2's Mathematics Comprehension
I've been playing around with OpenAI's DALL-E 2 lately; it's a lot of fun. The primary difference between DALL-E and a human artist is that the human's limitation is implementation, while DALL-E's limitation is understanding. In terms of artistic skill, DALL-E is far beyond any human. It can effortlessly create images in pretty much any style: photorealism, minimalism, surrealism, impressionism, and many more; a much broader gamut than any human artist. But what it fails at is something any human can do effortlessly: understand the description it's given.
I was playing around with it recently, and discovered that some of the things it seems to have the most trouble with are extremely simple concepts, like numbers and shapes. I'd like to see how fundamental these limitations are: is it having trouble understanding unclear descriptions, or is it struggling with the abstract concepts those words represent? Let's find out.
(For all of these examples, I used the first 6 images it gave me for each description I tried; I didn't cherry-pick any of these results. I did omit a few queries I found uninteresting, but none of them demonstrated DALL-E being particularly good at anything I was asking it to do. A few of the queries I didn't think of until later, so they have 4 images instead of 6.)
The first hurdle is getting it to understand I'm talking about math concepts at all. If I just ask it for a square, it thinks I mean the kind where people gather together.
What I'd like is to get it into the mindset of a geometry textbook (or Tron movie), with extremely simple bright-color representations of idealized mathematical objects. What if I just ask it for that?
This doesn't bode well. I don't think it knows what a square is.
Adding color to the description helped in my last experiments, so maybe that will help here?
Well, that is a hexagon. It insists on adding more details to the image, but whatever. Can we keep it up with harder shapes?
I am enjoying these fake polygon names though.
Would it do better with real world objects?
Aha! This seems to be working. Let's try the hexagon.
Aaaand, that's not a pentagon.
Can I specify a number of sides?
Ok, the word "polygon" might be too esoteric. I imagine there weren't a lot of instances of that in its training data.
This is not working. Does it even understand numbers?
That is 5 arms, but as the default number of starfish arms, I don't think DALL-E deserves much credit for that.
(Wait... does one of those have 6 arms?)
Can it parrot back numbers it will have seen in a consistent context?
Most of those are correct numerals, but not at all in the right order.
I'm actually surprised it's this bad. Surely its training images included many examples of numbers in ascending order? In images 1, 3, and 6 it did manage to put smaller numbers before bigger numbers, which I doubt happened by chance. So it might have some rudimentary understanding going on there. I tried a few more prompts that attempted to get it to put numerals in order and they all failed, up until this one:
It consistently puts 1 in front, followed by some more low digits, and then bigger digits later. I think there's something here!
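"Smaller numbers before bigger numbers" is a judgment call when I'm just eyeballing images, so here's a rough way to put a number on it. This is only a sketch: the function name `ascending_pair_fraction` is my own invention, and the example sequences are made up for illustration, not transcribed from the images. It scores a sequence by the fraction of pairs where the earlier number is smaller than the later one.

```python
from itertools import combinations


def ascending_pair_fraction(numbers):
    """Return the fraction of (earlier, later) pairs where earlier < later.

    1.0 means perfectly ascending, 0.0 means perfectly descending.
    A random shuffle of distinct numbers averages 0.5.
    """
    pairs = list(combinations(numbers, 2))  # keeps original left-to-right order
    if not pairs:
        return 1.0  # zero or one number is trivially "in order"
    in_order = sum(1 for earlier, later in pairs if earlier < later)
    return in_order / len(pairs)


# Hypothetical sequences, NOT read off the actual DALL-E images.
print(ascending_pair_fraction([1, 2, 3, 4, 5]))  # 1.0
print(ascending_pair_fraction([1, 3, 2, 7, 9]))  # 0.9
print(ascending_pair_fraction([5, 4, 3, 2, 1]))  # 0.0
```

A score well above 0.5 across several images would back up the hunch that the ordering isn't pure chance, though with only 6 images per prompt it's still weak evidence.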
What if I let it pick the numbers?
It did give us one result with a clear number. But uh... are those disembodied dog heads lying on the ground?
Yes. Yes they are. Moving on.
While significantly less gruesome, this is no more accurate.
It doesn't seem to know how to draw lists of numbers? That's a little surprising given its success with the shopping list.
Ok! Those do all appear to be equally full. But given its abysmal previous results, I'm not optimistic about this.
That ended about as quickly as it began.
This isn't really a math question, but out of curiosity:
DALL-E appears to be a realist.
Anyway, I'm about ready to give up on numbers greater than 2. How about some more visual math concepts?
Well, we've lost our math context. There's also no sphere in sight.
Oh come on. This should not be this hard.
Ok, I can't blame DALL-E too much for failing this one.
THAT'S JUST A LOOP.
This one's interesting. It seems to realize that a tetrahedron should have triangular faces and come to a point at the top, but doesn't grasp the entire shape.
Not exactly what I was going for, but it did understand that I was referring to a die. It uses the Arabic numeral for 4, but then the other faces are still pips.
Are some of those shapes even possible? This is fascinating. It knows what a photo of a die looks like in an artistic sense, but has no understanding of the physical object whatsoever.
This gives me an idea.
Not bad, but DALL-E is no M.C. Escher.
I'll give it partial credit for this; the one in the top left is correct.
In desperation, I asked GPT-3 for help. Surely if anyone could write a description that DALL-E can understand, it would be another AI, right?
Clearly our textbook illustrators are not going to be out of a job any time soon. Is this just because DALL-E wasn't trained on data that showed it relationships between numbers and shapes? (After all, GPT-3 learned to do rudimentary arithmetic.) I'm sure that a larger corpus of geometric images could help a future image model do a little better when asked to generate the same type of object as was in its data, but could it learn to generalize outside that distribution in any relevant way? I don't know.
As it currently stands, DALL-E is truly an amazing artist, but it's a pretty terrible intelligence.
I also tried testing GPT-3 on similar questions, which you can see here.