My dad was a Navy carrier pilot who flew a jet called the S-3 Viking. Google it, you’ll find tons of real images like this one:
But this is what GPT (and the other image generators as well) think it looks like:
This is a ridiculous amalgamation of features from several earlier aircraft. By comparison, the image generators are of course very successful at rendering that one we ALL know from the movies - the F-14 Tomcat:
Image generators trained by looking at millions of pictures will be much better at creating images of things for which there are more pictures. This means the new media we create with them will be more likely to feature the most commonly photographed things.
Put another way, we will make AI action movies with Tomcats, not Vikings. With Martin Luther King, but not Malcolm X. With Picasso paintings, but not Mondrian.
As pro-social mammals who value cooperation and therefore conformity, we are already strongly drawn to averages. The most attractive human faces, for example, are those that are closest to the average shape of all other faces. AI will make this worse, by learning the most representative samples best and regurgitating them to us most frequently. This is similar to the mistake we’ve made with YouTube and Instagram, where the most commonly viewed media is recommended most, creating a steeply peaked power-law distribution in which there are only a few big winners like Taylor Swift and Mr. Beast. They aren’t actually that much better performers - we artificially made them that way through recommendations. The ‘long tail’ of Vikings and Mondrians will become all the more inaccessible if we train on all our historical media.
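To make that rich-get-richer mechanism concrete, here is a minimal sketch - a toy Python simulation, not a model of any real platform, with every number invented for illustration - of a feed that keeps recommending whatever is already most viewed. A small recommendation shelf plus occasional random exploration is enough to concentrate nearly all views on a handful of creators:

```python
import random
from collections import Counter

# Toy "recommend what's already popular" feedback loop.
# All parameters are invented for illustration only.
NUM_CREATORS = 1000   # equally talented creators
NUM_VIEWS = 100_000   # viewing events to simulate
SHELF_SIZE = 10       # size of the "most viewed" recommendation shelf
P_FOLLOW = 0.9        # chance a viewer just clicks a recommendation

views = Counter()

for _ in range(NUM_VIEWS):
    if views and random.random() < P_FOLLOW:
        # The platform shows the currently most-viewed creators;
        # the viewer picks one of them.
        shelf = [creator for creator, _ in views.most_common(SHELF_SIZE)]
        choice = random.choice(shelf)
    else:
        # Occasional exploration: a uniformly random creator.
        choice = random.randrange(NUM_CREATORS)
    views[choice] += 1

top_views = sum(count for _, count in views.most_common(SHELF_SIZE))
print(f"Top {SHELF_SIZE} of {NUM_CREATORS} creators captured "
      f"{top_views / NUM_VIEWS:.0%} of all views")
```

Run it a few times and the same early leaders end up with roughly 90% of all views (the recommendation-following fraction), even though no creator in the simulation is intrinsically better than any other.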
The long-term solution is to create new kinds of AIs that do not train on a single large human-curated dataset, and instead grow up among us with separate experiences and interests. This will require new types of foundation models that do not depend on large training sets.
I’m not as sure what the short-term solutions are. I like how Midjourney just released personalization, which uses your personal ratings on images (rather than everyone else’s) to tune your outputs… this is a good example of actively working against homogenization. Or maybe an “I’m feeling lucky” button like the one on Google search. But we don’t use that one much, do we?
Excellent point! I think in one of your conversations, maybe the one with Avi Bar-Zeev, the idea is expressed that LLM companies should pay human creators, because if people stop making new things, AI output will get old, stale, and repetitive. Homogenized.
About two years ago I tried a test with Midjourney. I presumed it would be excellent at "Girl in a bikini sitting on a Ferrari," and it was. But I guessed that it would be poor at "Guatemalan refugee family detained at the US-Mexico border by United States Border Patrol agents." To my surprise, it generated a series of beautiful and thoughtful images. Even down to the grainy B&W Tri-X film I asked it to use.
Using AI to generate commercial images seems obvious. If it's faster and cheaper, it's likely inevitable. Using AI to generate photojournalistic or documentary images is ridiculous. Even if it is (surprisingly) good at it, it has no meaning (other than propaganda and disinformation). Nonetheless, I was impressed at how good Midjourney was at creating a "less obvious than a Ferrari" image.
What fascinated me most about your example images, more than the fact that the AI was better at a Tomcat than at your dad's plane, is that its rendering of your dad's plane was "Top Gun Heroic" vs. the original image you shared, which was more everyday. We can probably dial the "Hollywoodification" down, but AI may still take us deeper into unreal fantasy.
My vote is for an amended version of your previous innovation. This one would be called “The Luck Machine”.
😉