My dad was a Navy carrier pilot who flew a jet called the S-3 Viking. Google it, you’ll find tons of real images like this one:
But this is what GPT (and the other image generators as well) think it looks like:
This is a ridiculous amalgamation of features from several earlier aircraft. By comparison, the image generators are of course very successful at rendering that one we ALL know from the movies - the F-14 Tomcat:
Image generators trained by looking at millions of pictures will be much better at creating images of things for which there are more pictures. This means the new media we create with them will be more likely to feature commonly photographed things.
Put another way, we will make AI action movies with Tomcats, not Vikings. With Martin Luther King, but not Malcolm X. With Picasso paintings, but not Mondrian.
As pro-social mammals who value cooperation and therefore conformity, we are already strongly drawn to averages. The most attractive human faces, for example, are those that are closest to the average shape of all other faces. AI will make this worse by learning the most representative samples best and regurgitating them to us most often. This is similar to the mistake we’ve made with YouTube and Instagram, where the most commonly viewed media is recommended most, creating a steeply peaked power-law distribution in which there are only a few big winners like Taylor Swift and MrBeast. They aren’t actually that much better as performers - we artificially made them that way through recommendations. The ‘long tail’ of Vikings and Mondrians will become all the more inaccessible if we train on all our historical media.
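To make the recommendation feedback loop concrete, here is a minimal Python sketch. It is a toy rich-get-richer model under my own assumptions, not how YouTube or Instagram actually rank anything: every item starts with one view, and each new view goes either to a uniformly random item or to an item picked in proportion to the views it already has. The names `simulate_views` and `top_share` are made up for this illustration.

```python
import random


def simulate_views(num_items, num_views, recommend_by_popularity, seed=0):
    """Hand out views one at a time and return the final view count per item.

    Assumption: this is a toy model. With recommend_by_popularity=True, the
    chance an item gets the next view is proportional to its current views.
    """
    rng = random.Random(seed)
    views = [1] * num_items  # every item starts out equal, with one view
    for _ in range(num_views):
        if recommend_by_popularity:
            # Already-popular items are more likely to be shown again.
            chosen = rng.choices(range(num_items), weights=views, k=1)[0]
        else:
            # Baseline: every item is equally likely to be shown.
            chosen = rng.randrange(num_items)
        views[chosen] += 1
    return views


def top_share(views, top_n=5):
    """Fraction of all views captured by the top_n most-viewed items."""
    return sum(sorted(views, reverse=True)[:top_n]) / sum(views)


if __name__ == "__main__":
    uniform = simulate_views(100, 10_000, recommend_by_popularity=False, seed=1)
    popular = simulate_views(100, 10_000, recommend_by_popularity=True, seed=1)
    print(f"top-5 share, uniform picks:    {top_share(uniform):.1%}")
    print(f"top-5 share, popularity picks: {top_share(popular):.1%}")
```

All the items here are identical in "quality", so any winners that emerge in the popularity run are made by the recommendations alone. The same seed is used for both runs only to keep the comparison repeatable.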
The long-term solution is to create new kinds of AIs that do not train on these single large human-curated datasets, and instead grow up among us with separate experiences and interests. This will require new types of foundation models that do not need large training sets.
I’m not as sure what the short-term solutions are. I like how Midjourney just released personalization, which uses your personal ratings on images (rather than everyone else’s) to tune your outputs… this is a good example of actively working against homogenization. Or maybe an “I’m feeling lucky” button like the one on Google search. But we don’t use that one much, do we?
My vote is for an amended version of your previous innovation. This one would be called “The Luck Machine”.
😉
Maybe – somehow – LLMs / ML models need to "go to university", rather than educating themselves on content scraped from the web.
Also I think they need to learn how to say "I don't know" rather than always confidently presenting a result even when the score was low. I'm sure both these problems are being worked on!