Why AI-Generated Images Look Like the Stuff of Nightmares
In the summer of 2015, Google’s AI Blog introduced the world to one of the most famous and influential figures in the history of artistic AI: the puppy-slug.
In truth, the puppy-slug is not one image, nor is it even a single character. Rather, “puppy-slug” refers to the unsettling psychedelia of DeepDream, an image-generation tool developed by researchers at Google. The tool uses convolutional neural networks trained to recognize the features of entities present in a dataset. If researchers train a model on millions of pictures of dogs, for example, the network develops an ability to discern what a (picture of a) dog looks like.
Neural networks already had the capacity to do this before DeepDream, but what made this project so special was that it used this technology not only to analyze and classify image data but also to generate images of its own. These early examples from DeepDream were produced by training a convolutional network on images of dogs, then asking the system to increase the “dog-ness” of other images that did not contain dogs. The model analyzed an image to find the regions of pixels most similar to a dog, then modified those pixels to draw out the dog-like qualities it identified.
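The core loop can be sketched in a few lines. This is a toy illustration, not DeepDream itself: the real system backpropagates through a trained convolutional network to get the gradient of an activation with respect to the input pixels, whereas here a fixed random vector stands in for a learned “dog” detector, and the “image” is just a flat array. All names (`dog_feature`, `dogness`) are hypothetical.

```python
import numpy as np

# Toy stand-in for DeepDream's central trick: gradient *ascent* on the
# input image to amplify an activation. Because our "dog-ness" score is
# a plain dot product with a fixed feature vector, its gradient with
# respect to the image is simply that feature vector.

rng = np.random.default_rng(0)
dog_feature = rng.normal(size=64)   # pretend learned "dog" detector
image = rng.normal(size=64)         # pretend input image (flattened)

def dogness(img):
    """The activation we want the image to excite more strongly."""
    return float(dog_feature @ img)

step_size = 0.1
before = dogness(image)
for _ in range(50):
    grad = dog_feature              # d(dogness)/d(image) for this toy score
    image = image + step_size * grad  # ascend the gradient, not descend
after = dogness(image)
```

After the loop, `after` is strictly greater than `before`: the pixels have been nudged to look “more dog” to the detector, which is exactly why DeepDream hallucinates eyes and snouts into clouds and trees.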
As aesthetic artifacts, puppy-slugs and other images generated with DeepDream are repulsive. Even when they don’t trigger my trypophobia, they still give me the vertiginous sensation that happens when my mind tries to reconcile familiar features and forms in unnatural, physically impossible arrangements. Looking at these images, I feel like I’ve been poisoned by some forbidden mushroom or the secretions of some noxious toad. I’m a Lovecraft protagonist going mad from exposure to extra-dimensional beings. In other words, they’re really gross looking!
DeepDream provokes a question that is perhaps even more disturbing than the raw abjection of the images: is this really how AIs see the world?
When these pictures first made the rounds online, I remember many conversations with friends who were shocked and scandalized by that thought. Many people assumed that a computer’s imagination, if you could call it that, would be precise, literal, and maybe even a little bit boring. We were not expecting to see such vivid hallucinations and organic-seeming shapes.
But of course, the images from DeepDream were not really showcasing the machines’ imaginations, at least not in the sense that provoked such visceral fear among some members of the public. What DeepDream actually shows is more akin to a data visualization. DeepDream offers a way for us to peer into the “black box” that characterizes the process of training a convolutional network.
The reason some of these images look so frightening is the same reason there is nothing to actually fear: these models don’t “know” anything, at least not the way we use the word.
These images are the products of computationally sophisticated algorithms that track and compare pixel values. They can spot and reproduce patterns from their training data, but they aren’t equipped to make sense of what they’re given. If they were, they would know, for instance, that dogs usually have two eyes and only one face per head. If machines are capable of self-directed creative thought, they’re playing it pretty close to the vest.
You could be forgiven for thinking otherwise, especially given the impressive results that have been generated recently with OpenAI’s Dall-E. The images generated by this model are worlds beyond what DeepDream was capable of; from a technological perspective, the results really are incredible.
But when interpreting these Dall-E pieces as art, it’s helpful to keep the old Arthur C. Clarke adage in mind: “Any sufficiently advanced technology is indistinguishable from magic.” The magic of Dall-E involves a tremendous amount of mathematics, computer science, processing power, and countless hours of work from the researchers who produced it. The team at OpenAI did an incredible job, and we should applaud them for it.
Dall-E, and tools like it, work by matching words and phrases to vast stores of image data, which are then used to train generative models. Matching text input to the correct images requires that someone make decisions about how to sort and define those images. The people who make these decisions are the untold millions of low-wage data-entry workers around the world, content creators optimizing images for SEO, and everyone who has ever solved a CAPTCHA to access a website. Like the artisans who worked on the great cathedrals of the Middle Ages, these people could live and die without ever receiving credit for their work, even though the project would literally not exist without their contributions.
Images generated in this manner are less like paintings than they are like mirrors, reflecting our own views and values back to us, albeit through a very elaborate prism. For this reason, when we look at these pictures, we need to be wary of the limits and prejudices these models reflect.
Artist Mimi Onuoha stated the problem eloquently in her essay “On Algorithmic Violence”:
As we continue to see the rise of algorithms being used for civic, social, and cultural decision-making, it becomes that much more important that we name the reality that we are seeing. Not because it is exceptional, but because it is ubiquitous. Not because it creates new inequities, but because it has the power to cloak and amplify existing ones. Not because it is on the horizon, but because it is already here.
This is part of a four-part series on the visual trends and aesthetics of AI art. If you liked this one, check out the first article in the series, Visual Trends in AI Art.