Revision as of 16:37, 28 November 2022 by Student3MA279F22

AI Art
The AI art that won the Colorado State Fair

The Birth of AI Art and Quick Rundown
Before we explore how AI art came to fruition, let's define what AI art is. AI art refers to art that is generated with the help of artificial intelligence. The field of artificial intelligence focuses on training models, powered by algorithms, that try to emulate human intelligence. These models are trained by feeding large amounts of data into complex mathematical functions, with the end goal of the model “teaching itself” how to generate the desired output. For a model to generate an image of whatever topic we want, it must be trained on large amounts of data (images, etc.) paired with their corresponding topics, so that it learns what counts as “a desirable output”. AI art got its big start in 2014, when a researcher published a paper on generative adversarial networks (GANs) and their impact on AI. The researcher, Ian Goodfellow, coined the term GAN in that 2014 paper, theorizing that these machine learning models could be the next step in neural networks because they could be used to produce completely new images. Without getting too technical, GANs work in a two-step process. The first is the “generative” step, where one algorithm attempts to create something, for example an image of a flower. The second is the “adversarial” step, where a second algorithm, which learns to differentiate between two things (for example, images of flowers and images of non-flowers), gives feedback on whether the image looks like a flower. These steps repeat until the second algorithm is fooled, that is, until it can no longer tell the difference between images of flowers created by the first algorithm and real images of flowers. Once it was shown that computers could generate real-looking images, it was only a matter of time until the AI art now taking the world by storm would be created.
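The two-step loop described above can be sketched in a few lines of Python. This is only a toy, not the method from the 2014 paper: instead of images, the "real data" are single numbers drawn around a made-up mean, the generator has one parameter, and the discriminator is a one-variable logistic model. The function name `train_toy_gan` and all of the default values are assumptions chosen for illustration.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function, returns a value in [0, 1]."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(steps=2000, lr=0.01, real_mean=4.0, seed=0):
    """Alternate discriminator and generator updates on 1-D toy data.

    Generator:      G(z) = z + b            (z is random noise)
    Discriminator:  D(x) = sigmoid(w*x + c) (estimated probability x is real)
    """
    rng = random.Random(seed)
    b = 0.0          # generator parameter
    w, c = 0.1, 0.0  # discriminator parameters
    for _ in range(steps):
        x_real = rng.gauss(real_mean, 1.0)   # a "real" sample
        x_fake = rng.gauss(0.0, 1.0) + b     # a generated sample

        # Adversarial step: the discriminator climbs
        # log D(real) + log(1 - D(fake)).
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        w += lr * ((1.0 - d_real) * x_real - d_fake * x_fake)
        c += lr * ((1.0 - d_real) - d_fake)

        # Generative step: the generator climbs log D(fake),
        # using the discriminator's output as feedback.
        d_fake = sigmoid(w * x_fake + c)
        b += lr * (1.0 - d_fake) * w
    return b, w, c


if __name__ == "__main__":
    b, w, c = train_toy_gan()
    print("generator offset b:", b)
    print("discriminator weights w, c:", w, c)
```

Real GANs replace both one-line models with deep neural networks and work on images, and their training can be unstable, so this sketch should be read as an illustration of the feedback loop rather than a recipe.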


How is AI Art Generated?

  • General steps:
    • The text prompt is processed and turned into data the model can use, and its semantic elements are analyzed as well.
    • Random noise is introduced through a “diffusion process”. This noise is what makes the art generator produce something different every time, even for the same text.
    • The AI uses the new data to create the art. The art is often made using a GAN (generative adversarial network), a style of algorithm made up of two parts: one that creates something and another that judges it. The GAN forms a feedback loop: the creator makes a piece of art, the judge decides whether or not it matches the prompt, and if it doesn’t, the creator tries again based on the feedback. The process repeats until the judge decides that the art matches the prompt.
    • The model then outputs the k best images, where k is a number provided to the model and the quality of each image is determined by the judge.
  • Additional Information:
    • This process usually takes about 5 minutes on an average computer, so there are websites that do all the backend processing for you, which makes it a lot faster.
    • Models typically have at least 2 parts, the creator and the judge, but sometimes they have more. For example, a model might have an image encoder to turn images into numbers, a part that turns text into images (the creator), and a part that judges the images (the judge).
    • The process of transferring the art style of one piece to another piece is called NST (neural style transfer).
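The "create, judge, keep the k best" steps above can be sketched as follows. Everything here is a hypothetical stand-in: `generate_candidates` just produces lists of random numbers in place of images, and `judge` is a made-up scoring rule rather than a trained model. Only the overall shape (noise in, candidates scored, top k returned) reflects the steps in the list.

```python
import random


def generate_candidates(prompt, n, rng):
    """Hypothetical creator: returns n fake 'images' (lists of 4 numbers).

    The random generator plays the role of the noise that makes each
    run produce something different.
    """
    return [[rng.random() for _ in range(4)] for _ in range(n)]


def judge(prompt, image):
    """Hypothetical judge: higher scores mean a 'better match'.

    The target value is derived from the prompt length purely so that
    the score depends on the prompt; a real judge would be a trained model.
    """
    target = (len(prompt) % 10) / 10.0
    return -sum((pixel - target) ** 2 for pixel in image)


def best_k(prompt, n=8, k=3, seed=None):
    """Generate n candidates and return the k the judge scores highest."""
    rng = random.Random(seed)
    candidates = generate_candidates(prompt, n, rng)
    ranked = sorted(candidates, key=lambda img: judge(prompt, img), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    for image in best_k("a corgi in space", n=8, k=3):
        print(image)
```

Running the script twice without a seed gives different candidates each time, which mirrors the role of noise in the second step of the list.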


Specific AI Art Example
Corgi head.jpg
“a corgi’s head depicted as the explosion of a nebula”
Here is an example of a piece of AI art generated by DALL-E 2, a model created by OpenAI. The process used to create it is illustrated in the graphic below.
Whole model.jpg
Let’s take a closer look at exactly how DALL-E 2 goes about creating this image from the text caption. There are 3 main steps: encoding the text caption, translating this text encoding into an image encoding, and decoding the image encoding into an image. The first step, encoding the text caption, consists of turning the text into a vector that represents it. This vector points to a place in a latent space where points that are closer to each other represent similar text captions. For example, the vectors representing “dog” and “cat” would be closer to each other than the vectors representing “dog” and “building”. Next, the text encoding is turned into an image encoding, which is another vector that represents an image, again such that vectors that are closer to each other represent similar images. This step relies on CLIP, another model by OpenAI, which is trained to relate text encodings and image encodings so that we can determine how well a text encoding matches an image encoding. The model starts with an average image encoding and then updates it over time so that it gets closer and closer to the text encoding that we provide. Finally, this image encoding is decoded into an actual image. This is done with something called a diffusion model, specifically a model called GLIDE, which was also developed by OpenAI. In essence, a diffusion model takes an “average” image, one that looks like colorful static, and transforms it step by step into an image that we would recognize. An example of this is shown below.
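To make the idea of "closer" vectors concrete, here is a small sketch that compares encodings with cosine similarity, a common way to measure how closely two vectors point in the same direction. The three-dimensional vectors are made up for illustration; real encodings are learned by the model and have hundreds of dimensions. The helper `best_match` is likewise a hypothetical name.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between u and v (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def best_match(query, candidates):
    """Return the name of the candidate vector most similar to the query."""
    return max(candidates, key=lambda name: cosine_similarity(query, candidates[name]))


# Made-up encodings, chosen by hand so that "dog" and "cat" point in
# similar directions while "building" points elsewhere.
embeddings = {
    "dog": [0.9, 0.8, 0.1],
    "cat": [0.8, 0.9, 0.2],
    "building": [0.1, 0.2, 0.9],
}

if __name__ == "__main__":
    print("dog vs cat:     ", cosine_similarity(embeddings["dog"], embeddings["cat"]))
    print("dog vs building:", cosine_similarity(embeddings["dog"], embeddings["building"]))
```

The same comparison is what lets a model like CLIP decide how well a text encoding matches an image encoding: both live in a shared space, and a higher similarity means a better match.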
Diffusion model.png
This diffusion model is trained by slowly turning an image into noise and then learning to turn the noise back into the original image. We can control what image it creates by conditioning each step on the image encoding we created. Here “conditioning” is meant in the statistical sense, i.e., given this image encoding, what should the image look more like? As you can see, at each step the image gets slightly less noisy until in the end we are left with a coherent image. This step is also one of the reasons why DALL-E 2 can generate multiple different images from the same text prompt: the diffusion step is not deterministic, so running the process multiple times gives a different result each time. While DALL-E 2 might seem like a monolithic entity, it is actually composed of a number of smaller models that work together to create its result.
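The two directions of a diffusion model can be sketched on a tiny "image" of four pixel values. The forward direction, which mixes in a little Gaussian noise at each step, follows the usual textbook form. The reverse direction here is NOT a trained model: `toy_reverse_process` is a hypothetical stand-in that simply nudges each pixel toward the conditioning values and adds a little fresh noise, which is enough to show why two runs on the same input differ. All names and default values are assumptions.

```python
import math
import random


def add_noise(image, beta, rng):
    """One forward step: shrink the signal slightly and mix in Gaussian noise."""
    keep = math.sqrt(1.0 - beta)
    mix = math.sqrt(beta)
    return [keep * pixel + mix * rng.gauss(0.0, 1.0) for pixel in image]


def forward_process(image, steps=50, beta=0.1, seed=0):
    """Repeat the forward step until the image is close to pure static."""
    rng = random.Random(seed)
    x = list(image)
    for _ in range(steps):
        x = add_noise(x, beta, rng)
    return x


def toy_reverse_process(condition, steps=50, rate=0.2, noise_scale=0.05, seed=None):
    """Hypothetical stand-in for a trained, conditioned denoiser.

    Starts from static and, at every step, moves each pixel a fraction of
    the way toward the conditioning values while adding a little new noise.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in condition]  # start from pure noise
    for _ in range(steps):
        x = [
            pixel + rate * (target - pixel) + noise_scale * rng.gauss(0.0, 1.0)
            for pixel, target in zip(x, condition)
        ]
    return x


if __name__ == "__main__":
    condition = [0.0, 0.5, 1.0, 0.5]
    print("noised image:", forward_process(condition))
    print("run 1:", toy_reverse_process(condition))
    print("run 2:", toy_reverse_process(condition))
```

Both runs end up near the conditioning values but not at exactly the same place, which is the toy analogue of getting a different image every time from the same prompt.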


Effects of AI Art on Society

Perpetuates existing biases and stereotypes
Most algorithms that power AI art use data pulled from the internet, and the internet is filled with biases. As a result, the algorithms generate images that reflect certain biases depending on the prompt. Taking DALL-E 2 as an example, the prompt “restaurant” will generate images that depict Western settings and styles, and the prompt “nurse” will generate images of people who are female-passing. Generating these types of images further proliferates biased images on the internet, creating a feedback loop and homogenizing AI art as a whole.

Copyright issues with using artwork in AI training data
Since the models pull images from the internet, innumerable artworks are used without the artists’ permission. This can lead to images that are generated directly in an artist’s art style; all that is needed is to put the artist’s name in the prompt. Also, since a model can easily create mimetic art, a company that sees an image it wants to use can feed that image to an AI art model and quickly generate a similar one, potentially sidestepping copyright law. The solution to these legal issues is not yet clear.

Impact on the art industry
The growing prevalence of AI art may reduce the need for artists, since people can generate art rather than hire an artist. AI art can also undermine the creative process of traditional artists, since it generates art with the click of a button. However, just like the camera or Photoshop, AI-generated art can be used as a tool: artists can use AI art as a starting point for creativity rather than as a finished product. It may also lower the barrier to entry for artists, since it is much easier to have a working product from the beginning. The effects of AI art on the art industry will be determined by the users themselves, not the technology.
