OpenAI researchers have created a new system that can produce a full image, such as an astronaut riding a horse, from a simple plain-English sentence.
Known as DALL·E 2, the second generation of the text-to-image AI is able to create realistic images and artwork at a higher resolution than its predecessor.
The artificial intelligence research group won't be releasing the system to the public.
The new version is able to create images from simple text, add objects into existing images, or even provide different points of view on an existing image.
Developers imposed restrictions on the scope of the AI to ensure it could not produce hateful, racist or violent images, or be used to spread misinformation.
The original version, named after Spanish surrealist artist Salvador Dalí and Pixar robot WALL-E, was released in January 2021 as a limited test of ways AI could be used to represent concepts - from boring descriptions to flights of fancy.
Some of the early artwork created by the AI included a mannequin in a flannel shirt, an illustration of a radish walking a dog, and a baby penguin emoji.
Examples of phrases used in the second release - to produce realistic images - include 'an astronaut riding a horse in a photorealistic style'.
On the DALL-E 2 website, this can be customized to produce images 'on the fly', including replacing 'astronaut' with 'teddy bear', 'riding a horse' with 'playing basketball', and showing the result as a pencil drawing or as an Andy Warhol style 'pop-art' painting.
It can add or remove objects from an image - such as the flamingo seen in the first image, and gone in the second.
Enough to satisfy even the most difficult client with never-ending revision requests, the AI can pump out multiple versions of each image from a single sentence.
One feature of DALL-E 2 is 'inpainting', where it takes an existing picture and adds other features - such as a flamingo to a pool.
It is able to automatically fill in details, such as shadows, when an object is added, or even tweak the background to match, if an object is moved or removed.
'DALL·E 2 has learned the relationship between images and the text used to describe them,' OpenAI explained.
'It uses a process called “diffusion,” which starts with a pattern of random dots and gradually alters that pattern towards an image when it recognizes specific aspects of that image.'
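As a rough illustration of that idea, the toy Python sketch below starts with random values and nudges them towards a target pattern, one small step at a time. It is not OpenAI's method: the function names, the number of steps and the 'denoiser' that simply knows the target in advance are all assumptions made for illustration, whereas a real diffusion model relies on a trained neural network to decide each step.

```python
import random


def denoise_step(current, target, strength):
    """Move every value a fraction of the way from where it is towards the target."""
    return [c + strength * (t - c) for c, t in zip(current, target)]


def toy_diffusion(target, steps=50, strength=0.1, seed=0):
    """Start from random 'dots' and gradually alter them towards the target pattern.

    Assumption: the target is known up front. A real model would instead
    use a learned network to estimate how to change the noisy image.
    """
    rng = random.Random(seed)
    image = [rng.gauss(0.0, 1.0) for _ in target]  # pure noise to begin with
    history = [image]
    for _ in range(steps):
        image = denoise_step(image, target, strength)
        history.append(image)
    return history


def mean_abs_error(a, b):
    """Average absolute difference between two equally sized lists of values."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    pattern = [0.0, 1.0, 0.0, 1.0]  # stand-in for the pixels of a tiny image
    frames = toy_diffusion(pattern)
    print("error at start:", mean_abs_error(frames[0], pattern))
    print("error at end:  ", mean_abs_error(frames[-1], pattern))
```

Running it prints the error at the first and last step, showing the noise settling towards the pattern as the steps go by.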
DALL-E 2 is built on a computer vision system called CLIP, developed by OpenAI and announced last year.
'DALL-E 1 just took our GPT-3 approach from language and applied it to produce an image: we compressed images into a series of words and we just learned to predict what comes next,' OpenAI research scientist Prafulla Dhariwal told The Verge.
Unfortunately this process limited the realism of the images, as it didn't always capture the qualities humans found most necessary.
CLIP looks at an image and summarizes its contents in the way a human would; for DALL-E 2, OpenAI flipped this process around - an approach it calls unCLIP - starting from the description and working towards an image.
OpenAI trained the model on images from which it had weeded out some objectionable material, limiting its ability to produce offensive content.
Each image also includes a watermark, to show clearly that it was produced by AI rather than by a person, and that it is not an actual photo - reducing misinformation risk.
It also can't generate recognizable faces based on a name, even those only recognizable from artworks such as the Mona Lisa - instead creating distinctive variations.
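The matching of pictures to descriptions can be sketched in a few lines of Python. In this minimal example, an image and several captions are each represented by a short list of numbers (an 'embedding'), and the caption pointing in the most similar direction to the image wins. The numbers here are made up by hand and the function names are hypothetical; a system such as CLIP would compute its embeddings with trained image and text networks.

```python
import math


def cosine_similarity(a, b):
    """Score how closely two vectors point in the same direction (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_caption(image_embedding, caption_embeddings):
    """Return the caption whose embedding is most similar to the image embedding."""
    return max(
        caption_embeddings,
        key=lambda caption: cosine_similarity(image_embedding, caption_embeddings[caption]),
    )


# Hand-written, purely illustrative embeddings (assumption, not real model output).
IMAGE_EMBEDDING = [0.9, 0.1, 0.3]
CAPTION_EMBEDDINGS = {
    "an astronaut riding a horse": [0.8, 0.2, 0.4],
    "a flamingo in a pool": [0.1, 0.9, 0.2],
    "a radish walking a dog": [0.2, 0.3, 0.9],
}

if __name__ == "__main__":
    print(best_caption(IMAGE_EMBEDDING, CAPTION_EMBEDDINGS))
```

With these invented numbers the image vector sits closest to the astronaut caption, so that is the description the sketch selects.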
'We’ve limited the ability for DALL·E 2 to generate violent, hate, or adult images,' according to OpenAI researchers.
'By removing the most explicit content from the training data, we minimized DALL·E 2’s exposure to these concepts.
The AI has been restricted to avoid directly copying faces, even those in artwork such as Girl with a Pearl Earring by Dutch Golden Age painter Johannes Vermeer. Seen on the right is the AI version of the same painting, changed to not directly mimic the face.
'We also used advanced techniques to prevent photorealistic generations of real individuals’ faces, including those of public figures.'
While it won't be publicly available, some researchers will be granted access, and in future it could be embedded in other applications, which would be subject to strict content policies.
These policies do not allow users to generate violent, adult, or political content, among other categories.
'We won’t generate images if our filters identify text prompts and image uploads that may violate our policies. We also have automated and human monitoring systems to guard against misuse,' a spokesperson explained.
'We’ve been working with external experts and are previewing DALL·E 2 to a limited number of trusted users who will help us learn about the technology’s capabilities and limitations.
'We plan to invite more people to preview this research over time as we learn and iteratively improve our safety system.'
HOW ARTIFICIAL INTELLIGENCES LEARN USING NEURAL NETWORKS
AI systems rely on artificial neural networks (ANNs), which try to simulate the way the brain works in order to learn.
ANNs can be trained to recognise patterns in information - including speech, text data, or visual images - and are the basis for a large number of the developments in AI over recent years.
Conventional AI uses input to 'teach' an algorithm about a particular subject by feeding it massive amounts of information.
Practical applications include Google's language translation services, Facebook's facial recognition software and Snapchat's image altering live filters.
The process of inputting this data can be extremely time-consuming, and is limited to one type of knowledge.
A new breed of ANNs called Adversarial Neural Networks pits two AI systems against each other, which allows each to learn from the other.
This approach is designed to speed up the process of learning, as well as to refine the output created by AI systems.