Imagen 2 – Google DeepMind


Technology
Our most advanced text-to-image technology
Imagen 2 is our most advanced text-to-image diffusion technology, delivering high-quality, photorealistic outputs that are closely aligned and consistent with the user's prompt. It can generate more lifelike images by using the natural distribution of its training data, instead of adopting a pre-programmed style.
Imagen 2's powerful text-to-image technology is available to developers and Cloud customers via the Imagen API in Google Cloud Vertex AI.
The Google Arts and Culture team is also deploying our Imagen 2 technology in their Cultural Icons experiment, allowing users to explore, learn and test their cultural knowledge with the help of Google AI.

Prompt: A shot of a 32-year-old female, up and coming conservationist in a jungle; athletic with short, curly hair and a warm smile

Prompt: A jellyfish on a dark blue background

Prompt: Small canvas oil painting of an orange on a cutting board. Light is passing through orange segments, casting an orange light across part of the cutting board. There is a blue and white cloth in the background. Caustics, bounce light, expressive brush strokes
Improved image-caption understanding
Text-to-image models learn to generate images that match a user's prompt from details in their training datasets' images and captions. But the quality of detail and accuracy in these pairings can vary widely for each image and caption.
To help create higher-quality and more accurate images that better align with a user's prompt, additional description was added to image captions in Imagen 2's training dataset, helping Imagen 2 learn different captioning styles and generalize to better understand a broad range of user prompts.
These enhanced image-caption pairings help Imagen 2 better understand the relationship between images and words, increasing its understanding of context and nuance.
Here are examples of Imagen 2's prompt understanding:
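The caption-enrichment idea above can be pictured with a toy sketch. The `enrich_caption` helper and the detail strings are purely illustrative stand-ins; Imagen 2's actual captioning pipeline is not public:

```python
# Hypothetical sketch of caption enrichment: a sparse alt-text caption is
# expanded with extra descriptive detail before being paired with its image
# for training. In the real pipeline the details would come from a
# captioning model, not a hand-written list.

def enrich_caption(base_caption: str, details: list[str]) -> str:
    """Append descriptive details to a sparse base caption."""
    if not details:
        return base_caption
    return base_caption + ", " + ", ".join(details)

# A training pair moves from a terse caption to a richer one:
pair = {
    "image": "flower_001.jpg",  # illustrative filename
    "caption": enrich_caption(
        "a flower",
        ["close-up", "soft morning light", "shallow depth of field"],
    ),
}
print(pair["caption"])
# a flower, close-up, soft morning light, shallow depth of field
```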

Prompt: “Soft purl the streams, the birds renew their notes, And through the air their mingled music floats.” (A Hymn to the Evening by Phillis Wheatley)

Prompt: “Consider the subtleness of the sea; how its most dreaded creatures glide under water, unapparent for the most part, and treacherously hidden beneath the loveliest tints of azure.” (Moby-Dick by Herman Melville)

Prompt: “The robin flew from his swinging spray of ivy on to the top of the wall and he opened his beak and sang a loud, lovely trill, merely to show off. Nothing in the world is quite as adorably lovely as a robin when he shows off – and they are nearly always doing it.” (The Secret Garden by Frances Hodgson Burnett)
More realistic image generation
Imagen 2's dataset and model advances have delivered improvements in many of the areas that text-to-image tools often struggle with, including rendering realistic hands and human faces, and keeping images free of distracting visual artifacts.
Examples of Imagen 2 generating realistic hands and human faces.
We trained a specialized image aesthetics model based on human preferences for qualities like good lighting, framing, exposure, sharpness, and more. Each image was given an aesthetics score, which helped condition Imagen 2 to give more weight to images in its training dataset that align with qualities humans prefer. This technique improves Imagen 2's ability to generate higher-quality images.
AI-generated images using the prompt “Flower”, with lower aesthetics scores (left) to higher scores (right).
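One simple way to picture score-based conditioning is to up-weight higher-scoring images when sampling training batches. The sketch below is a hypothetical illustration, not Imagen 2's training code; the dataset, scores, and `sample_batch` helper are all invented for the example:

```python
# Illustrative sketch: each training image's chance of appearing in a batch
# is proportional to its aesthetics score, so well-lit, sharp images are
# seen more often during training than low-quality ones.
import random

def sample_batch(dataset, batch_size, seed=None):
    """Sample image names with probability proportional to aesthetics score."""
    rng = random.Random(seed)
    images = [d["image"] for d in dataset]
    scores = [d["score"] for d in dataset]  # higher = more preferred by raters
    return rng.choices(images, weights=scores, k=batch_size)

dataset = [
    {"image": "blurry_flower.jpg",  "score": 0.2},
    {"image": "washed_out.jpg",     "score": 0.4},
    {"image": "well_lit_sharp.jpg", "score": 0.9},
]
batch = sample_batch(dataset, batch_size=300, seed=0)
# On average, "well_lit_sharp.jpg" dominates the sampled batch.
```

In a real diffusion setup the score could instead enter the model directly as a conditioning signal, but weighted sampling conveys the same intuition: training attention shifts toward images with qualities humans prefer.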
Fluid style conditioning
Imagen 2's diffusion-based techniques provide a high degree of flexibility, making it easier to control and adjust the style of an image. By providing reference style images alongside a text prompt, we can condition Imagen 2 to generate new imagery that follows the same style.
A visualization of how Imagen 2 makes it easier to control the output style by using reference images alongside a text prompt.
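At a very high level, style conditioning means the generator receives a style signal derived from the reference images alongside the prompt. The toy sketch below averages made-up per-image style embeddings into one conditioning vector; the two-dimensional embeddings and the `style_vector` helper are illustrative only, not how Imagen 2 actually represents style:

```python
# Toy illustration: embeddings for reference style images are pooled into a
# single style vector that accompanies the text prompt as conditioning.

def style_vector(reference_embeddings):
    """Average per-image style embeddings into one conditioning vector."""
    n = len(reference_embeddings)
    return [sum(dims) / n for dims in zip(*reference_embeddings)]

refs = [[0.2, 0.8], [0.4, 0.6]]   # two reference images, 2-D toy embeddings
conditioning = {
    "prompt": "a lighthouse at dusk",   # illustrative prompt
    "style": style_vector(refs),        # shared style signal
}
```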
Advanced inpainting and outpainting
Imagen 2 also enables image editing capabilities like ‘inpainting’ and ‘outpainting’. By providing a reference image and an image mask, users can generate new content directly into the original image with a technique called inpainting, or extend the original image beyond its borders with outpainting. This technology is planned for Google Cloud's Vertex AI in the new year.
Imagen 2 can generate new content directly into the original image with inpainting.
Imagen 2 can extend the original image beyond its borders with outpainting.
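The mask mechanics behind inpainting can be sketched in a few lines: wherever the binary mask is set, pixels come from newly generated content, and everywhere else the original image is kept untouched. The grayscale 2-D lists below are a minimal stand-in for real images, and `composite` is an illustrative helper, not part of any Imagen API:

```python
# Minimal sketch of mask-based inpainting compositing. A 1 in the mask
# selects the generated pixel; a 0 keeps the original pixel.

def composite(original, generated, mask):
    """Blend generated content into the original image under a binary mask."""
    return [
        [g if m else o for o, g, m in zip(orow, grow, mrow)]
        for orow, grow, mrow in zip(original, generated, mask)
    ]

original  = [[10, 10, 10],
             [10, 10, 10]]
generated = [[99, 99, 99],
             [99, 99, 99]]
mask      = [[0, 1, 0],     # only the centre column is regenerated
             [0, 1, 0]]

print(composite(original, generated, mask))
# [[10, 99, 10], [10, 99, 10]]
```

Outpainting follows the same idea with the mask covering a padded border region outside the original image, so the model fills in content beyond the existing edges.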
Responsible by design
To help mitigate the potential risks and challenges of our text-to-image generative technology, we set robust guardrails in place, from design and development to deployment in our products.
Imagen 2 is integrated with SynthID, our cutting-edge toolkit for watermarking and identifying AI-generated content, enabling allowlisted Google Cloud customers to add an imperceptible digital watermark directly into the pixels of the image, without compromising image quality. This allows the watermark to remain detectable by SynthID, even after applying modifications like filters, cropping, or saving with lossy compression schemes.
Before we release capabilities to users, we conduct robust safety testing to minimize the risk of harm. From the outset, we invested in training data safety for Imagen 2, and added technical guardrails to limit problematic outputs like violent, offensive, or sexually explicit content. We apply safety checks to training data, input prompts, and system-generated outputs at generation time. For example, we're applying comprehensive safety filters to avoid generating potentially problematic content, such as images of named individuals. As we expand the capabilities and launches of Imagen 2, we're also continuously evaluating them for safety.
Acknowledgements
This work was made possible by key research and engineering contributions from:
Aäron van den Oord, Ali Razavi, Benigno Uria, Çağlar Ünlü, Charlie Nash, Chris Wolff, Conor Durkan, David Ding, Dawid Górny, Evgeny Gladchenko, Felix Riedel, Hang Qi, Jacob Kelly, Jakob Bauer, Jeff Donahue, Junlin Zhang, Mateusz Malinowski, Mikołaj Bińkowski, Pauline Luc, Robert Riachi, Robin Strudel, Sander Dieleman, Tobenna Peter Igwe, Yaroslav Ganin, Zach Eaton-Rosen.
Thanks to: Ben Bariach, Dawn Bloxwich, Ed Hirst, Elspeth White, Gemma Jennings, Jenny Brennan, Komal Singh, Luis C. Cobo, Miaosen Wang, Nick Pezzotti, Nicole Brichtova, Nidhi Vyas, Nina Anderson, Norman Casagrande, Sasha Brown, Sven Gowal, Tulsee Doshi, Will Hawkins, Yelin Kim, Zahra Ahmed for driving delivery; Douglas Eck, Nando de Freitas, Oriol Vinyals, Eli Collins, Demis Hassabis for their advice.
Thanks also to many others who contributed across Google DeepMind, including our partners at Google.