Imagen 2 – Google DeepMind

2023-12-13 09:07:05

A collage of images generated by Imagen


Our most advanced text-to-image technology

Imagen 2 is our most advanced text-to-image diffusion technology, delivering high-quality, photorealistic outputs that are closely aligned and consistent with the user’s prompt. It can generate more lifelike images by using the natural distribution of its training data, instead of adopting a pre-programmed style.

Imagen 2’s powerful text-to-image technology is available for developers and Cloud customers via the Imagen API in Google Cloud Vertex AI.

The Google Arts and Culture team is also deploying our Imagen 2 technology in their Cultural Icons experiment, allowing users to explore, learn and test their cultural knowledge with the help of Google AI.

Improved image-caption understanding

Text-to-image models learn to generate images that match a user’s prompt from details in their training datasets’ images and captions. But the quality of detail and accuracy in these pairings can vary widely for each image and caption.

To help create higher-quality and more accurate images that better align to a user’s prompt, further description was added to image captions in Imagen 2’s training dataset, helping Imagen 2 learn different captioning styles and generalize to better understand a broad range of user prompts.

These enhanced image-caption pairings help Imagen 2 better understand the relationship between images and words, increasing its understanding of context and nuance.

Here are examples of Imagen 2’s prompt understanding:

More realistic image generation

Imagen 2’s dataset and model advances have delivered improvements in many of the areas that text-to-image tools often struggle with, including rendering realistic hands and human faces and keeping images free of distracting visual artifacts.

Examples of Imagen 2 generating realistic hands and human faces.

We trained a specialized image aesthetics model based on human preferences for qualities like good lighting, framing, exposure, sharpness, and more. Each image was given an aesthetics score, which helped condition Imagen 2 to give more weight to images in its training dataset that align with qualities humans prefer. This technique improves Imagen 2’s ability to generate higher-quality images.

AI-generated images using the prompt “Flower”, with lower aesthetics scores (left) to higher scores (right).
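
One way to picture "giving more weight" to higher-scoring images is score-weighted sampling of training examples. The sketch below is an illustration only: the post does not describe how Imagen 2 actually uses its aesthetics scores, so the softmax weighting, the `temperature` parameter, the function names, and the example scores are all assumptions made for this example.

```python
import math
import random


def aesthetic_sampling_weights(scores, temperature=1.0):
    """Turn per-image aesthetics scores into sampling probabilities.

    Hypothetical helper: a softmax over the scores, so higher-scoring
    images receive a larger share of the probability mass.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_training_batch(image_ids, scores, batch_size, seed=None):
    """Draw a batch (with replacement) that favours higher-scoring images."""
    rng = random.Random(seed)
    weights = aesthetic_sampling_weights(scores)
    return rng.choices(image_ids, weights=weights, k=batch_size)


if __name__ == "__main__":
    ids = ["img_a", "img_b", "img_c"]
    scores = [2.0, 5.0, 8.0]  # made-up aesthetics scores for illustration
    print(aesthetic_sampling_weights(scores))
    print(sample_training_batch(ids, scores, batch_size=5, seed=0))
```

A larger `temperature` flattens the weights toward uniform sampling, while a smaller one concentrates sampling on the highest-scoring images.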

Fluid style conditioning

Imagen 2’s diffusion-based techniques provide a high degree of flexibility, making it easier to control and adjust the style of an image. By providing reference style images in combination with a text prompt, we can condition Imagen 2 to generate new imagery that follows the same style.

A visualization of how Imagen 2 makes it easier to control the output style by using reference images alongside a text prompt.

Advanced inpainting and outpainting

Imagen 2 also enables image editing capabilities like ‘inpainting’ and ‘outpainting’. By providing a reference image and an image mask, users can generate new content directly into the original image with a technique called inpainting, or extend the original image beyond its borders with outpainting. This technology is planned for Google Cloud’s Vertex AI in the new year.

Imagen 2 can generate new content directly into the original image with inpainting.

Imagen 2 can extend the original image beyond its borders with outpainting.
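
The role of the image mask in both operations can be sketched with plain Python lists. This is not Imagen 2's implementation or the Vertex AI API; it only shows the general masking idea, with the `generated` pixels standing in for whatever a model would produce. The function names `composite_inpaint` and `outpaint_canvas` are hypothetical, and images are assumed to be single-channel grids of numbers.

```python
def composite_inpaint(original, generated, mask):
    """Blend per pixel: mask 1 takes generated content, mask 0 keeps the original."""
    if not (len(original) == len(generated) == len(mask)):
        raise ValueError("original, generated and mask must have the same height")
    out = []
    for o_row, g_row, m_row in zip(original, generated, mask):
        if not (len(o_row) == len(g_row) == len(m_row)):
            raise ValueError("original, generated and mask must have the same width")
        out.append([m * g + (1 - m) * o for o, g, m in zip(o_row, g_row, m_row)])
    return out


def outpaint_canvas(original, pad, fill=0):
    """Place the image on a larger canvas for outpainting.

    Returns (canvas, mask): the mask is 1 on the new border, which is the
    region a model would be asked to fill, and 0 over the original pixels.
    """
    if not original or not original[0]:
        raise ValueError("original must be a non-empty 2D grid")
    if pad < 0:
        raise ValueError("pad must be non-negative")
    width = len(original[0])
    new_width = width + 2 * pad
    canvas = [[fill] * new_width for _ in range(pad)]
    mask = [[1] * new_width for _ in range(pad)]
    for row in original:
        canvas.append([fill] * pad + list(row) + [fill] * pad)
        mask.append([1] * pad + [0] * width + [1] * pad)
    canvas += [[fill] * new_width for _ in range(pad)]
    mask += [[1] * new_width for _ in range(pad)]
    return canvas, mask


if __name__ == "__main__":
    image = [[1, 2], [3, 4]]
    # Inpainting: replace only the top-right pixel.
    print(composite_inpaint(image, [[9, 9], [9, 9]], [[0, 1], [0, 0]]))
    # Outpainting: grow the canvas by one pixel on every side, then fill the border.
    canvas, border_mask = outpaint_canvas(image, pad=1)
    placeholder = [[7] * 4 for _ in range(4)]  # stand-in for model output
    print(composite_inpaint(canvas, placeholder, border_mask))
```

Seen this way, outpainting is inpainting on an enlarged canvas whose mask covers only the newly added border.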

Responsible by design

To help mitigate the potential risks and challenges of our text-to-image generative technology, we set robust guardrails in place, from design and development to deployment in our products.

Imagen 2 is integrated with SynthID, our cutting-edge toolkit for watermarking and identifying AI-generated content, enabling allowlisted Google Cloud customers to add an imperceptible digital watermark directly into the pixels of the image, without compromising image quality. This allows the watermark to remain detectable by SynthID, even after applying modifications like filters, cropping, or saving with lossy compression schemes.

Before we release capabilities to users, we conduct robust safety testing to minimize the risk of harm. From the outset, we invested in training data safety for Imagen 2, and added technical guardrails to limit problematic outputs like violent, offensive, or sexually explicit content. We apply safety checks to training data, input prompts, and system-generated outputs at generation time. For example, we’re applying comprehensive safety filters to avoid generating potentially problematic content, such as images of named individuals. As we expand the capabilities and launches of Imagen 2, we are also continuously evaluating them for safety.


This work was made possible by key research and engineering contributions from:

Aäron van den Oord, Ali Razavi, Benigno Uria, Çağlar Ünlü, Charlie Nash, Chris Wolff, Conor Durkan, David Ding, Dawid Górny, Evgeny Gladchenko, Felix Riedel, Hang Qi, Jacob Kelly, Jakob Bauer, Jeff Donahue, Junlin Zhang, Mateusz Malinowski, Mikołaj Bińkowski, Pauline Luc, Robert Riachi, Robin Strudel, Sander Dieleman, Tobenna Peter Igwe, Yaroslav Ganin, Zach Eaton-Rosen.

Thanks to: Ben Bariach, Dawn Bloxwich, Ed Hirst, Elspeth White, Gemma Jennings, Jenny Brennan, Komal Singh, Luis C. Cobo, Miaosen Wang, Nick Pezzotti, Nicole Brichtova, Nidhi Vyas, Nina Anderson, Norman Casagrande, Sasha Brown, Sven Gowal, Tulsee Doshi, Will Hawkins, Yelin Kim, Zahra Ahmed for driving delivery; Douglas Eck, Nando de Freitas, Oriol Vinyals, Eli Collins, Demis Hassabis for their advice.

Thanks also to many others who contributed across Google DeepMind, including our partners in Google.
