The Story of AI Graphics at JetBrains
At JetBrains, we’re constantly refining our approach to creating pieces of artwork to be used as website elements and release graphics. Our mission is to free graphic designers from routine tasks so they can concentrate on their core competence – creativity. The history of internal tools for generating art at JetBrains begins about a decade ago. At first, we mostly used WebGL-based tools, which generated everything randomly in the browser on the fly (the interactive archive is available here). The images below were created with this approach.
Splash screens that were created using WebGL.
In 2020, we introduced our first tool based on deep neural networks. Since then, everything has been generated in a K8s GPU cluster using PyCharm and Datalore for local and remote development. The browser is used only for input and output. With this neural-network-based approach, we’ve achieved a much higher degree of personalization, allowing us to cater to our designers’ needs, and we’re constantly working to improve it.
These pictures were made with a compositional pattern-producing network (CPPN, top) and Stable Diffusion (SD, bottom). This post will cover the technical details of both approaches, as well as the way we combine them to create even more impressive designs.
Splash screens that were generated with neural networks.
CPPNs are among the simplest generative networks. They simply map pixel coordinates (x, y) to image colors (r, g, b). CPPNs are usually trained on specific images or sets of images. However, we found that randomly initialized CPPNs produce beautiful abstract patterns when the initialization is done correctly.
CPPN architecture: pixel coordinates are inputs, RGB values are outputs.
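To make this concrete, here is a minimal sketch of a randomly initialized CPPN in Python with NumPy. The layer sizes, depth, and plain Gaussian initialization are illustrative assumptions, not our production configuration:

```python
import numpy as np

def render_cppn(width=512, height=512, hidden=32, depth=4, seed=0):
    """Render one image from a randomly initialized CPPN: (x, y) -> (r, g, b)."""
    rng = np.random.default_rng(seed)

    # Pixel coordinates normalized to [-1, 1] form the network input.
    ys, xs = np.mgrid[0:height, 0:width]
    x = xs / (width - 1) * 2 - 1
    y = ys / (height - 1) * 2 - 1
    features = np.stack([x, y], axis=-1).reshape(-1, 2)

    # Careful initialization (unit-variance Gaussian weights here) is what
    # makes the untrained network produce smooth abstract patterns.
    for _ in range(depth):
        w = rng.normal(0, 1, size=(features.shape[1], hidden))
        features = np.tanh(features @ w)

    w_out = rng.normal(0, 1, size=(hidden, 3))
    rgb = (np.tanh(features @ w_out) + 1) / 2   # map activations to [0, 1]
    return (rgb.reshape(height, width, 3) * 255).astype(np.uint8)

image = render_cppn()   # H x W x 3 uint8 array, ready to save or display
```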
Using the usage data from an early internal version of the generator, we refined our algorithms to improve the visual quality. Aside from that, we also slightly extended the classical architecture of CPPNs by introducing several virtual parameters. Hence, our CPPNs now map (x, y, a, b, c, f) to (r, g, b). This simple change gives us an easy-to-use, though somewhat unpredictable, way of altering the image, as shown below.
By updating the virtual parameter (a), we slightly alter the image.
These virtual parameters don’t have to be constant. For example, we can map the value of the virtual parameter f of each pixel to the distance from that pixel to the center of the image. This trick allows us to make sure the image has circular shapes. Or we could map f to the sum of the absolute values of the pixel’s coordinates, which will yield diamond-shaped patterns. This is where math truly meets art!
Different functions f(x, y) result in different image patterns.
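Building on the earlier sketch, the extended inputs could be assembled like this; the two f variants mirror the radial and diamond examples from the text, and the exact parametrization of our generator is not shown:

```python
import numpy as np

def cppn_inputs(width, height, a, b, c, f_mode="radial"):
    """Build the (x, y, a, b, c, f) input features for every pixel."""
    ys, xs = np.mgrid[0:height, 0:width]
    x = xs / (width - 1) * 2 - 1
    y = ys / (height - 1) * 2 - 1

    if f_mode == "radial":        # distance to the center -> circular shapes
        f = np.sqrt(x ** 2 + y ** 2)
    elif f_mode == "diamond":     # sum of absolute coordinates -> diamond shapes
        f = np.abs(x) + np.abs(y)
    else:                         # any numeric value -> constant virtual parameter
        f = np.full_like(x, float(f_mode))

    # a, b, c are constant "knobs"; f varies per pixel.
    feats = [x, y,
             np.full_like(x, a), np.full_like(x, b), np.full_like(x, c),
             f]
    return np.stack(feats, axis=-1).reshape(-1, 6)
```

These six-column features would replace the two-column input of the previous sketch, with the first weight matrix sized accordingly.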
To make sure that our randomly initialized CPPNs always produce beautiful designs, we trained a recommendation system to predict whether a given set of parameters will result in an image that looks good. We trained our algorithm on user feedback collected during internal testing. The figure below shows two examples of images created by randomly initialized CPPNs and their corresponding “beautifulness” scores.
Predicting “beautifulness” scores of CPPN images.
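We haven’t published the internals of this recommendation model; conceptually, though, it is a binary classifier over the generator’s parameters, trained on thumbs-up/thumbs-down feedback. A toy sketch under those assumptions (the synthetic data and the choice of logistic regression are purely illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in data: in practice, each row of X would be a flattened CPPN parameter
# vector and y would be thumbs-up/thumbs-down feedback from internal testers.
X = rng.normal(size=(500, 128))
y = (X[:, 0] + 0.1 * rng.normal(size=500) > 0).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)

# "Beautifulness" score of a freshly sampled parameter set.
candidate = rng.normal(size=(1, 128))
score = model.predict_proba(candidate)[0, 1]
print(f"predicted beautifulness: {score:.2f}")
```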
Our CPPN-generated artworks truly come to life when they are transformed into video graphics. By mapping the virtual parameters (a, b, c) over any closed parametric curve (one that starts and ends at the same point), we can create seamlessly looped animations of any desired length!
Sample frames of a CPPN animation video.
The choice of a curve function is crucial. Animating the virtual parameters over a plain circle is the most straightforward approach. However, it has a drawback: when the sign of a parameter changes (for example, from 0.01 to -0.01) while it has a low first derivative value (one that equals zero in the case of a circle trajectory), the result is usually a shaky animation. To account for this issue, we use Bernoulli’s lemniscate to ensure that the signs of the virtual parameters never change (see the image below). This solves the shaky animation problem but introduces a new one: for most animation frames, only one of the parameters is incrementally updated, making the animation look too shallow. We addressed this by switching to a random spline function. The more complex the trajectories we used, the richer the animation looked!
Examples of CPPN curve functions.
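A sketch of the two trajectory choices discussed above – the lemniscate parametrization and a random periodic spline through a handful of control points. The scales, knot counts, and frame count are arbitrary, and the exact offsets we use are not shown:

```python
import numpy as np
from scipy.interpolate import CubicSpline

def lemniscate(t, scale=1.0):
    """Bernoulli's lemniscate; t in [0, 2*pi) traces the full closed loop."""
    denom = 1.0 + np.sin(t) ** 2
    return scale * np.cos(t) / denom, scale * np.sin(t) * np.cos(t) / denom

def random_closed_spline(n_knots=8, n_dims=3, seed=0):
    """A smooth random closed curve for (a, b, c): a periodic cubic spline."""
    rng = np.random.default_rng(seed)
    knots = rng.normal(size=(n_knots, n_dims))
    knots = np.vstack([knots, knots[:1]])        # close the loop: last knot == first knot
    t_knots = np.linspace(0.0, 1.0, n_knots + 1)
    return CubicSpline(t_knots, knots, bc_type="periodic")

# Option 1: trace the lemniscate for two of the parameters.
theta = np.linspace(0.0, 2.0 * np.pi, 600, endpoint=False)
a_vals, b_vals = lemniscate(theta)

# Option 2: a random closed spline for all three parameters at once.
spline = random_closed_spline()
abc_per_frame = spline(np.linspace(0.0, 1.0, 600, endpoint=False))  # shape: (600, 3)
```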
There’s one more important detail: color correction. Our CPPNs – and therefore the resulting images – are randomly generated, but we need to make sure that each one uses our brand colors. We tried several different approaches to achieve this. The first iteration (used in the 2020 releases) relied on SVG recoloring directly in the browser (using feColorMatrix and feComponentTransfer). This approach was fast – since the recoloring happened in the browser, we could update the palette without re-rendering the image on the server side. However, it was difficult to implement, as some palettes are too complex for feColorMatrix and feComponentTransfer, and it was often unreliable. After extensive experimentation, we found that the resulting colors could differ depending on the browser and the operating system. Here is an example from our experiments in early 2020. On the left is a screenshot of a background from the earlier generator version made on a setup using Safari on macOS, and on the right is a screenshot of the same background from a setup using Google Chrome on Ubuntu Linux. Notice the subtle brightness discrepancies. The more post-processing effects we applied, the more prominent they became.
An example of brightness discrepancies.
Another example is MDN’s sample for feComponentTransfer. This time, both images were made on the same machine using Ubuntu Linux and Google Chrome, but in the top screenshot, hardware acceleration was disabled. There are prominent color discrepancies, especially between the Table lookup examples. Thus, despite being very fast, this approach to color correction was extremely inconsistent.
An example of color discrepancies.
Our current approach (in use since 2021) is more straightforward. We render source images in 32-bit grayscale, meaning that instead of RGB, our CPPNs return only a single luminance value. We then map each pixel to a lookup table with precomputed ideal RGB values. This approach is slower, but it produces pixel-perfect results.
An example of color correction using a grayscale image.
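A sketch of the lookup-table recoloring step; the two-color gradient below is just a stand-in for the precomputed brand-palette table:

```python
import numpy as np

def build_lut(start_rgb, end_rgb, size=256):
    """Precompute a luminance -> RGB lookup table as a simple linear gradient."""
    t = np.linspace(0.0, 1.0, size)[:, None]
    return ((1 - t) * np.array(start_rgb) + t * np.array(end_rgb)).astype(np.uint8)

def recolor(gray, lut):
    """Map a grayscale image (floats in [0, 1]) to brand colors via the LUT."""
    indices = np.clip(gray * (len(lut) - 1), 0, len(lut) - 1).astype(np.int32)
    return lut[indices]                          # shape: (H, W, 3)

# Hypothetical brand colors; any palette can be baked into the table.
lut = build_lut(start_rgb=(33, 0, 82), end_rgb=(255, 49, 140))
gray = np.random.default_rng(0).random((512, 512))  # stand-in for a CPPN luminance image
rgb = recolor(gray, lut)
```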
2020.1 splash screens that used SVG recoloring.
When our current approach to color correction is used alongside the CPPN with virtual parameters and spline animation, the result is a video like this!
Another remarkable property of CPPNs is that, due to their simple architecture, it’s very easy to translate their computational graphs into GLSL code. Once the animation video is ready, we can export it as a WebGL fragment shader and then run it directly in the browser. An example of the results of this approach is Qodana’s landing page.
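To illustrate why the translation is straightforward, here is a sketch that emits GLSL for a single tanh layer of a CPPN. Our actual exporter walks the full computational graph; the formatting below is just one possible choice:

```python
import numpy as np

def layer_to_glsl(weights, biases, in_names, prefix="h"):
    """Emit GLSL statements computing tanh(W @ inputs + b) for one dense layer."""
    lines = []
    for j in range(weights.shape[1]):
        terms = " + ".join(
            f"({weights[i, j]:.6f} * {in_names[i]})" for i in range(weights.shape[0])
        )
        lines.append(f"float {prefix}{j} = tanh({terms} + {biases[j]:.6f});")
    return "\n".join(lines), [f"{prefix}{j}" for j in range(weights.shape[1])]

rng = np.random.default_rng(0)
code, names = layer_to_glsl(rng.normal(size=(2, 4)), rng.normal(size=4), ["uv.x", "uv.y"])
print(code)   # paste into a fragment shader body; chain layers by feeding `names` forward
```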
Our CPPN-based generator is available here.
To dive deeper into CPPNs, check out our public Datalore notebook with code examples.
Stable Diffusion offers a high level of versatility and visual fidelity, making it a perfect backbone for our art generators. To make Stable Diffusion suitable for use as a source of release graphics, we had to adhere to the following criteria:
- Images should follow the brand palette.
- No artifacts or glitches (such as broken pixels) are allowed.
- It should be easy to use a specific style (abstract smooth lines) out of the box.
- It should require little to no prompting, meaning it should provide accessible and intuitive controls.
Though there is always room for improvement, we’ve met all of these requirements. The latest images are publicly available, and all the technical details are below.
2023.1 splash screens created with Stable Diffusion.
To produce results that consistently met all of our criteria, we fine-tuned Stable Diffusion using various references provided by our designers. Below are some examples of images generated according to various styles.
Experimental styles obtained by fine-tuning Stable Diffusion.
Before diving into the technical details of the fine-tuning process, let’s take a look at the internals of Stable Diffusion. It essentially consists of three parts: the CLIP text encoder (a small transformer model used for encoding text into a multi-modal embedding space), a variational autoencoder (VAE) that compresses and decompresses images to and from latent space, and the denoising UNet.
The architecture of Stable Diffusion. Image source: www.philschmid.de/stable-diffusion-inference-endpoints.
The generation process is roughly as follows:
- We encode the prompt text into an embedding, which is a 77×768 floating-point array.
- We randomly generate the latent representation of the image, which could be either pure Gaussian noise or a noised representation of an init image.
- We repeatedly pass the encoded latent image and the encoded text through the denoising UNet for a given number of steps.
- After denoising the latent image, we pass it through the decoder, thus decompressing it into a regular RGB image.
The denoising process. Image source: jalammar.github.io/illustrated-stable-diffusion/.
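These steps are exactly what the diffusers library wraps up; a minimal sketch, where the checkpoint, prompt, and sampler settings are placeholders rather than our shipped configuration:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a Stable Diffusion checkpoint; fp16 keeps GPU memory usage reasonable.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# The pipeline encodes the prompt with CLIP, runs the denoising UNet for
# `num_inference_steps`, and decodes the final latent with the VAE.
image = pipe(
    "abstract smooth lines, digital art",
    num_inference_steps=30,
    guidance_scale=7.5,
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]
image.save("splash_draft.png")
```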
Crucially for us, the beauty of Stable Diffusion is that it’s possible to fine-tune it with very little data and achieve great results! As a side effect, data-efficient fine-tuning methods are also compute-efficient, which makes them even better.
The most straightforward fine-tuning approach is textual inversion (p-tuning). We freeze all of the weights, namely the UNet, the VAE, and the text encoder (meaning we don’t update them during training), and only train one new word embedding for the text encoder. Because we only train one new word embedding, there are only 768 trainable parameters!
Outline of the text-embedding and inversion process. Image source: textual-inversion.github.io/.
These custom embeddings are composable, meaning we could use up to 77 embeddings in a single prompt. On top of that, they’re easy to train, taking ~2 hours on a single RTX 4090. Below is an example of the training process. Both of these images were generated using the prompt “digital art in the style of <sculpture>”, where “<sculpture>” is the new word embedding that we’re training. As we perform more training steps, the image evolves, and the new visual style becomes more and more pronounced.
The image generated with textual inversion after 500 and 3000 training steps.
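At inference time, a trained embedding can be plugged into a prompt like this; a sketch using diffusers, where the embedding file name is a placeholder and <sculpture> is the token from the example above:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load a learned 768-dimensional embedding and register it as a new token.
pipe.load_textual_inversion("sculpture_embedding.safetensors", token="<sculpture>")

image = pipe(
    "digital art in the style of <sculpture>",
    num_inference_steps=30,
).images[0]
image.save("textual_inversion_sample.png")
```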
Another popular and efficient fine-tuning method is Low-Rank Adaptation, or simply LoRA. The key idea of LoRA is similar to textual inversion, only this time, in addition to freezing the weights, we also introduce new ones by adding small adapter layers to the attention layers inside the UNet.
Illustration of the LoRA method inside one Transformer layer. Image source: adapterhub.ml/blog/2022/09/updates-in-adapter-transformers-v3-1/.
Compared to textual inversion, this approach makes it possible to capture more subtle patterns from the fine-tuning data (for example, “AI portrait” apps work by training adapter layers on the user’s face), but it uses slightly more resources and, most importantly, multiple LoRAs can’t be composed. In our particular use case, we found that LoRA is most effective when working with Stable Diffusion XL. By contrast, in earlier versions of Stable Diffusion (1.4, 1.5, or 2.1), textual inversion allows for more versatility.
The image generated with LoRA after 200 and 1000 training steps.
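The corresponding inference-time sketch for a LoRA adapter with Stable Diffusion XL; the adapter file name and scale are placeholders:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Attach LoRA adapter weights trained on the designers' references.
pipe.load_lora_weights("release_style_lora.safetensors")

image = pipe(
    "abstract smooth lines, release splash screen",
    num_inference_steps=30,
    cross_attention_kwargs={"scale": 0.8},   # how strongly the adapter is applied
).images[0]
image.save("lora_sample.png")
```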
One of our criteria for using Stable Diffusion was the need to make sure that the generated images follow the color palette of a particular brand, and this is where CPPNs come to our aid! Before generating an image with Stable Diffusion, we generate an image with a CPPN using our Gradient generator (described above), apply the desired colors with pixel-perfect accuracy, then encode it with the VAE and mix it with Gaussian noise. The UNet uses the resulting latent image as its starting point, thus preserving the original colors and composition.
CPPN → Stable Diffusion pipeline.
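A sketch of this pipeline using the img2img entry point of diffusers; the recolored CPPN image stands in for our Gradient generator output, and strength controls how much Gaussian noise is mixed into its latent before denoising:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# A CPPN image, already recolored to the brand palette (see the color-correction section).
init_image = Image.open("cppn_recolored.png").convert("RGB").resize((768, 512))

# The pipeline encodes the init image with the VAE, noises it according to `strength`,
# and lets the UNet denoise from there, so colors and composition are preserved.
image = pipe(
    prompt="abstract smooth lines, digital art",
    image=init_image,
    strength=0.55,
    guidance_scale=7.5,
).images[0]
image.save("cppn_plus_sd.png")
```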
Once the CPPN image is ready, we can also edit it directly in the browser to achieve any shape and design we could ever imagine!
CPPN → Stable Diffusion pipeline with a manually edited CPPN image.
Finally, once we have produced several images with our “CPPN → Stable Diffusion” pipeline, we can train another CPPN on these images and turn them into an animation, as described in the CPPNs: Animation section above!
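A minimal sketch of fitting a CPPN to a target image with PyTorch; the layer sizes, optimizer, and number of steps are illustrative, and the trained weights can then be exported to GLSL as described earlier:

```python
import torch
from torch import nn

def coordinate_grid(h, w):
    """Normalized (x, y) coordinates for every pixel, flattened to (h*w, 2)."""
    ys = torch.linspace(-1, 1, h)
    xs = torch.linspace(-1, 1, w)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    return torch.stack([gx, gy], dim=-1).reshape(-1, 2)

cppn = nn.Sequential(
    nn.Linear(2, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 3), nn.Sigmoid(),
)

target = torch.rand(512 * 512, 3)   # stand-in for a Stable Diffusion output, flattened to pixels
coords = coordinate_grid(512, 512)
optimizer = torch.optim.Adam(cppn.parameters(), lr=1e-3)

for step in range(2000):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(cppn(coords), target)
    loss.backward()
    optimizer.step()
```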
The exploration and implementation of AI-powered graphics at JetBrains has been an adventure. Our tools have evolved and matured over time, from our initial approach using WebGL-based random generation to our current use of CPPNs and Stable Diffusion to generate innovative and personalized designs. Moving forward, we anticipate even greater levels of customization and adaptability, and we’re excited about the possibilities these technologies will unlock in the field of graphics generation.
We hope this in-depth look into our AI art journey has been illuminating! We invite you to explore the examples we’ve provided (including our interactive archive) and share your feedback here in the comments or via cai@jetbrains.com. Please let us know what kinds of topics you would like to see from the Computational Arts team in the future!