AI image generator Stable Diffusion 2 increases the resolution and gains a sense of depth

The AI system Stable Diffusion is moving at a fast pace: Stability AI has announced version 2.0 of the system, which creates images from text prompts. Three months after the release of v1 and one month after the bumpy release of version 1.5, the model now lives in a new repository. Version 2 can create images at higher resolution and can evaluate depth information in images and transfer it to newly generated ones.

In August, Stable Diffusion v1 entered the competition with AI image generators such as DALL·E 2 and Midjourney. In the meantime, Nvidia has also launched its own text-to-image system, eDiffi. Stable Diffusion's terms of use make it attractive: it was published as an open-source project, and its license allows commercial use of the generated images.

The text-to-image models in Stable Diffusion 2.0 now use OpenCLIP as the text encoder, an open-source project based on CLIP (Contrastive Language-Image Pre-training) by OpenAI. The encoder was developed largely by the non-profit organization LAION, with Stability AI participating in the development.
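To illustrate what the text encoder does, here is a minimal sketch of encoding a prompt with the open_clip package; the model name and pretrained tag are assumptions for the example and are not taken from the article.

```python
# Sketch: encoding a text prompt with OpenCLIP (package: open_clip_torch).
# Model name and pretrained checkpoint are assumptions for illustration only.
import torch
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-H-14", pretrained="laion2b_s32b_b79k"  # assumed OpenCLIP checkpoint
)
tokenizer = open_clip.get_tokenizer("ViT-H-14")

tokens = tokenizer(["an astronaut riding a horse on mars"])
with torch.no_grad():
    text_features = model.encode_text(tokens)  # embedding that guides the image model

print(text_features.shape)
```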

As in v1, a subset of the multimodal data set LAION-5B, which contains 5.8 billion text-image pairs, serves as the basis for training the model. An NSFW (Not Safe for Work) filter removed adult content from the dataset.

The text-to-image models generate images at a default resolution of either 512×512 or 768×768 pixels. Stable Diffusion 2.0 also brings an upscaler diffusion model that quadruples the resolution of generated images.
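The models can be run through the scripts in the repository or through libraries such as Hugging Face's diffusers; the following minimal sketch assumes the diffusers route and the model ID stabilityai/stable-diffusion-2, neither of which is specified in the article.

```python
# Sketch: text-to-image with Stable Diffusion 2 via the diffusers library.
# Model ID and pipeline choice are assumptions, not taken from the article.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2",   # assumed Hugging Face model ID
    torch_dtype=torch.float16,
).to("cuda")

# The v2 base model generates 768x768 images by default.
image = pipe("a photo of an astronaut riding a horse on mars").images[0]
image.save("astronaut_768.png")
```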



The upscaler creates an image with 512×512 pixels from an image with 128×128 pixels.
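A correspondingly hedged sketch of the 4× upscaler via diffusers; the pipeline class and the model ID stabilityai/stable-diffusion-x4-upscaler are assumptions for illustration, not details from the article.

```python
# Sketch: 4x upscaler, assuming the model is available on Hugging Face
# as "stabilityai/stable-diffusion-x4-upscaler".
import torch
from PIL import Image
from diffusers import StableDiffusionUpscalePipeline

upscaler = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler",  # assumed model ID
    torch_dtype=torch.float16,
).to("cuda")

low_res = Image.open("input_128.png").convert("RGB")  # e.g. 128x128 pixels

# The upscaler is text-guided: the prompt should describe the image content.
upscaled = upscaler(prompt="a white cat", image=low_res).images[0]
upscaled.save("output_512.png")  # 4x per side, i.e. 512x512 pixels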

In addition, the release has an advanced inpainting model on board for editing images after the fact, that is, for adding elements or retouching them.



With the inpainting model, images can be changed afterwards.
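As a sketch of how such an inpainting call can look with diffusers, assuming the model is published as stabilityai/stable-diffusion-2-inpainting (an assumption, not stated in the article):

```python
# Sketch: inpainting with Stable Diffusion 2, assuming the model ID
# "stabilityai/stable-diffusion-2-inpainting" on Hugging Face.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

inpaint = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting",  # assumed model ID
    torch_dtype=torch.float16,
).to("cuda")

image = Image.open("photo.png").convert("RGB").resize((512, 512))
# White pixels in the mask mark the region to be repainted.
mask = Image.open("mask.png").convert("RGB").resize((512, 512))

result = inpaint(
    prompt="a small wooden bench in the garden",
    image=image,
    mask_image=mask,
).images[0]
result.save("inpainted.png")
```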

New is the depth-conditional model depth2img, which takes the depth of the input image into account: from an input image it creates new images whose elements receive the same depth properties and therefore look just as three-dimensional.



With depth2img, text input can be combined with the depth information of an image.
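A hedged sketch of depth-conditioned image-to-image generation with diffusers; the pipeline class and the model ID stabilityai/stable-diffusion-2-depth are assumptions for illustration.

```python
# Sketch: depth-conditioned image-to-image, assuming the model is published
# as "stabilityai/stable-diffusion-2-depth" on Hugging Face.
import torch
from PIL import Image
from diffusers import StableDiffusionDepth2ImgPipeline

depth2img = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth",  # assumed model ID
    torch_dtype=torch.float16,
).to("cuda")

init_image = Image.open("portrait.png").convert("RGB")

# The estimated depth map of the input keeps the spatial layout intact,
# while the prompt changes the style and content of the new image.
result = depth2img(
    prompt="a marble statue in a museum",
    image=init_image,
    strength=0.8,
).images[0]
result.save("depth2img_result.png")
```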

Stable Diffusion is the result of a collaboration between Stability AI, Runway ML, researchers from the Computer Vision & Learning Group (CompVis) at LMU Munich (previously in Heidelberg), EleutherAI and LAION (Large-scale Artificial Intelligence Open Network). Robin Rombach from CompVis and Patrick Esser from Runway ML led v1 of the project.

Upon release, Stable Diffusion got off to a flying start, earning 33,600 stars on GitHub within 90 days. The project not only quickly found users, but also soon found investors: two months after the release, Stability AI secured a good 100 million US dollars in investor money.

Shortly thereafter, there was some back and forth around the release of version 1.5, which first appeared, then disappeared, only to remain available after all. While the v1 repository on GitHub lived under CompVis, Runway ML created a new repository for version 1.5. Stable Diffusion 2.0 now sits in a fresh repository under the umbrella of Stability AI.

More details on the current version can be found in the Stability AI blog as well as in the news section of the readme in the current repository. The repository also contains example prompts and settings for the upscaling and depth-information models. Like its predecessor, Stable Diffusion 2.0 is optimized to run on a single GPU. The declared aim is for as many people as possible to be able to use the software.
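On the single-GPU point, half precision and attention slicing are the usual memory-saving levers in the diffusers library; the following is again a hedged sketch, not a recipe from the repository.

```python
# Sketch: memory-saving options for running on a single consumer GPU.
# These are standard diffusers options, not instructions from the repository.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2",  # assumed model ID
    torch_dtype=torch.float16,         # half precision roughly halves VRAM use
).to("cuda")

# Compute attention in slices to trade a little speed for lower peak memory.
pipe.enable_attention_slicing()

image = pipe("a lighthouse at dusk, oil painting").images[0]
image.save("lighthouse.png")
```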


(rm)
