The AI system Stable Diffusion is moving fast: Stability AI has announced version 2.0 of the system, which generates images from text prompts. Three months after the release of v1 and one month after the bumpy release of version 1.5, the model now lives in a new repository. Version 2 can create images at higher resolution and can evaluate the depth information in images and transfer it to new ones.
The text-to-image models in Stable Diffusion 2.0 now use OpenCLIP as the text encoder, an open source project based on CLIP (Contrastive Language-Image Pre-training) by OpenAI. The encoder was largely developed by the non-profit organization LAION, with Stability AI participating in the development.
As in v1, a subset of the multimodal dataset LAION-5B, which contains 5.8 billion text-image pairs, serves as the basis for training the model. An NSFW (Not Safe for Work) filter was used to remove adult content from the dataset.
The text-to-image models create images at a standard resolution of either 512×512 or 768×768 pixels. Stable Diffusion 2.0 also brings an upscaler diffusion model that quadruples the resolution of the generated images.
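The upscaler is a separate model that can be chained after the base text-to-image model. A minimal sketch of this two-stage flow, assuming Hugging Face's diffusers library and the model IDs from the Stability AI repositories on Hugging Face (neither is named in this article); the x4 upscaler multiplies each image dimension by four:

```python
def upscaled_size(width: int, height: int, factor: int = 4) -> tuple[int, int]:
    """The x4 upscaler scales each image dimension by `factor`,
    e.g. 512x512 -> 2048x2048."""
    return width * factor, height * factor


if __name__ == "__main__":
    # Heavy model downloads; only run as a script, not on import.
    from diffusers import StableDiffusionPipeline, StableDiffusionUpscalePipeline

    # Stage 1: generate a 512x512 image with the base model.
    base = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-base")
    prompt = "a photo of an astronaut riding a horse"
    image = base(prompt).images[0]

    # Stage 2: hand the result to the dedicated upscaler model.
    upscaler = StableDiffusionUpscalePipeline.from_pretrained(
        "stabilityai/stable-diffusion-x4-upscaler"
    )
    big = upscaler(prompt=prompt, image=image).images[0]
    print(upscaled_size(*image.size))  # expected output size of `big`
```

The upscaler also takes the text prompt again, since it is itself a diffusion model conditioned on both the low-resolution image and the text.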
In addition, the release has an advanced inpainting model on board for modifying images after the fact, i.e. adding elements or retouching them.
Look into the third dimension
New is the depth-conditional model depth2img, which takes the depth of an input image into account: it generates new images whose elements share the same depth properties as the input, making them look just as three-dimensional.
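In practice, depth2img behaves like an image-to-image pipeline where the estimated depth map, rather than the raw pixels alone, constrains the layout of the output. A hedged sketch assuming the diffusers pipeline class and model ID (not named in this article); the small helper mirrors how img2img-style pipelines in diffusers derive the number of denoising steps that actually run from the `strength` parameter:

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """`strength` (0..1) decides how far the input image is pushed back into
    noise, so only that fraction of the denoising schedule is replayed.
    This mirrors the behaviour of diffusers' img2img-style pipelines."""
    return min(int(num_inference_steps * strength), num_inference_steps)


if __name__ == "__main__":
    # Heavy model download; only run as a script, not on import.
    from PIL import Image
    from diffusers import StableDiffusionDepth2ImgPipeline

    pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-depth"
    )
    init = Image.open("room.png").convert("RGB")

    # The pipeline estimates a depth map from `image` and conditions on it,
    # so the output keeps the spatial layout of the input scene.
    out = pipe(
        prompt="the same room as a watercolor painting",
        image=init,
        strength=0.7,
    ).images[0]
    out.save("depth2img.png")
```

A low `strength` stays close to the input image, while values near 1.0 keep only the depth layout and repaint almost everything else.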
Rapid start and initial confusion
Stable Diffusion is the result of a collaboration between Stability AI, Runway ML, researchers from the Computer Vision & Learning Group (CompVis) at LMU Munich – previously based in Heidelberg – EleutherAI, and LAION (Large-scale Artificial Intelligence Open Network). Robin Rombach from CompVis and Patrick Esser from Runway ML led v1 of the project.
Upon release, Stable Diffusion got off to a flying start, earning 33,600 stars on GitHub within 90 days. The project quickly found not only users but also investors: two months after the release, Stability AI could look forward to a good 100 million US dollars in investor money.
Two major versions and three repositories
Shortly thereafter, there was some back and forth regarding the release of version 1.5, which first appeared, then disappeared, only to remain available after all. While the v1 repository on GitHub lived under CompVis, Runway ML created a new repository for version 1.5. Stable Diffusion 2.0 now sits in a fresh repository, this time under the umbrella of Stability AI.
More details on the current version can be found on the Stability AI blog as well as in the news section of the README in the current repository. The repository also contains example prompts and instructions for the upscaling and depth-information models. Like its predecessor, Stable Diffusion 2.0 is optimized to run on a single GPU; the declared aim is for as many people as possible to be able to use the software.