Nvidia has updated its GauGAN tool, which can now generate realistic photos from typed text, all powered by machine learning and artificial intelligence.
Thanks to artificial intelligence, you no longer even need to doodle anything to generate very realistic landscapes: a few words are now enough to produce natural views such as a shore, mountains or valleys. This feat is made possible by new advances in AI at Nvidia, with its GauGAN tool.
GauGAN? The name is an obvious nod to the Post-Impressionist painter Paul Gauguin. But above all it points to how the tool works: GAN stands for generative adversarial network, an unsupervised learning method designed by computer scientist Ian Goodfellow.
The idea is to pit two neural networks against each other in order to achieve a given result. The first, the generator, produces the visuals, while the second, called the “discriminator”, is responsible for evaluating them. The discriminator is trained with deep learning, a technique that involves feeding the AI existing data, so it “knows” what the visuals should look like. The generator, in turn, improves by trying to get its output past the discriminator.
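To make the tug-of-war between the two networks more concrete, here is a toy sketch in Python. It is not Nvidia's code and has nothing to do with images: the "real data" are simply numbers drawn around 4, the generator is a straight line and the discriminator a single logistic unit. All names (`generate`, `discriminate`, `train`, and so on) are hypothetical and chosen for illustration only.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def generate(gen, z):
    # Generator: turns random noise z into a fake sample (here, one number).
    return gen["a"] * z + gen["b"]


def discriminate(disc, x):
    # Discriminator: estimated probability that x is a real sample.
    return sigmoid(disc["w"] * x + disc["c"])


def discriminator_step(disc, real, fake, lr):
    # One gradient-descent step on -log D(real) - log(1 - D(fake)):
    # push D(real) toward 1 and D(fake) toward 0.
    d_real = discriminate(disc, real)
    d_fake = discriminate(disc, fake)
    grad_w = -(1.0 - d_real) * real + d_fake * fake
    grad_c = -(1.0 - d_real) + d_fake
    disc["w"] -= lr * grad_w
    disc["c"] -= lr * grad_c


def generator_step(gen, disc, z, lr):
    # One gradient-descent step on -log D(G(z)):
    # the generator is rewarded when the discriminator is fooled.
    fake = generate(gen, z)
    grad_fake = -(1.0 - discriminate(disc, fake)) * disc["w"]
    gen["a"] -= lr * grad_fake * z
    gen["b"] -= lr * grad_fake


def train(steps=2000, lr=0.01, seed=0):
    # Alternate the two updates, as in a standard GAN training loop.
    rng = random.Random(seed)
    gen = {"a": 1.0, "b": 0.0}
    disc = {"w": 0.0, "c": 0.0}
    for _ in range(steps):
        real = rng.gauss(4.0, 0.5)  # assumed "real data" for this toy example
        fake = generate(gen, rng.gauss(0.0, 1.0))
        discriminator_step(disc, real, fake, lr)
        generator_step(gen, disc, rng.gauss(0.0, 1.0), lr)
    return gen, disc
```

Calling `train()` returns the two sets of parameters after the alternating updates. A real system like GauGAN replaces these two tiny functions with deep neural networks and the numbers with images, but the alternation between "learn to spot fakes" and "learn to fool the judge" is the same principle.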
Type some text, get a picture
It is on this basis that Nvidia has iterated in order to add text support. The American company explains this in a post published on November 22 about GauGAN 2. With this tool, which anyone can try on a dedicated site, it is possible to generate a landscape by describing it in words and, if necessary, to refine it with sketches.
“With the versatility of text prompts and sketches, GauGAN2 allows users to create and customize scenes faster and with finer control,” says Nvidia, which notes that its demo is “one of the first to combine several modalities – text, semantic segmentation, sketch and style – in a single GAN framework.”
The demonstration video is certainly spectacular: as you type and rearrange the words, the photorealistic image changes to match the request. In practice, when we tested it, the tool did not work in real time: once your sentence has been entered (in English, although the site also seems to understand French), you have to click a button to see the result.
“The GauGAN 2 AI model was trained on 10 million high-quality landscape images using the Nvidia Selene supercomputer, an Nvidia DGX SuperPOD system that is among the top 10 most powerful supercomputers in the world,” Nvidia points out. The site adds that the neural network has also learned the connection between words and the images they correspond to, such as “winter”, “foggy” or “rainbow”.
Outside of landscapes, GauGAN 2 seems lost and its interpretation of written text becomes erratic, although this can sometimes produce fantastical or dreamlike images. We tried to make it draw a sheep, but the network does not seem to know what one is. It would presumably be enough, however, to train it by showing one of the two networks millions of pictures of sheep.
Nvidia’s work in artificial intelligence has already produced more than pretty landscapes. The company has demonstrated strikingly realistic faces of people who do not exist. It even created a virtual clone of its CEO during a conference in August 2021, which required significant technical resources.
This spectacular work opens up prospects that are both exciting and disturbing. The techniques Nvidia outlines with GauGAN have obvious applications in video games, cinema, animation or TV series, alongside the work of designers. But it is also easy to imagine harmful uses, such as disinformation or deepfakes.