California18

This site tells you if your photos have been used to train AI

Image-generating artificial intelligences, such as Stable Diffusion or Midjourney, have become very popular in a very short time. But it remains difficult to know exactly which images these AIs were trained on.

Have your photos on social networks been used to train an artificial intelligence? Put that way, the question may seem absurd. Who would imagine that photos posted on Facebook or Instagram could be used to teach AIs what a forest looks like?

Yet it is a fact: image-generating artificial intelligences have been trained on a gigantic corpus of photos found on the Internet, perhaps including yours. The question matters even more if you are a creator on social networks and want to make sure your copyright has not been infringed. To find out, there is a tool: HaveIBeenTrained.

See the databases used to train AIs

HaveIBeenTrained lets you search Laion 400M and Laion 5B, two gigantic databases containing 400 million and 5 billion photos respectively, which were used to train the artificial intelligences Stable Diffusion and Imagen. These are the two largest databases of images paired with text descriptions, which helps AIs associate an image with the words that describe it.

To find out if one of your drawings shared on the Internet is part of these two huge databases, nothing could be simpler: just search by image or by text. A query for “forest picture” will show you all the images in the database that match that description.

An example search on HaveIBeenTrained // Source: HaveIBeenTrained

But HaveIBeenTrained is mainly aimed at artists who are present on social networks and whose works may have been swept up by Laion. The site thus allows artists “to search these databases for links to their work and request their removal,” according to its description. “We are in partnership with Laion, who assembled these databases, to ensure that future [artificial intelligence] models are not trained with works that have been removed.”

The fact that the site is specifically aimed at artists is not insignificant. At the beginning of January 2023, three artists, including the cartoonist Sarah Andersen, well known for her comics on Instagram, filed a complaint against Midjourney and Stable Diffusion. These artificial intelligences, which use billions of images taken from the Internet for training, “infringed the copyright of millions of artists […] who have not given their consent and who have not received compensation.”

Using HaveIBeenTrained, it is indeed easy to see that Sarah Andersen's drawings appear in Laion's databases.

What do we find in these databases?

Until now, it was very difficult to know exactly what was in these huge databases of up to 5 billion entries. Laion 400M and Laion 5B were assembled with complex, fully automated procedures, which do not necessarily make it possible to filter the images that end up in them. And that sometimes means that some photos are not free of copyright.

The Getty Images photo agency recently paid the price: it realized that AIs had been trained on a large number of its photos, to the point where they could reproduce its famous copyright watermark. Getty Images has filed a complaint against Stable Diffusion for having “illegally copied and analyzed millions of copyrighted photos.”

A quick test reveals the variety of what can be found there. There are not only landscape photos, but also book covers, advertising images, excerpts from Facebook posts in which names are clearly identifiable, and even photos of anonymous people published on Skyblog.

During our research, we even stumbled across some pornographic photos, proof that these databases contain all sorts of material, and that everyone would do well to check what can be found there.
