Madonna uses artificial intelligence to generate video of La Isla Bonita

To achieve that ethereal look, the pop legend used a little-explored branch of generative artificial intelligence (AI): the text-to-video tool, in which you type a few words, for example, “sunset over a surreal cloud” or “waterfall in the jungle at dawn,” and a video is generated in an instant.

Following in the footsteps of AI chatbots and still-image generators, some AI video enthusiasts say the emerging technology could one day transform entertainment, allowing, for example, the creation of movies with customizable story lines and endings. But there is a long way to go before that can be achieved, and many ethical pitfalls along the way.

For early adopters like Madonna, who has pushed the boundaries of art for decades, it was more of an experiment. The artist rejected an earlier version of the visuals proposed for “La Isla Bonita,” which used more conventional computer graphics to evoke a tropical atmosphere.

“We tried CGI. It looked pretty bland and cheesy and she didn’t like it,” said Sasha Kasiuha, content director for Madonna’s Celebration Tour, which continues through the end of April. “And then we decided to try AI.”

Impact of AI

OpenAI, maker of ChatGPT, already gave a glimpse of what sophisticated text-to-video technology could look like when the company recently showed off Sora, a new tool not yet available to the public. Madonna’s team tested a different product, from New York-based startup Runway, one of the technology’s pioneers, which launched its first public text-to-video model last March and presented a more advanced version, Gen-2, in June.

Runway CEO Cristóbal Valenzuela said that while some see these tools as a “magical device where you type a word and it somehow conjures up exactly what you had in your head,” the most effective uses come from creative professionals looking for an upgrade to the decades-old digital editing software they already use.

He said Runway can’t yet make a full-length documentary. But it could help fill in background video, or b-roll, the secondary shots and scenes that help tell the story.

“That saves you maybe a week of work,” Valenzuela said. “The common denominator in many cases is that people use it as a way to increase or speed up something they could have done before.”

Runway’s target clients are “large streaming and production companies, post-production companies, visual effects companies, marketing teams, advertising companies. A lot of people who make content for a living,” Valenzuela said.

The AI controversy

Dangers await. Without effective safeguards, AI video generators could threaten democracy with deepfakes, videos, images or sounds manipulated by artificial intelligence to appear authentic and real, which could convince people of something that never happened or, as is already happening with AI image generators, flood the internet with fake pornographic scenes depicting what appear to be real people with recognizable faces. Under pressure from regulators, major tech companies have promised to watermark AI-generated output to help identify what is real.

Copyright disputes could also arise over the collections of videos and images on which AI systems are trained (neither Runway nor OpenAI reveals its data sources) and the extent to which those systems unfairly replicate copyrighted works. And there is a fear that, at some point, video-making machines could replace human labor and art.

For now, longer AI-generated videos are still measured in seconds and can feature jerky movements and telltale glitches, such as distorted hands and fingers. Fixing that is just a matter of more data, more training and the computing power on which that training depends, said Alexander Waibel, a computer science professor at Carnegie Mellon University who has researched AI since the 1970s.

“Now I can say, ‘Make me a video of a rabbit dressed as Napoleon walking through New York City,’” Waibel said. “It knows what New York City is like, what a rabbit is like, what Napoleon is like.”

Which is impressive, he said, but still far from creating a compelling story.

Before launching its first-generation model last year, Runway made a name for itself in AI as a co-developer of the Stable Diffusion image generator. Another company, London-based Stability AI, has since taken over Stable Diffusion’s development.

The diffusion-model technology underlying most major AI image and video generators works by gradually mapping noise, or random data, onto images, effectively destroying an original image and then learning to predict what a new one should look like. It borrows an idea from physics that can be used to describe, for example, how a gas diffuses outward.

“What diffusion models do is reverse that process,” said Phillip Isola, an associate professor of computer science at the Massachusetts Institute of Technology. “They take the randomness and freeze it back into the volume. That’s how you go from randomness to content. And that’s how you can make random videos.”
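The forward-then-reverse idea Isola describes can be sketched in a few lines of code. This is a toy illustration, not how Runway or OpenAI implement it: the “image” is just a short list of numbers, and the reverse step cheats by nudging values back toward a known target, whereas a real diffusion model learns that denoising step with a neural network trained on huge datasets.

```python
import random

def forward_diffusion(signal, steps=20, noise_scale=0.3):
    """Forward process: destroy a signal by repeatedly adding Gaussian noise."""
    for _ in range(steps):
        signal = [v + random.gauss(0.0, noise_scale) for v in signal]
    return signal

def reverse_diffusion(noisy, target, steps=20, alpha=0.3):
    """Toy stand-in for the learned reverse process: each step nudges the
    noisy values a little toward structured content. A real diffusion model
    learns this denoising step from data instead of being handed the answer."""
    for _ in range(steps):
        noisy = [v + alpha * (t - v) for v, t in zip(noisy, target)]
    return noisy

original = [0.0, 1.0, 0.0, 1.0]               # a tiny four-pixel "image"
noise = forward_diffusion(original)            # content -> randomness
restored = reverse_diffusion(noise, original)  # randomness -> content
print(restored)                                # values close to the original
```

Starting generation from pure noise rather than a real image is what lets these systems “make random videos,” in Isola’s phrase: the reverse process pulls structure out of randomness.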

Generating videos with the tool

Video generation is more complicated than still images because it must take into account temporal dynamics, that is, how elements within the video change over time and across sequences of frames, said Daniela Rus, another MIT professor, who runs its Computer Science and Artificial Intelligence Laboratory.

Rus said the computing resources required are “significantly higher than for still image generation” because it “involves processing and generating multiple frames for every second of video.”
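Rus’s point can be made concrete with back-of-the-envelope arithmetic. The frame rate and per-frame cost below are illustrative assumptions, not figures from the article:

```python
# Back-of-the-envelope estimate of why video costs more than stills.
FPS = 24                     # assumed cinema-standard frame rate
clip_seconds = 4             # roughly the length of today's short AI clips

frames = FPS * clip_seconds  # every frame must be generated

# If each frame cost about as much as one still image, even a short clip
# would multiply the compute by the frame count -- and that is before the
# extra work of keeping consecutive frames temporally consistent.
print(f"{frames} frames, roughly {frames}x the compute of one still image")
```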

That isn’t stopping deep-pocketed tech companies from trying to outdo each other in showing off higher-quality AI video generation over longer durations. Requiring written descriptions to make an image was just the beginning. Google recently demonstrated a new project, called Genie, that can be asked to transform a photograph or even a sketch into “an infinite variety” of explorable video game worlds.

In the short term, AI-generated videos are likely to appear in educational and marketing content, providing a cheaper alternative to producing original images or sourcing stock videos, said Aditi Singh, a researcher at Cleveland State University who has studied the text-to-video market.

When Madonna first talked to her team about AI, the “main intention wasn’t, ‘Oh, look, it’s an AI video,'” said Kasiuha, the creative director.

“I asked myself, ‘Can you use one of those AI tools to make the image sharper, to make sure it looks current and high resolution?’” Kasiuha said. “She loves it when you bring in new technology and new kinds of visuals.”

Longer AI-generated movies are already being made. Runway hosts an annual AI film festival to showcase such work. But it remains to be seen whether that is what human audiences will choose to watch.

“I still believe in humans,” said Waibel, the Carnegie Mellon professor. “I still believe that it will end up being a symbiosis where some AI comes up with something and a human improves or guides it. Or the humans will do it and the AI will fix it.”

SOURCE: AP

Tarun Kumar

