Microsoft unveils an artificial intelligence that can reproduce anyone's voice from a short sample, while preserving the tone of the person being imitated.

After images (notably with Dall-E) and text (with ChatGPT), voice seems to be the new playground of artificial intelligence. Microsoft has unveiled VALL-E, a tool capable of reproducing anyone's voice from a sample of just three seconds. The software promises to be as faithful as possible in its imitation.

To achieve this, Microsoft fed its artificial intelligence 60,000 hours of spoken English, explains the American site Ars Technica. VALL-E's great strength is its ability to reproduce a person's tone and emotion. It is thus possible to obtain an expressive reading even when the spoken words do not appear in the original sample.

Dangerous uses?

Of course, the longer the starting sample, the more realistic the generated voice. Three seconds of audio is the minimum from which an imitation can be produced, but more faithful results can be obtained by giving VALL-E more material.

Like all content generated by artificial intelligence, this technology opens the way to impersonation. Political figures or celebrities could see messages they never consented to (so-called deepfakes) generated from a sample of their voice.

VALL-E also raises serious security questions. As the site Windows Central notes, some services (such as banks) use their users' voice as a password.

Finally, artistic professions could suffer the most. From a single sample, VALL-E would be able to handle tasks currently reserved for humans, in particular the dubbing of films or series.

California18
