Artificial intelligence can replicate any voice, including the emotions and tone of a speaker with just 3 seconds of training

What you need to know

Microsoft recently unveiled an AI tool called VALL-E that can create convincing replications of people’s voices.
The tool uses just a 3-second recording as a prompt to generate content.
VALL-E can replicate the emotions of a speaker, which sets it apart from many other AI voice models.

Microsoft recently unveiled an artificial intelligence tool known as VALL-E that can replicate people’s voices (via AITopics). The tool was trained on 60,000 hours of English speech data and uses 3-second clips of specific voices as prompts to generate content. Unlike many AI tools, VALL-E can replicate the emotions and tone of a speaker, even when creating a recording of words that the original speaker never said.

A paper by Microsoft researchers, published on arXiv (which is hosted by Cornell University), used VALL-E to synthesize several voices. Some examples of the work are available on GitHub.

The voice samples shared by Microsoft range in quality. While some of them sound natural, others are clearly machine-generated and sound robotic. Of course, AI tends to get better over time, so generated recordings will likely become more convincing. Additionally, VALL-E only uses 3-second recordings as a prompt. If the technology were used with longer samples of a speaker's voice, it could likely create more realistic results.

At the moment, VALL-E is not generally available, which may be a good thing as AI-generated replications of people’s voices could be used in dangerous ways by threat actors and others with malicious intent.

Windows Central take: Impressive but scary

While VALL-E is undoubtedly impressive, it raises several ethical concerns. As artificial intelligence becomes more powerful, the voices generated by VALL-E and similar technologies will become more convincing. That would open the door to realistic spam calls that replicate the voices of real people a potential victim knows.

Politicians and other public figures could also be impersonated. Given how quickly content spreads on social media and how polarized political discussions are, it’s unlikely that many would stop to ask if a scandalous recording were genuine, as long as it sounded at least somewhat authentic.

Security concerns also come to mind. My bank uses my voice as a password when I call. There are measures in place to detect voice recordings, and I’d assume the technology could sense if a VALL-E voice was used. That being said, it still makes me uneasy. There’s a good chance that the arms race between AI-generated content and AI-detecting software will escalate.

While not a security concern, some have pointed out that voice actors may lose work to VALL-E and competing tech. While it’s unfortunate to see people lose work, I don’t see a way around this. If VALL-E reaches a point where it can replace voice actors for audiobooks or other content, companies are going to use it. That’s just the reality of technology advancing. In fact, Apple recently announced a feature that uses AI to read audiobooks.

Like any technology, VALL-E will be used for good, evil, and everything in between. Microsoft has an ethics statement on the use of VALL-E, but the future of its usage is still murky. Microsoft President Brad Smith has discussed regulating AI in the past (via GeekWire). We’ll have to see what measures Microsoft puts in place to regulate the use of VALL-E.

Original Article: Microsoft’s VALL-E can imitate any voice with just a three-second sample
