What you need to know
- Microsoft recently unveiled an AI tool called VALL-E that can create convincing replications of people’s voices.
- The tool needs just a 3-second recording as a prompt to generate speech.
- VALL-E can replicate the emotions of a speaker, which sets it apart from many other AI voice models.
Microsoft recently unveiled an artificial intelligence tool known as VALL-E that can replicate people’s voices (via AITopics). The tool was trained on 60,000 hours of English speech data and uses 3-second clips of specific voices to generate speech. Unlike many AI tools, VALL-E can replicate the emotions and tone of a speaker, even when creating a recording of words that the original speaker never said.
A paper by Microsoft researchers, posted to arXiv (the preprint server hosted by Cornell University), used VALL-E to synthesize several voices. Some examples of the work are available on GitHub.
The voice samples shared by Microsoft range in quality. While some of them sound natural, others are clearly machine-generated and sound robotic. Of course, AI tends to get better over time, so generated recordings will likely become more convincing. Additionally, VALL-E uses only a 3-second recording as a prompt. If the technology were given a longer sample of a speaker's voice, it would likely create more realistic results.
At the moment, VALL-E is not generally available, which may be a good thing as AI-generated replications of people’s voices could be used in dangerous ways by threat actors and others with malicious intent.
Windows Central take: Impressive but scary
While VALL-E is undoubtedly impressive, it raises several ethical concerns. As artificial intelligence becomes more powerful, the voices generated by VALL-E and similar technologies will become more convincing. That would open the door to realistic spam calls that imitate the voices of real people a potential victim knows.
Politicians and other public figures could also be impersonated. Given how quickly content spreads on social media and how polarized political discussions are, it’s unlikely that many would stop to ask if a scandalous recording were genuine, as long as it sounded at least somewhat authentic.
Security concerns also come to mind. My bank uses my voice as a password when I call. There are measures in place to detect voice recordings, and I’d assume the technology could sense if a VALL-E voice were used. That being said, it still makes me uneasy. There’s a good chance that the arms race between AI-generated content and AI-detecting software will escalate.
While not a security concern, some have pointed out that voice actors may lose work to VALL-E and competing tech. It’s unfortunate to see people lose work, but I don’t see a way around this. If VALL-E reaches a point where it can replace voice actors for audiobooks or other content, companies are going to use it. That’s just the reality of technology advancing. In fact, Apple recently announced a feature that uses AI to narrate audiobooks.
Like any technology, VALL-E will be used for good, evil, and everything in between. Microsoft has an ethics statement on the use of VALL-E, but the future of its usage is still murky. Microsoft President Brad Smith has discussed regulating AI in the past (via GeekWire). We’ll have to see what measures Microsoft puts in place to regulate the use of VALL-E.
Original Article: Microsoft’s VALL-E can imitate any voice with just a three-second sample