The Federal Trade Commission (FTC) is promising a $25,000 reward for the best solution to combat the growing threat of AI voice cloning.
Sometimes referred to as audio deepfakes, AI-cloned voices are now simple to produce: online services offering easy-to-use voice cloning have proliferated since generative AI went mainstream, raising concerns over the technology's potential for abuse in cyberattacks.
For example, one widespread threat involves spoofing a CEO's voice to instruct the finance department to wire money to an attacker's account. Friends and family members could be tricked into sending money to scammers posing as loved ones, and performing artists could see their livelihoods threatened should the technology continue to mature.
Interested parties have until January 12 to submit their ideas to help address AI-based voice cloning fraud, focusing mainly on the prevention, monitoring, and evaluation of the technology.
“This effort may help push forward ideas to mitigate risks upstream – shielding consumers, creative professionals, and small businesses against the harms of voice cloning before the harm reaches a consumer,” the FTC said.
“It also may help advance ideas to mitigate risks at the consumer level. And if viable ideas do not emerge, this will send a critical and early warning to policymakers that they should consider stricter limits on the use of this technology, given the challenge in preventing harmful development of applications in the marketplace.”
Submissions will be assessed on how feasible they would be to execute and administer, how resilient they are to technological change, and how thoughtfully they address how liability and responsibility would be placed on companies, among other measures.
The top prize on offer is $25,000, which doesn't sound like a goldmine considering the broad potential applications an ingenious solution may have.
One runner-up will also receive $4,000, three honorable mentions will be paid $2,000 each for their troubles, and organizations of 10 or more people will receive a cashless recognition award.
AI voice abuse in action
The effectiveness of AI voice cloning has been proven in repeated cases over the past year. Experts at Slovakian security shop ESET showed how the aforementioned example of spoofing a CEO's voice can be pulled off.
In fact, it had been an issue for years before generative AI came into the hands of the average person. A UK energy company was drained of $243,000 in 2019 after fraudsters used a cloned executive's voice to instruct its CEO to send a large sum to a Hungarian supplier.
A New York Times report, also from the past year, detailed a range of cases successfully targeting the finance sector, tricking banks into moving money on behalf of callers they believed to be genuine clients. El Reg has also reported on similar attacks taking place in the UAE as far back as 2021.
Romance scams are rife, too, with one Brit falling for a spoofed Kevin Costner, and criminals have also shown they aren’t above carrying out “family emergency” scams – ones that target parents with the cloned voices of their children asking for money to post bail, for example.
More sinister examples have seen mothers take calls purportedly from their daughters held hostage by “kidnappers” who demand huge ransoms.
Voice cloning is made possible by feeding an AI model enough training data to understand the sound, tone, pacing, inflection, and other nuances of an individual’s voice. Celebrities and other public figures are thought to be at acute risk of these attacks given the number of recordings of their voices that exist online.
With the rise of social media and video content creation culture, many non-celebrities and even children also have enough material online to train a model effectively.
Kaspersky researchers looked into the rise of AI voice cloning last year and found a wide variety of freely available, open source tools that could generate cloned voices. However, they contended that getting a convincing clone up and running required a bit of Python know-how and some tinkering on the cloner's part.
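To give a sense of the level of tinkering involved, here is a minimal sketch of that kind of workflow. It assumes Coqui's open source TTS package and its XTTS v2 voice cloning model as one example of the freely available tooling; the Kaspersky write-up doesn't name specific tools, so this is purely illustrative.

# Illustrative sketch only: assumes Coqui's open source TTS package
# (pip install TTS) and its XTTS v2 voice cloning model.
from TTS.api import TTS

# Load the multilingual XTTS v2 model (weights are downloaded on first use)
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# Generate speech in the voice captured in a short reference recording
tts.tts_to_file(
    text="This is only a test of a cloned voice.",
    speaker_wav="reference_sample.wav",  # a few seconds of the target speaker
    language="en",
    file_path="cloned_output.wav",
)

In other words, a handful of lines and a short audio sample are enough to produce output, though tuning it into something genuinely convincing is where the extra effort comes in.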
Paid offerings tend to be much more effective, though. The researchers pointed to Microsoft's VALL-E model, which could supposedly generate a decent clone from just three seconds of voice audio used as training data.
Other paid solutions also outperform the free ones, but these are still in the early stages of development, so the accuracy and effectiveness of these models can be expected to improve over time. ®