Microsoft’s AI Voice Cloning Tech Is So Good, You Can’t Use It

Microsoft’s research team has introduced VALL-E 2, an AI system that can clone voices with remarkable accuracy. Using just a few seconds of audio, it can generate speech that, according to the researchers, is on par with human recordings in benchmark tests. VALL-E 2 uses a method called “Repetition Aware Sampling,” which adaptively switches between sampling techniques during generation, to improve consistency and overcome common issues in generative voice technology. The technology could help people who have lost the ability to speak by generating speech for them. Despite these capabilities, Microsoft has stated that VALL-E 2 will not be made available to the public, citing the risks of voice impersonation, scams, and other criminal activity that convincingly realistic AI voices enable. The research team also highlighted the need for a standardized method of marking AI-generated content and for ensuring consent from the voice’s owner. In tests, VALL-E 2 outperformed earlier systems in robustness, naturalness, and speaker similarity, and matched human-level performance on the benchmarks used. Other AI companies, such as Meta and OpenAI, have also showcased cutting-edge voice cloning models without releasing them, citing similar concerns and the need for ethical guidelines in the AI community.
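
For readers curious how switching between sampling techniques can work in practice, here is a minimal sketch in Python. It is an illustration, not Microsoft's code: it assumes the model first picks a token with nucleus (top-p) sampling, then falls back to plain random sampling from the full distribution when that token already dominates the recent output. The function names and the `top_p`, `window`, and `threshold` values are hypothetical and chosen only for the example.

```python
import random


def nucleus_sample(probs, top_p, rng):
    """Sample a token index from the smallest set of most likely tokens
    whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


def repetition_aware_sample(probs, history, rng,
                            top_p=0.8, window=10, threshold=0.1):
    """Pick the next token, switching sampling strategy on heavy repetition.

    All default values here are illustrative assumptions.
    """
    token = nucleus_sample(probs, top_p, rng)
    recent = history[-window:]
    if recent:
        # Share of the recent window already occupied by the chosen token.
        ratio = recent.count(token) / len(recent)
        if ratio > threshold:
            # Too repetitive: resample from the full distribution instead.
            candidates = list(range(len(probs)))
            token = rng.choices(candidates, weights=probs, k=1)[0]
    return token


if __name__ == "__main__":
    rng = random.Random(0)
    probs = [0.9, 0.05, 0.05]  # toy distribution over three audio tokens
    history = []
    for _ in range(20):
        history.append(repetition_aware_sample(probs, history, rng))
    print(history)
```

The idea is that nucleus sampling alone can get stuck repeating one token, while the fallback gives less likely tokens a chance to break the loop.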