From Bugs Bunny to Flo from Progressive, Microsoft's AI tech is being used to create digital voices

Microsoft logo (Image credit: Daniel Rubino / Windows Central)

What you need to know

  • Microsoft Azure AI technology is being used to create realistic voices for chatbots and digital experiences.
  • The tech uses recordings of real voices and deep learning to create realistic digital voices.
  • Microsoft discusses the importance of using the technology responsibly in its blog post.

Many digital voices sound robotic and janky. Microsoft is trying to make this a thing of the past with neural text-to-speech technology. The technology uses recorded phrases and deep learning to create realistic digital voices.

Xuedong Huang, a Microsoft technical fellow and the chief technology officer of Azure AI Cognitive Services, explains how the process works:

The real technology breakthrough is the efficient use of deep learning to process the text to make sure the prosody and pronunciation is accurate. The prosody is what the tone and duration of each phoneme should be. We combine those in a seamless way so they can reproduce the voice that sounds like the original person.

If all of this sounds a bit familiar, you may have seen coverage of Microsoft's patent for similar technology. The patent made the news because the technology it describes could be used to create chatbots of dead people.

Microsoft is aware that technology like this could be used in creepy or dishonest ways, and its blog post addresses transparency. Access to the technology is limited and requires disclosure of how it will be used. Microsoft explains:

A conversation with Bugs Bunny might feel real, but everyone knows that it isn't – because Bugs is a fictional character. That's an important distinction, and one that Microsoft is careful to protect in every application of the technology. That's a key reason Custom Neural Voice is limited access, meaning interested customers must apply and be approved by Microsoft to use the technology. In this case, general availability means it is ready for production and available in more Azure cloud regions, not that it is available to the general public.

While many uses for Custom Neural Voice involve a fictional character, sometimes a customer wants the voice to be a real person, such as an author reading their own book. Even in those cases, it is important that people know the voice is synthetic, which is why Microsoft includes a disclosure requirement in its contract.

Another section of the blog post covers Microsoft's "commitment to responsibility" in regard to the technology:

As creators of this technology, we have an obligation to make sure it's used responsibly. We take responsible AI very seriously; it's one of our core tenets. And we're careful with the partners we work with in making sure they follow the guidelines.

Sean Endicott
News Writer and apps editor
