OpenAI's GPT-4o model emulates the user’s voice in a noisy background because it gets confused, but the issue has been mitigated at a "system level"

ChatGPT on mobile (Image credit: Shutterstock)

What you need to know

  • OpenAI recently released its Advanced Voice Mode feature to select ChatGPT Plus subscribers to gather feedback and improve the user experience.
  • The ChatGPT maker recently published a blog post highlighting observed risks affecting GPT-4o's performance and mitigation measures it is using to address privacy and security concerns.
  • Amid the mass exodus of top executives from OpenAI's safety and superalignment team, the company has seemingly made safety its priority again while shiny products take a back seat.

The launch of OpenAI's GPT-4o in May contributed to the biggest spike ever in ChatGPT's revenue and downloads on mobile, and the app continues to perform well, with $28 million in revenue in July. These figures might improve further now that the ChatGPT maker has launched the long-awaited Advanced Voice Mode feature.

OpenAI said it delayed the feature's launch by one month to ensure it met the company's threshold and security standards. It's worth noting that access is currently limited to select ChatGPT users and sits behind the $20 Plus subscription. OpenAI says restricting the feature to a small group of users is designed to help the company gather feedback and expand its capabilities.

The ChatGPT maker recently published a new blog post highlighting observed safety challenges facing its Advanced Voice Mode and the elaborate measures it's taking to mitigate the issues. Unauthorized voice generation using Advanced Voice Mode is a major concern for OpenAI. The company says the model is restricted to "pre-selected voices." It also uses an output classifier to detect when the model's output deviates from those voices.

There are issues, but OpenAI is working on them

OpenAI admits GPT-4o may go off the rails and do things it's not supposed to. For instance, the company says the model can emulate a user's voice in a noisy environment. It further indicates that this odd behavior occurs because the model struggles to understand the prompt over the background noise.

It's worth noting that this issue no longer affects the model. Speaking to TechCrunch, an OpenAI spokesman indicated that the company has since added a “system-level mitigation” to GPT-4o to prevent the issue from recurring.

Another prevalent issue is speaker identification, which ties back to the safety and privacy concerns around AI. OpenAI says that the model has been trained to decline requests to identify someone based on a voice in an audio input. However, it can still identify the people associated with famous quotes.

OpenAI and Microsoft have come under fire repeatedly over the past few years over alleged copyright infringement. Microsoft Copilot and ChatGPT have been accused of using content from publications without compensation or attribution.

The same issues were also identified in GPT-4o. OpenAI says the model is now trained to decline requests for copyrighted content, including audio. According to OpenAI:

"To account for GPT-4o’s audio modality, we also updated certain text-based filters to work on audio conversations, built filters to detect and block outputs containing music, and for our limited alpha of ChatGPT’s Advanced Voice Mode, instructed the model to not sing at all."

Safety is seemingly becoming a core priority for companies like OpenAI and Microsoft. It's interesting to see them address critical issues in flagship AI models before shipping them to broad availability, where unresolved problems could lead to major privacy and safety issues.

Kevin Okemwa
Contributor

Kevin Okemwa is a seasoned tech journalist based in Nairobi, Kenya with lots of experience covering the latest trends and developments in the industry at Windows Central. With a passion for innovation and a keen eye for detail, he has written for leading publications such as OnMSFT, MakeUseOf, and Windows Report, providing insightful analysis and breaking news on everything revolving around the Microsoft ecosystem. You'll also catch him occasionally contributing at iMore about Apple and AI. While AFK and not busy following the ever-emerging trends in tech, you can find him exploring the world or listening to music.