On Tuesday, OpenAI began rolling out its much-anticipated Advanced Voice Mode for ChatGPT, powered by the new GPT-4o model. The feature is initially available to a small group of ChatGPT Plus users, with a broader rollout to all Plus users planned for fall 2024.
When OpenAI first showcased GPT-4o’s voice capabilities in May 2024, audiences were struck by how lifelike they sounded. One voice, dubbed “Sky,” bore a striking resemblance to Scarlett Johansson, a resemblance that sparked controversy and led Johansson to seek legal counsel. OpenAI denied using her voice but ultimately removed Sky from its lineup, reiterating that its voices were developed with professional voice actors through a rigorous selection process.
What Sets GPT-4o Apart?
The new Advanced Voice Mode represents a significant leap from the previous Voice Mode, which chained separate models together: one to transcribe speech to text, one to generate a text response, and one to synthesize that response back into speech. GPT-4o is a single multimodal model that handles audio natively. This cuts latency and lets the model pick up on and reproduce emotional intonation, making conversations feel far more natural.
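To make the contrast concrete, here is a rough sketch of the earlier, pipelined approach using OpenAI’s public API: transcribe the audio, send the text to a chat model, then synthesize the reply. The file names are made up, and the specific models shown (“whisper-1,” “gpt-4,” “tts-1”) are public stand-ins; OpenAI has not disclosed exactly which internal models the old Voice Mode chained together.

```python
# Sketch of the earlier three-stage voice pipeline (illustrative only;
# the models OpenAI chained internally for the old Voice Mode are not public).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Stage 1: speech -> text. Tone, pauses, and background sound are
# discarded here; only the transcript moves to the next stage.
with open("question.wav", "rb") as audio_file:  # hypothetical input file
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

# Stage 2: text -> text. The language model reasons over the transcript alone.
reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)

# Stage 3: text -> speech. A separate TTS model voices the written reply.
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=reply.choices[0].message.content,
)
speech.stream_to_file("answer.mp3")
```

Because each stage sees only the previous stage’s output, latencies add up and paralinguistic cues never reach the language model. A natively multimodal model like GPT-4o collapses all three stages into one.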
Safety and Ethical Considerations
Despite the technological advances, OpenAI has taken a cautious approach to the rollout. The release was delayed to ensure robust safety measures were in place, particularly to prevent misuse and handle content responsibly. These measures include blocking outputs that mimic real people’s voices without authorization and filtering requests to generate copyrighted or inappropriate content.
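OpenAI has not published the internals of these guardrails, but the general pattern of classifier-gated requests can be sketched with its public Moderation endpoint. The gating function and refusal message below are assumptions for illustration, not OpenAI’s actual safety stack for Advanced Voice Mode.

```python
# Illustrative classifier-gated request flow using OpenAI's public
# Moderation API; a sketch of the general filtering pattern only.
from openai import OpenAI

client = OpenAI()

def gated_request(user_text: str) -> str:
    # Run the input through the moderation classifier first.
    verdict = client.moderations.create(input=user_text).results[0]
    if verdict.flagged:
        # Refuse before the request ever reaches the main model.
        return "Sorry, I can't help with that request."
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": user_text}],
    )
    return reply.choices[0].message.content
```

In production systems the same kind of check is typically applied to model outputs as well, which is where safeguards like blocking unauthorized voice imitations would sit.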
Voices and User Experience
The initial release of Advanced Voice Mode includes four preset voices (Juniper, Breeze, Cove, and Ember), each created in collaboration with professional voice actors and chosen to convey qualities such as approachability, warmth, and confidence. OpenAI’s work with award-winning casting directors and producers underscores its commitment to high-quality, ethically sourced voice generation.
Future Developments
While the current alpha release focuses on voice interactions, OpenAI plans to add video and screen-sharing capabilities later, with the goal of a more comprehensive AI assistant that can handle complex tasks and respond in real time with minimal latency.