Introduction

OpenAI’s latest addition to ChatGPT, Advanced Voice Mode, aims to set a new standard for human-computer interaction at a time when users expect convenience and speed.

Put simply, ChatGPT’s Advanced Voice Mode is a feature that enables real-time, natural voice interactions. Unlike the previous voice mode, it understands speech natively, without first transcribing it into text. This reduces delays and allows for a more fluid and authentic conversation. Users no longer need to speak loudly or with exaggerated clarity, because the model picks up nuances of speech and responds more naturally. The experience is faster, more accurate, and feels more like talking to a real person.

The Technology Behind It

Two major technologies make Advanced Voice Mode possible:

1. Automatic Speech Recognition (ASR)

At the core of Advanced Voice Mode is speech recognition that processes your voice directly, rather than working from an intermediate transcript as the old voice mode did. It is fast and accurate, and it adapts to various accents and speaking speeds, which keeps the interaction smooth.

2. Text-to-Speech (TTS)

Once your query is processed, Text-to-Speech (TTS) generates a natural-sounding response. Several voice options are available, and all of them are expressive and clear.

Together, these technologies enable fluid, lifelike conversations, almost as if you’re interacting with a real assistant.

Why OpenAI Improved Voice Mode

The old ChatGPT Voice Mode had several drawbacks that made interactions less smooth and enjoyable. First, it involved multiple steps: your speech was transcribed into text, processed by the language model, and then converted back into speech. Each step added delay, and the conversion to text increased the chance of misunderstandings, such as missed nuances of tone or unclear speech. Users also had to speak loudly and avoid pauses to prevent interruptions. The result was a somewhat rigid and unnatural experience in which users adapted to the limitations of the model instead of having a relaxed, flowing conversation.

In contrast, the new Advanced Voice Mode natively understands speech, removing these extra steps. This creates a smoother, more fluid interaction, making conversations feel natural and responsive. The improvement significantly reduces latency and enhances the overall conversational flow, making the experience much more relaxed and intuitive for users.
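
To make the latency argument concrete, here is a minimal Python sketch that compares the two designs. It is only an illustration: the stage names and the delays in seconds are hypothetical values chosen for the example, not measurements published by OpenAI, and the real systems are far more involved.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    latency_s: float  # hypothetical delay, not a measured value


# Old voice mode as described above: three steps run one after another.
CASCADED = [
    Stage("speech-to-text", 0.8),
    Stage("language model", 1.5),
    Stage("text-to-speech", 0.7),
]

# Advanced Voice Mode as described above: speech is understood natively,
# so there is no separate transcription step. Modelled here as one stage.
NATIVE = [Stage("speech-to-speech model", 0.4)]


def total_latency(stages):
    """Stages run sequentially, so their delays simply add up."""
    return sum(stage.latency_s for stage in stages)


def describe(label, stages):
    """Return a one-line summary of a pipeline and its total delay."""
    path = " -> ".join(stage.name for stage in stages)
    return f"{label}: {path} ({total_latency(stages):.1f} s)"


if __name__ == "__main__":
    print(describe("Cascaded", CASCADED))
    print(describe("Native", NATIVE))
```

Whatever the real numbers are, the point of the sketch is structural: in a cascaded design every extra stage adds its own delay, and any tone lost when speech becomes text cannot be recovered by the later stages.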

How to Start an Advanced Voice Mode Conversation

Advanced Voice Mode currently works only on mobile devices, as it is not yet available on desktop. It is also exclusive to ChatGPT subscribers.

To begin, select the Voice icon in the bottom-right corner of the screen.

When you begin an advanced voice chat, you’ll see a screen with a blue orb:

User interface of ChatGPT advanced voice mode

Please note that conversations using standard voice have a black circle in the centre.

User interface of old ChatGPT voice chat

Once everything is set up, you’re ready to start using the new voice chat. If it’s your first time using advanced voice, you’ll be prompted to choose a voice. You can change the voice at any time in the settings, or within voice mode using the customisation menu in the top-right corner.

What Sets Advanced Voice Mode Apart from Previous Versions?

Voice Variety and Realism: The new voice mode adds five new voices, all created in collaboration with professional voice actors. These voices offer a more human-like, natural sound than the old version, and they vary in tone and accent.

Together with the existing voices, ChatGPT now has nine lifelike output voices, each with its own distinct tone and character:

Arbor — Easygoing and versatile
Breeze — Animated and earnest
Cove — Composed and direct
Ember — Confident and optimistic
Juniper — Open and upbeat
Maple — Cheerful and candid
Sol — Savvy and relaxed
Spruce — Calm and affirming
Vale — Bright and inquisitive

These voices sound highly realistic and can even mimic the subtle noises or pauses that naturally occur in human speech. This significantly enhances the user experience, making interactions more lifelike and immersive.

Improved Accent Support: The new version includes better support for various accents, allowing for more localised and natural-sounding speech. For instance, it can better handle regional accents compared to the older version, which was less adaptive.

The new voice mode does a better job of mimicking different accents, such as Italian-accented or Russian-accented English. I was pleasantly surprised to discover that ChatGPT can also imitate regional accents in other languages. In Chinese, for example, it can mimic the accents of Beijing, Henan, or Sichuan. This versatility further enhances the realism and personalisation of the user experience.

Multilingual Capability: The new voice mode supports over 50 languages. For example, you can ask it to say “Sorry I’m late” in several different languages. The old version had limited multilingual support, making this a significant improvement.

Smoother Conversations: The new text-to-speech engine offers smoother, more fluid conversations. Compared to the older version, the transitions between sentences and phrases are now more natural and less robotic.

The old voice chat was slow to respond, especially to complex questions, which meant a long wait for answers. In contrast, the new voice mode responds in almost real time and lets you interrupt at any moment to ask a new question, so the interaction feels much smoother. Users also no longer need to speak loudly, as they did with the old version. Even if you speak softly, the new version recognises your voice accurately, making the experience more convenient and natural.

Enhancing User Experience with Advanced Voice Mode

So, how does Advanced Voice Mode improve the user experience? Here are some key enhancements:

1. Increased Productivity

For busy users, this mode allows you to perform tasks hands-free, like requesting updates, dictating emails, or scheduling meetings without pausing to type. It’s perfect for multitasking.

2. Greater Accessibility

The feature opens up new ways for users with disabilities to interact with ChatGPT, making it easier to navigate without relying on screens or keyboards.

3. Engaging Learning and Entertainment

Whether practising languages or asking complex questions, Advanced Voice Mode brings more immersion and ease to learning and entertainment.

Drawbacks of New Voice Chat

Here are some potential drawbacks of the new Advanced Voice Mode:

1. Usage Limits: For Plus and Team users, Advanced Voice Mode has a daily time limit, which may change. Users are notified as they approach the limit, with a final reminder when 15 minutes remain for the day. Free users get a monthly preview to try the feature. It’s a bit disappointing that even paid users get only a limited number of minutes per day.

2. Pronunciation Issues: Although it supports multiple languages, some accents or pronunciations in non-English languages can sound unnatural or inaccurate. The English pronunciation is excellent, but in other languages such as Chinese I noticed that a few words sound somewhat unnatural. In those cases it’s relatively easy to tell that the voice isn’t human.

3. Separate Window for Advanced Mode: You can’t use Advanced Voice Mode directly in the chat window; instead, you need to open a new chat window to start using it.
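
The usage-limit behaviour in the first drawback can be sketched in a few lines of Python. This is a rough illustration rather than how ChatGPT actually tracks usage: the class name, the message wording, and the 60-minute daily limit used in the example are all hypothetical, since the real limit is not stated here and may change. Only the 15-minute reminder comes from the description above.

```python
from dataclasses import dataclass
from typing import Optional

# From the article: a final reminder arrives when 15 minutes are left.
WARNING_THRESHOLD_MIN = 15


@dataclass
class DailyVoiceQuota:
    daily_limit_min: float  # hypothetical value; the real limit may change
    used_min: float = 0.0
    warned: bool = False

    @property
    def remaining_min(self) -> float:
        return max(self.daily_limit_min - self.used_min, 0.0)

    def record(self, minutes: float) -> Optional[str]:
        """Add usage and return a notification message if one is due."""
        self.used_min = min(self.used_min + minutes, self.daily_limit_min)
        if self.remaining_min == 0:
            return "Daily limit reached"
        if self.remaining_min <= WARNING_THRESHOLD_MIN and not self.warned:
            self.warned = True  # send the reminder only once per day
            return f"{self.remaining_min:.0f} minutes left today"
        return None


if __name__ == "__main__":
    quota = DailyVoiceQuota(daily_limit_min=60)  # assumed limit for the demo
    for session in (30, 20, 5, 10):
        message = quota.record(session)
        if message:
            print(message)
```

In this sketch the reminder fires once, the first time the remaining time drops to 15 minutes or less, and a separate message appears when the allowance is used up.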

Conclusion

In summary, ChatGPT’s Advanced Voice Mode offers a significant leap forward in voice interaction, providing smoother, more realistic conversations with enhanced accent and multilingual support. It greatly improves productivity, accessibility, and overall user engagement. However, the feature does come with limitations such as daily usage caps for subscribers, some pronunciation issues in non-English languages, and the need to use a separate window for voice interactions. Despite these drawbacks, the improvements in user experience make it a valuable tool for efficient and immersive communication.