OpenAI’s ChatGPT Can Now Speak, Listen, and See

BY Keisha Oleaga

September 25, 2023

ChatGPT can now speak, listen, and, to some extent, see.

OpenAI's latest announcement changes how people can interact with its chatbot. Today, the artificial intelligence start-up unveiled a major upgrade to the ChatGPT mobile app on iOS and Android. With this update, users can hold voice conversations with the chatbot, speaking their queries aloud and hearing ChatGPT respond in a synthesized voice.

The upgraded ChatGPT also gains visual capabilities: users can upload or capture photos within the app, and the AI will not only describe an image but also discuss it and provide further context.

Let’s break down the new features step-by-step:

Voice Conversations with ChatGPT

Imagine having a conversation with your AI assistant as you would with a friend. With ChatGPT’s new voice capability, this becomes a reality. You can engage in fluid, back-and-forth conversations with ChatGPT using your voice. Whether you’re on the go, looking for a bedtime story for your family, or trying to settle a dinner table debate, ChatGPT is ready to chat.

How to Get Started with Voice Conversations:

  1. Update Your App: Make sure you have the latest version of the ChatGPT mobile app.
  2. Opt In to Voice: Open the ‘Settings’ menu in the app, select ‘New Features,’ and opt in to voice conversations.
  3. Choose Your Voice: Once you’ve enabled voice, tap the headphone button in the top-right corner of the home screen. You can select your preferred voice from five different options.

The realism of ChatGPT's voices comes from a new text-to-speech model that can generate human-like audio from text and a few seconds of sample speech; OpenAI worked with professional voice actors to create each voice. In the other direction, Whisper, OpenAI's open-source speech recognition system, transcribes your spoken words into text.

Image Recognition

ChatGPT's new image recognition feature extends conversations beyond text. You can now show ChatGPT one or more images, and it will help you analyze, discuss, and gather insights from them. The feature has many practical applications, from troubleshooting technical issues to planning your next meal.

How to Start Chatting About Images:

  1. Capture or Choose an Image: Tap the photo button to either capture a new image or select an existing one. On iOS and Android, tap the plus button first to add images.
  2. Discuss and Annotate: You can discuss multiple images or even use the drawing tool within the mobile app to guide ChatGPT’s understanding of the image.

This image understanding feature is powered by the multimodal capabilities of GPT-3.5 and GPT-4. These models apply their language reasoning skills to a wide range of images, including photographs, screenshots, and documents containing both text and visuals.

Safety and Privacy Concerns

In a statement, OpenAI acknowledged the safety and privacy concerns raised by the update and emphasized its “dedication to ensuring the responsible use of these advanced AI capabilities.” Voice technology and image recognition offer great potential but also pose new challenges. OpenAI says it is working on risk mitigation strategies and collaborating with partners to promote responsible AI use.

For voice, OpenAI is limiting the technology to a specific use case, voice chat, using voices created with actors it worked with directly, to reduce risks such as impersonation and fraud. Separately, Spotify is piloting the technology for voice translation, rendering podcasts in other languages in the podcasters' own voices.

For image input, OpenAI has taken steps to limit ChatGPT's ability to analyze and make statements about individuals, prioritizing user privacy and safety.

This update follows OpenAI's announcement last week that DALL-E 3, its latest image-generation model, will be integrated with ChatGPT, enabling users to generate images through the chatbot.

