Celebrating ChatGPT’s First Anniversary
As ChatGPT approaches its one-year milestone, OpenAI continues to enhance its capabilities, offering users more intelligent interactions.
OpenAI’s Latest Enhancements
In the nearly one year since ChatGPT launched, OpenAI has steadily added features to it. Now, the company has unveiled two significant additions: voice conversations and image input.
Voice and Image Capabilities Unveiled
OpenAI’s recent blog post announces the integration of voice and image capabilities into ChatGPT, changing how users interact with the assistant.
A More Intuitive Interface
OpenAI explains that these new capabilities provide a more intuitive way to interact with ChatGPT. Users can now engage in voice conversations and share images to enhance their interactions with the AI.
Image Recognition for Everyday Tasks
ChatGPT users can now snap pictures of their fridge and pantry to determine what ingredients are available, and even ask for step-by-step recipes based on the available items.
Rollout Details
OpenAI plans to roll out these features to Plus and Enterprise users over the next two weeks. Voice capabilities will be available on iOS and Android, with the option to enable them in settings. Image capabilities will be accessible on all platforms.
Engaging with Voice Prompts
Users can activate ChatGPT with simple voice prompts and engage in seamless back-and-forth conversations with the AI assistant. OpenAI has powered this voice capability with a new text-to-speech model that can generate human-like audio from text and a short sample of speech.
Professional Voice Actors
OpenAI collaborated with professional voice actors to create each of the distinct voices for the voice capability. The company also uses Whisper, its open-source speech recognition system, to transcribe spoken words into text.
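The voice flow described above (speech transcribed to text, a text reply generated, and the reply turned back into audio) can be illustrated with a minimal Python sketch. This is not OpenAI's implementation: the function and class names here (`run_voice_turn`, `VoiceTurn`, and the three stand-in functions) are hypothetical, and the stand-ins only mimic the shape of each stage so the example runs without any model or network access.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceTurn:
    """One round of a voice conversation (hypothetical structure)."""
    transcript: str     # what the speech recognizer heard
    reply_text: str     # the assistant's text reply
    reply_audio: bytes  # the reply rendered as audio


def run_voice_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    respond: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> VoiceTurn:
    """Chain the three stages: speech-to-text, reply, text-to-speech."""
    transcript = transcribe(audio)        # a speech recognizer such as Whisper
    reply_text = respond(transcript)      # the language model's answer
    reply_audio = synthesize(reply_text)  # a text-to-speech model
    return VoiceTurn(transcript, reply_text, reply_audio)


# Stand-in stages (assumptions for illustration only): they treat the
# "audio" as UTF-8 text so the pipeline can run end to end.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_respond(text: str) -> str:
    return f"You said: {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


turn = run_voice_turn(
    b"what is for dinner", fake_transcribe, fake_respond, fake_synthesize
)
print(turn.reply_text)
```

Passing the stages in as functions keeps the sketch independent of any particular speech or language model; a real recognizer or synthesizer could be swapped in for a stand-in without changing `run_voice_turn`.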
Sharing Images for Richer Responses
Users can now share images with ChatGPT for more context-rich responses. To focus on specific parts of an image, the mobile app includes a drawing tool. Image understanding is made possible through multimodal models like GPT-3.5 and GPT-4, which apply their language reasoning skills to a wide array of visual content, including photographs, screenshots, and documents containing text and images.
These new voice and image capabilities mark another step in the evolution of ChatGPT, giving users more ways to interact with it beyond typed text.