OpenAI's Multimodal AI System GPT-Vision Set to Launch, Competing with Google Gemini
According to a report by The Information citing insiders, OpenAI plans to launch a multimodal AI system called GPT-Vision to compete with Gemini, the multimodal large model that Google recently made available to businesses for testing. When OpenAI released GPT-4 in March, it previewed the model's multimodal capabilities, but so far it has made them available to only a handful of businesses. Six months later, OpenAI is preparing to roll out GPT-Vision more broadly. The delay was mainly due to OpenAI's concern that the new features could be misused. OpenAI is also developing a more powerful multimodal model, codenamed Gobi. OpenAI's active push to commercialize multimodal AI signals that the technology is entering the stage of practical application. Industry insiders believe that visual capabilities such as image generation will increase the commercial value of AI systems, and that GPT-Vision has the potential to rival Google's offering. Competition between these two AI giants is likely to spur technological progress.

站长之家 (ChinaZ)
This article is from AIbase Daily