China's first visual language large model, VisualGPT, was officially launched at the Qingdao Virtual Intelligent Agent Industry Conference. The model supports full-modal real-time interaction. An intelligent agent training platform was launched at the same time, offering multi-modal data and computing resources to developers nationwide. Together, the two releases mark a new stage in AI interaction, moving from "text-based conversation" to "instant interaction on a visual interface."

Model Highlights  

- Full-modal Real-time Interaction: After uploading images or videos, users can select or annotate regions directly on the screen, or ask questions by voice. The model returns structured answers, executable code, or 3D scenes within seconds, with no need to switch to text input.  

- What You See Is What You Get: VisualGPT couples a visual encoder with a streaming decoder and achieves an end-to-end latency of less than 300 ms, supporting real-time analysis of 1080p video at 60 fps as well as multi-turn dialogue.  

- Multi-domain Applications: Official SDKs have already been released for three scenarios: education, healthcare, and finance. Teachers can circle formulas in a presentation to instantly generate animated explanations, doctors can ask about lesion indicators while reviewing medical images, and financial analysts can ask questions directly on stock charts to obtain strategy backtesting results.
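
To put the quoted latency and video figures in perspective, the short Python sketch below works out what a sub-300 ms response means for a 60 fps stream. It is only back-of-the-envelope arithmetic, not part of any VisualGPT SDK: the `StreamBudget` class and its field names are hypothetical, 1080p is assumed to mean 1920x1080 pixels, and the 300 ms figure is treated as an upper bound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamBudget:
    """Hypothetical helper: timing arithmetic for a real-time video stream."""
    width: int        # frame width in pixels (assumed 1920 for 1080p)
    height: int       # frame height in pixels (assumed 1080 for 1080p)
    fps: int          # frames per second
    latency_ms: int   # end-to-end latency bound in milliseconds

    @property
    def frame_interval_ms(self) -> float:
        # Time between two consecutive frames.
        return 1000 / self.fps

    @property
    def frames_within_latency(self) -> int:
        # Number of new frames that arrive during one latency window
        # (integer arithmetic avoids floating-point rounding).
        return self.latency_ms * self.fps // 1000

    @property
    def pixels_per_second(self) -> int:
        # Raw pixel rate the visual encoder would need to keep up with.
        return self.width * self.height * self.fps


budget = StreamBudget(width=1920, height=1080, fps=60, latency_ms=300)
print(f"frame interval: {budget.frame_interval_ms:.2f} ms")
print(f"frames arriving within the latency bound: {budget.frames_within_latency}")
print(f"raw pixel rate: {budget.pixels_per_second:,} pixels/s")
```

Under these assumptions a new frame arrives roughly every 16.67 ms, so up to 18 frames can arrive while one response is being produced. This suggests that a streaming design has to process incoming frames continuously rather than one request at a time.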

Platform Support  

The Qingdao Intelligent Agent Training Platform provides a mixed pool of 1,000 A100 and H100 GPUs and 10 PB of multi-modal data, with free access for enterprises, universities, and individual developers. The platform is slated to expand to 5,000 H100 GPUs by 2026, which would make it the largest AI training cluster in northern China. The conference also released a "100 Enterprises and 100 Scenarios" connection list, which collected more than 200 visual interaction demands in its first round; 100 benchmark cases are expected to be completed by the end of 2025.

Industrial Significance  

The launch of VisualGPT gives Qingdao an early advantage in the virtual intelligent agent field. According to the city's Industrial and Information Technology Bureau, over the next three years Qingdao will build the "Qingdao AI Innovation Valley" around this model, aiming to attract more than 300 upstream and downstream enterprises and form a trillion-yuan full-modal interaction industry chain.