Baidu has taken another important step in artificial intelligence with the official release of its latest multimodal reasoning model, ERNIE-4.5-VL-28B-A3B-Thinking. The new model combines strong language processing with an innovative "image thinking" capability, marking a significant improvement in how it understands and reasons over images.

According to Baidu's introduction, ERNIE-4.5-VL-28B-A3B-Thinking activates only about 3B of its 28B total parameters per token, giving it strong computing efficiency and flexibility. This design allows the model to respond quickly and remain efficient across a wide range of tasks.

More notably, Baidu has equipped this model with an "image thinking" feature: during its reasoning, ERNIE-4.5-VL can zoom in on regions of an image and invoke tools such as image search. This capability enriches how users can interact across images and text, opening new possibilities for applications such as intelligent search, online education, and e-commerce.
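To make the idea concrete, the snippet below is a minimal sketch of a client-side loop that services an image-zoom request during generation. The `<tool_call>` marker, the JSON schema, and the `model_generate` callable are all hypothetical stand-ins for illustration only; they are not Baidu's actual tool-call protocol.

```python
# Minimal sketch of a client-side loop that services an "image zoom" tool call.
# The <tool_call> marker, JSON schema, and `model_generate` callable are all
# hypothetical stand-ins, not Baidu's actual tool-call protocol.
import json
from PIL import Image


def run_with_zoom_tool(prompt: str, image: Image.Image, model_generate) -> str:
    """Let the model request enlarged crops of the image while it reasons."""
    images = [image]
    response = model_generate(prompt, images)
    while response.strip().startswith("<tool_call>"):
        payload = response.strip().removeprefix("<tool_call>").removesuffix("</tool_call>")
        call = json.loads(payload)
        if call.get("name") == "zoom_in":
            left, top, right, bottom = call["arguments"]["bbox"]   # pixel box (assumed format)
            images.append(image.crop((left, top, right, bottom)))  # hand the enlarged crop back
        response = model_generate(prompt, images)  # model continues with the extra view
    return response
```

The point of the sketch is simply that "image thinking" turns the image into something the model can actively inspect step by step, rather than a single static input.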

Against the backdrop of rapid progress in AI, Baidu continues to demonstrate its position in multimodal AI with ERNIE-4.5-VL. By releasing the model as open source, Baidu makes it easier for developers and researchers to explore the potential of multimodal AI and to advance related technologies and applications.
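For developers who want to try the open-source weights, the following sketch shows how a multimodal checkpoint like this is typically loaded with the Hugging Face transformers library. The repository id `baidu/ERNIE-4.5-VL-28B-A3B-Thinking`, the auto classes, and the chat-template call are assumptions based on common vision-language-model conventions, not confirmed loading instructions; consult the official model card for the exact usage.

```python
# Minimal sketch of loading the open-source checkpoint with Hugging Face
# transformers. The repository id, auto classes, and chat-template call are
# assumptions based on common VLM conventions; check the official model card.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL_ID = "baidu/ERNIE-4.5-VL-28B-A3B-Thinking"  # assumed repository name

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # bf16 keeps the 28B MoE weights manageable
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": [
    {"type": "image", "image": Image.open("chart.png")},
    {"type": "text", "text": "What trend does this chart show?"},
]}]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=512)

print(processor.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```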

The release of ERNIE-4.5-VL-28B-A3B-Thinking is not only an important technological step for Baidu; it also marks a new chapter in multimodal artificial intelligence. We look forward to seeing this technology play a greater role across industries, helping people process information and solve problems more intelligently.