UC Santa Cruz recently announced the release of OpenVision, a new family of visual encoders positioned as alternatives to models such as OpenAI's CLIP and Google's SigLIP. OpenVision gives developers and enterprises more flexibility and choice, making image processing and understanding more efficient.

What are Visual Encoders?

Visual encoders are AI models that convert visual content (typically static images) into numerical representations that non-visual models, such as large language models, can consume. They serve as a critical bridge between image and text understanding, enabling large language models to identify subjects, colors, positions, and other features in an image and to carry out more complex reasoning and interaction on top of them.
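
To make this concrete, here is a minimal sketch of the role a visual encoder plays: it turns an image into a sequence of embedding vectors that a language model can attend over. The encoder below is a toy stand-in written in PyTorch, not OpenVision's actual architecture.

```python
import torch
import torch.nn as nn

class ToyVisualEncoder(nn.Module):
    """Illustrative stand-in for a visual encoder (not OpenVision's actual architecture)."""
    def __init__(self, patch_size=16, dim=512):
        super().__init__()
        # Cut the image into patches and project each patch to an embedding vector.
        self.to_patches = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, image):                       # image: (batch, 3, H, W)
        tokens = self.to_patches(image)             # (batch, dim, H/patch, W/patch)
        return tokens.flatten(2).transpose(1, 2)    # (batch, num_patches, dim)

encoder = ToyVisualEncoder()
image = torch.randn(1, 3, 224, 224)                 # a dummy 224x224 RGB image
visual_tokens = encoder(image)
print(visual_tokens.shape)                          # torch.Size([1, 196, 512]); these embeddings feed an LLM
```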

Key Features of OpenVision

1. **Diverse Model Options**

    OpenVision provides 26 different models with parameter counts ranging from 5.9 million to 632 million. This range lets developers choose a model suited to the specific application scenario, whether that is identifying images on a construction site or providing troubleshooting guidance for household appliances.

2. **Flexible Deployment Architecture**

    OpenVision is designed to adapt to a range of deployment scenarios. Larger models suit server-grade workloads that demand high accuracy and detailed visual understanding, while smaller variants are optimized for edge computing, where compute and memory are limited. The models also support adaptive patch sizes (8×8 and 16×16), allowing a flexible trade-off between detail resolution and computational load; for example, a 224×224 image yields 784 visual tokens with 8×8 patches but only 196 with 16×16.

3. **Outstanding Multimodal Benchmark Performance**

    In a series of benchmark tests, OpenVision performed strongly across a range of vision-language tasks. Although OpenVision's evaluation still includes traditional CLIP benchmarks (such as ImageNet and MSCOCO), the research team cautions against relying on these metrics alone to assess model performance. They recommend broader benchmark coverage and open evaluation protocols that better reflect real-world multimodal applications.

4. **Efficient Progressive Training Strategy**

    OpenVision employs a progressive resolution training strategy: the model starts training on low-resolution images and is gradually fine-tuned at higher resolutions. This approach makes training roughly 2 to 3 times faster than CLIP and SigLIP without sacrificing downstream performance (a minimal sketch of such a schedule appears after this list).

5. **Optimized Lightweight Systems and Edge Computing Applications**

    OpenVision is also designed to pair effectively with small language models. In one experiment, the visual encoder was combined with a Smol-LM model of roughly 150 million parameters, producing a multimodal model with fewer than 250 million parameters overall. Despite its small size, the model maintained good accuracy on tasks such as visual question answering and document understanding (a toy composition of this kind is sketched after this list).
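
As a rough illustration of the progressive strategy described in point 4, the sketch below trains at a low resolution first and then fine-tunes at higher resolutions. The resolution schedule, step counts, and loss interface are illustrative assumptions rather than OpenVision's published recipe.

```python
import torch.nn.functional as F

def progressive_resolution_training(model, optimizer, dataloader,
                                    stages=((84, 2000), (224, 1000), (336, 500))):
    """Train at low resolution first, then fine-tune at higher resolutions.
    The (resolution, steps) schedule is an illustrative assumption."""
    for resolution, steps in stages:
        for _, (images, texts) in zip(range(steps), dataloader):
            # Resize the batch to the current stage's resolution.
            images = F.interpolate(images, size=(resolution, resolution), mode="bilinear")
            loss = model(images, texts)   # assumes the model returns a contrastive loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        print(f"finished stage at {resolution}x{resolution}")
```

The efficiency gain comes from spending most optimization steps at low resolution, where each image produces far fewer patch tokens; that is the intuition behind the reported 2 to 3 times speedup.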

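As a rough picture of the small-model pairing from point 5, this sketch wires a compact visual encoder into a small language model through a linear projection, a common pattern for this kind of assembly. The components and dimensions are placeholders, not the actual OpenVision plus Smol-LM setup.

```python
import torch
import torch.nn as nn

class TinyMultimodalModel(nn.Module):
    """Toy composition: visual encoder -> linear projection -> small language model.
    Modules and dimensions are placeholders, not the actual OpenVision + Smol-LM experiment."""
    def __init__(self, visual_encoder, language_model, vision_dim=512, text_dim=576):
        super().__init__()
        self.visual_encoder = visual_encoder
        # Map visual tokens into the language model's embedding space.
        self.project = nn.Linear(vision_dim, text_dim)
        self.language_model = language_model

    def forward(self, image, text_embeddings):
        visual_tokens = self.project(self.visual_encoder(image))   # (batch, num_patches, text_dim)
        # Prepend projected visual tokens to the text embeddings so the LM attends over both.
        fused = torch.cat([visual_tokens, text_embeddings], dim=1)
        return self.language_model(fused)
```
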
Significance for Enterprise Applications

OpenVision's fully open-source, modular development approach is strategically significant for enterprise technology decision-makers. It not only gives large language models plug-and-play, high-performance visual capabilities, but also lets sensitive imagery be processed on infrastructure the enterprise controls, protecting proprietary data. Furthermore, OpenVision's transparent architecture allows security teams to monitor the models and evaluate potential vulnerabilities.

OpenVision's model library is available in both PyTorch and JAX implementations and can be downloaded from Hugging Face; the training recipes have also been made public. By offering transparent, efficient, and scalable alternatives, OpenVision gives researchers and developers a flexible foundation for building vision-language applications.
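
For reference, weights hosted on Hugging Face can be fetched with the standard `huggingface_hub` client; the repository ID below is a placeholder, so check the project page for the actual model names.

```python
from huggingface_hub import snapshot_download

# Download a checkpoint to the local cache; the repo_id here is a placeholder, not a verified model name.
local_dir = snapshot_download(repo_id="UCSC-VLAA/openvision-vit-base-patch16-224")
print(f"Weights downloaded to {local_dir}")
```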

Project: https://ucsc-vlaa.github.io/OpenVision/