Vary-toy is a miniature Vary model that uses Qwen-1.8B as its underlying 'large' language model. Vary-toy incorporates an improved visual vocabulary, allowing it to retain all of Vary's capabilities while generalizing more broadly. Specifically, when generating the visual vocabulary, we replace negative samples drawn from natural images with positive samples driven by object detection, making full use of the vocabulary network's capacity so that it efficiently encodes the visual information of natural objects (a minimal sketch of this data construction follows below). In experiments, Vary-toy achieves 65.6% ANLS on DocVQA, 59.1% accuracy on ChartQA, 88.1% accuracy on RefCOCO, and 29% on MMVet.
Pricing: Free trial available; pricing for a paid version is to be determined.
Positioning: Provides researchers with a solution for training and deploying LVLMs on ordinary GPUs under limited resources.
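The sketch below illustrates, under stated assumptions, how object-detection annotations could be serialized into text targets so that natural images contribute positive samples to the vocabulary-network training corpus rather than negative ones. This is not the authors' code: the dataclass, the `boxes_to_text` helper, and the 0-1000 coordinate grid are illustrative assumptions (normalized-grid box formats are a common convention in LVLM grounding data, but the exact format Vary-toy uses is not specified here).

```python
# A minimal sketch (not the authors' implementation) of turning object
# detection annotations into detection-style positive samples for
# visual-vocabulary training. All names and formats are assumptions.

from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DetectionBox:
    label: str                        # e.g. "dog"
    bbox: Tuple[int, int, int, int]   # (x1, y1, x2, y2) in pixels


def boxes_to_text(boxes: List[DetectionBox], width: int, height: int) -> str:
    """Serialize detection boxes into a text target, so the vocabulary
    network learns to encode natural-object information (a positive
    sample) instead of pairing natural images with negative labels."""
    parts = []
    for b in boxes:
        # Normalize pixel coordinates to a fixed 0-1000 grid
        # (an assumed convention, common in LVLM grounding data).
        x1 = round(b.bbox[0] / width * 1000)
        y1 = round(b.bbox[1] / height * 1000)
        x2 = round(b.bbox[2] / width * 1000)
        y2 = round(b.bbox[3] / height * 1000)
        parts.append(f"{b.label}: [{x1}, {y1}, {x2}, {y2}]")
    return "; ".join(parts)


# Example: one natural image now yields a detection-driven positive
# sample for the vocabulary-generation corpus.
sample = boxes_to_text(
    [DetectionBox("dog", (48, 120, 310, 440)),
     DetectionBox("frisbee", (200, 30, 290, 95))],
    width=640, height=480,
)
print(sample)  # dog: [75, 250, 484, 917]; frisbee: [312, 62, 453, 198]
```

The design intuition is that a text target describing the objects in the image forces the vocabulary network to actually encode natural-image content, whereas a negative label wastes that capacity.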