Vary-toy
A miniature language model combined with an enhanced visual vocabulary
Common Product, Image, Miniature Model, Visual Vocabulary
Vary-toy is a miniature Vary model that uses Qwen-1.8B as its base 'large' language model. It incorporates an improved visual vocabulary, which lets the model retain all the features of Vary while generalizing more broadly. Specifically, when generating the visual vocabulary, the negative samples drawn from natural images are replaced with positive samples driven by object detection, so that the capacity of the vocabulary network is fully used to encode visual information about natural objects efficiently. In experiments, Vary-toy achieved 65.6% ANLS on DocVQA, 59.1% accuracy on ChartQA, 88.1% accuracy on RefCOCO, and 29% accuracy on MMVet.
Pricing: Free trial available; the price of the paid version is to be determined.
Positioning: Gives researchers a way to train and deploy large vision-language models (LVLMs) on ordinary GPUs with limited resources.
Vary-toy Visits Over Time
Monthly Visits: 25,537,072
Bounce Rate: 44.24%
Pages per Visit: 5.9
Visit Duration: 00:04:47