Vary-toy is a miniature Vary model that uses Qwen-1.8B as its underlying 'large' language model. Vary-toy incorporates an improved visual vocabulary, allowing it to retain all of Vary's capabilities while generalizing more broadly. Specifically, when generating the visual vocabulary, we replace negative samples drawn from natural images with positive samples driven by object detection, making full use of the vocabulary network's capacity so that it efficiently encodes the visual information of natural objects (a minimal sketch of this data construction follows below). In experiments, Vary-toy achieves 65.6% ANLS on DocVQA, 59.1% accuracy on ChartQA, 88.1% accuracy on RefCOCO, and 29% on MMVet.
Pricing: Free trial available; pricing for a paid version is to be determined.
Positioning: Provides researchers with a solution for training and deploying LVLMs on ordinary GPUs under limited resources.
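The sketch below illustrates, under stated assumptions, how object-detection annotations could be serialized into text targets so that natural images contribute positive samples to the vocabulary-network training corpus rather than negative ones. This is not the authors' code: the dataclass, the `boxes_to_text` helper, and the 0-1000 coordinate grid are illustrative assumptions (normalized-grid box formats are a common convention in LVLM grounding data, but the exact format Vary-toy uses is not specified here).

```python
# A minimal sketch (not the authors' implementation) of turning object
# detection annotations into detection-style positive samples for
# visual-vocabulary training. All names and formats are assumptions.

from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DetectionBox:
    label: str                        # e.g. "dog"
    bbox: Tuple[int, int, int, int]   # (x1, y1, x2, y2) in pixels


def boxes_to_text(boxes: List[DetectionBox], width: int, height: int) -> str:
    """Serialize detection boxes into a text target, so the vocabulary
    network learns to encode natural-object information (a positive
    sample) instead of pairing natural images with negative labels."""
    parts = []
    for b in boxes:
        # Normalize pixel coordinates to a fixed 0-1000 grid
        # (an assumed convention, common in LVLM grounding data).
        x1 = round(b.bbox[0] / width * 1000)
        y1 = round(b.bbox[1] / height * 1000)
        x2 = round(b.bbox[2] / width * 1000)
        y2 = round(b.bbox[3] / height * 1000)
        parts.append(f"{b.label}: [{x1}, {y1}, {x2}, {y2}]")
    return "; ".join(parts)


# Example: one natural image now yields a detection-driven positive
# sample for the vocabulary-generation corpus.
sample = boxes_to_text(
    [DetectionBox("dog", (48, 120, 310, 440)),
     DetectionBox("frisbee", (200, 30, 290, 95))],
    width=640, height=480,
)
print(sample)  # dog: [75, 250, 484, 917]; frisbee: [312, 62, 453, 198]
```

The design intuition is that a text target describing the objects in the image forces the vocabulary network to actually encode natural-image content, whereas a negative label wastes that capacity.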