Qwen-SEA-LION-v4-4B-VL is a vision-language model with 4 billion parameters built on the Qwen3-VL-4B-Instruct architecture. It has been specifically instruction fine-tuned for the Southeast Asian region, possessing multilingual and multicultural capabilities, supporting English and seven Southeast Asian languages, and retaining strong vision-language understanding capabilities.
Multimodal
TransformersMultiple Languages