Singapore's national AI programme, AI Singapore (AISG), has released its next-generation large language model, Qwen-SEA-LION-v4. The model's foundation has been switched from Meta's Llama to Alibaba's Qwen3-32B, and it now ranks first among open-source models with fewer than 200 billion parameters on SEA-HELM, a comprehensive evaluation benchmark for Southeast Asian languages.

Reasons for the Switch 

- Language Adaptation: Llama performs poorly on low-resource languages such as Indonesian, Thai, and Malay. Qwen3 was pre-trained on 36 trillion tokens spanning 119 languages and dialects, and its natively multilingual architecture lowers the barrier for further regional training

- Tokenization Optimization: The new model replaces the word-boundary tokenization common for Western languages with byte pair encoding (BPE), which can segment Thai and Burmese text written without spaces between words, noticeably improving translation accuracy and inference speed

- Compute-Friendly: After quantization, the model can run on consumer-grade laptops with 32 GB of memory, a good fit for the limited compute budgets of many small and medium-sized enterprises in Southeast Asia
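The advantage of byte-level segmentation for unspaced scripts can be illustrated in a few lines of Python. This is a generic sketch of the idea behind BPE, not the actual Qwen3 tokenizer; the Thai string is an arbitrary example word.

```python
# Contrast whitespace tokenization with byte-level segmentation,
# the starting point of byte pair encoding (BPE).
# Thai is written without spaces between words.
thai = "สวัสดี"  # "hello" in Thai, six characters, no spaces

# A whitespace tokenizer cannot split the phrase at all:
print(thai.split())  # ['สวัสดี'] - one opaque token

# BPE operates on subword units that ultimately fall back to bytes,
# so every character is reachable even without word boundaries;
# frequent merges are then learned from data.
byte_units = list(thai.encode("utf-8"))
print(len(byte_units))  # 18 (six Thai characters, three UTF-8 bytes each)
```

Because the byte fallback covers any script, a BPE vocabulary never hits an out-of-vocabulary token, which is what makes it robust for Thai and Burmese.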

Training Data  

AISG contributed 100 billion tokens of Southeast Asian language data, bringing regional content to 13% of the training mix, 26 times the proportion in Llama 2. Alibaba applied advanced post-training to inject regional knowledge, enabling the model to better understand code-mixed varieties such as Singaporean English (Singlish) and Malaysian English (Manglish).

Performance Results  

The SEA-HELM leaderboard shows that Qwen-SEA-LION-v4 leads the original Llama baseline by an average of 8.4% across Indonesian, Vietnamese, Thai, and Malay tasks. It also ranks first on document-level reasoning and cross-lingual summarization metrics.

Open Source and Implementation  

The model is now freely downloadable from Hugging Face and the AISG official website in 4-bit and 8-bit quantized versions. The Singapore government has folded it into the S$70 million national multimodal programme launched in 2023, and wide deployment across education, healthcare, and finance is expected by 2026.
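A back-of-envelope calculation shows why the 4-bit quantized version fits in 32 GB of laptop memory. The figures below are illustrative weight-storage estimates only, not official Qwen-SEA-LION-v4 numbers, and they ignore activation and KV-cache overhead.

```python
# Rough memory needed just to hold the weights of a 32B-parameter
# model at different precisions (weights only; illustrative).
params = 32e9  # 32 billion parameters

for name, bytes_per_param in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    gib = params * bytes_per_param / 2**30
    print(f"{name}: {gib:.1f} GiB")
# fp16: 59.6 GiB  -> needs a server GPU or heavy offloading
# int8: 29.8 GiB  -> borderline on a 32 GB machine
# int4: 14.9 GiB  -> leaves headroom on a 32 GB laptop
```

At 4-bit precision the weights occupy roughly 15 GiB, leaving room for activations and the operating system on a 32 GB machine, which matches the article's consumer-laptop claim.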