Recently, a team of researchers from Johns Hopkins University released mmBERT, a new multilingual encoder designed to fill a long-standing gap in multilingual natural language processing. The model outperforms XLM-R on multiple tasks and runs 2 to 4 times faster than previous multilingual encoders, giving researchers and developers a stronger foundation for multilingual applications.

mmBERT comes in two configurations: a base model and a small model. The base model has 22 transformer layers, a hidden dimension of 1152, and roughly 307 million total parameters, while the small model has about 140 million parameters. mmBERT adopts the Gemma 2 tokenizer with a 256k vocabulary and combines rotary position embeddings (RoPE) with FlashAttention2, which significantly improves processing efficiency. In addition, the maximum sequence length is extended from 1024 to 8192 tokens, allowing the model to process much longer contexts.
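
As a concrete illustration, the snippet below loads the encoder with Hugging Face Transformers and runs it on a long input. The model id `jhu-clsp/mmBERT-base` is an assumption based on the project's GitHub organization; check the repository linked below for the official checkpoints.

```python
from transformers import AutoTokenizer, AutoModel

model_id = "jhu-clsp/mmBERT-base"  # assumed checkpoint name; see the GitHub repo
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# With rotary position embeddings, a single forward pass can cover
# sequences of up to 8192 tokens.
long_text = "A long multilingual document ... " * 400
inputs = tokenizer(long_text, return_tensors="pt", truncation=True, max_length=8192)
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```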

For training data, mmBERT draws on 3 trillion tokens from multiple sources covering 1,833 languages, with English making up only 10% to 34% of the corpus. Training is divided into three stages: pre-training, mid-training, and a decay phase. At each stage the model is exposed to more languages and higher-quality data, which helps improve performance on low-resource languages.
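
The staged schedule works by gradually flattening how languages are sampled, so low-resource languages receive a larger share of the data later in training. The sketch below illustrates the general idea of temperature-based language sampling; the corpus sizes and temperature values are placeholders, not the figures used to train mmBERT.

```python
# Toy illustration of temperature-based language sampling.
corpus_sizes = {"en": 1_000_000, "de": 200_000, "fo": 2_000, "ti": 1_000}

def sampling_probs(sizes, tau):
    # p_i is proportional to n_i ** tau; a smaller tau flattens the
    # distribution, giving low-resource languages a larger share.
    weights = {lang: n ** tau for lang, n in sizes.items()}
    total = sum(weights.values())
    return {lang: w / total for lang, w in weights.items()}

# Lowering the temperature in later stages shifts probability mass toward
# low-resource languages (placeholder schedule, not mmBERT's actual values).
for stage, tau in [("pre-training", 0.7), ("mid-training", 0.5), ("decay", 0.3)]:
    probs = sampling_probs(corpus_sizes, tau)
    print(stage, {lang: round(p, 3) for lang, p in probs.items()})
```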

mmBERT delivers strong results across multiple benchmarks. On English natural language understanding (GLUE), the base model scores 86.3, surpassing XLM-R's 83.3. On multilingual natural language understanding (XTREME), mmBERT scores 72.8, again ahead of XLM-R's 70.4. It also performs well on embedding and code retrieval tasks, underscoring its potential across a range of application scenarios.
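
For embedding and retrieval use cases, an encoder like this is typically used by pooling its token representations into a single vector. The sketch below shows one common recipe (masked mean pooling plus cosine similarity); it again assumes the hypothetical model id `jhu-clsp/mmBERT-base` and is not necessarily the exact setup behind the reported scores.

```python
import torch
from transformers import AutoTokenizer, AutoModel

model_id = "jhu-clsp/mmBERT-base"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

def embed(sentences):
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state            # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()     # (B, T, 1)
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)    # masked mean pooling
    return torch.nn.functional.normalize(pooled, dim=-1)     # unit-length vectors

# Example: code retrieval by cosine similarity between query and documents.
query = embed(["def quicksort(arr): ..."])
docs = embed(["Sorts a list with quicksort.", "Reads a CSV file from disk."])
print(query @ docs.T)  # higher score = better match
```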

mmBERT also pays particular attention to low-resource languages, ensuring they are well represented during training. Across several benchmarks, its performance on low-resource languages such as Faroese and Tigrinya surpasses that of other large models, showing that carefully trained encoder models can handle low-resource scenarios effectively.

mmBERT not only improves the speed and efficiency of multilingual processing but also lays a solid foundation for the next generation of multilingual natural language processing systems. It redefines the potential of multilingual encoders in an efficient and open manner, marking the arrival of a new era.

GitHub: https://github.com/JHU-CLSP/mmBERT

Key Points:

🌍 The mmBERT model outperforms XLM-R on multiple tasks, setting a new benchmark for multilingual NLP.

⚡ The model runs 2 to 4 times faster than previous models and supports inputs of up to 8192 tokens.

📊 mmBERT pays special attention to low-resource languages during training, demonstrating strong adaptability and broad application potential.