Microsoft's Bing team recently announced the open-sourcing of its latest embedding model, "Harrier". The model performs well on the multilingual MTEB v2 benchmark and supports more than 100 languages, giving users stronger language-processing capabilities. Harrier was trained on more than 2 billion examples, including synthetic data generated by GPT-5, and uses a 32,000-token context window, making it more accurate and flexible in multilingual tasks.

In terms of model sizes, the full version of Harrier has 2.7 billion parameters, and Microsoft has also released two smaller variants, with 60 million and 270 million parameters respectively, offering practical options for users with less capable hardware. All three models are available on the Hugging Face platform under the MIT license, making them easy for developers to use and integrate.
Embedding models play a crucial role in artificial intelligence systems, especially in tasks such as search, information retrieval, and data organization. As AI technology develops, demand for embedding models keeps growing, since they help AI agents handle more complex multi-step tasks on their own. Microsoft therefore stated that the release of Harrier will promote the application of AI technology across many fields.
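To illustrate how an embedding model supports search and retrieval, the sketch below ranks documents by cosine similarity between embedding vectors. The vectors here are small made-up toy values standing in for real model output; in practice they would come from an embedding model such as Harrier.

```python
import numpy as np

# Toy, hand-made embedding vectors (illustrative only); a real system
# would obtain high-dimensional vectors from an embedding model.
corpus = {
    "doc_weather": np.array([0.9, 0.1, 0.0]),
    "doc_sports":  np.array([0.1, 0.8, 0.3]),
    "doc_finance": np.array([0.0, 0.2, 0.9]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: dot product divided by the vector norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_vec: np.ndarray, corpus: dict) -> list:
    """Rank document IDs by similarity of their embeddings to the query."""
    scores = {doc: cosine_similarity(query_vec, vec) for doc, vec in corpus.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Embedding of a weather-related query (again, made-up values).
query = np.array([0.85, 0.15, 0.05])
print(retrieve(query, corpus))  # "doc_weather" ranks first
```

The same nearest-neighbor ranking, run over millions of documents with an approximate index, is the core of embedding-based search.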
Looking ahead, Microsoft plans to integrate Harrier into the Bing search engine and into the infrastructure of its next generation of AI agents. This strategy should further strengthen Bing's competitiveness in the AI field and meet users' needs for efficient information processing.
Key points:
🌍 The Harrier model supports more than 100 languages, with strong multilingual processing capabilities.
💡 The model was trained on more than 2 billion examples, including GPT-5-generated synthetic data, for high accuracy.
🚀 Microsoft plans to integrate Harrier into Bing and its next-generation AI agent services, improving search performance.