Welcome to the [AI Daily] section! This is your daily guide to the world of artificial intelligence. Each day we bring you the latest developments in the AI field, with a focus on developers, to help you track technical trends and innovative AI product applications. Click to learn more about new AI products: https://app.aibase.com/zh

Valentine's Day big news! Volcano Engine's Doubao 2.0 to be released: video generation reaches industrial-grade delivery. ByteDance's Volcano Engine plans to announce multiple technology upgrades on February 14th, focusing on the Doubao product line.
Xiaomi open-sources Xiaomi-Robotics-0, a 4.7-billion-parameter robot model built on a hybrid MoT architecture. By coordinating a 'brain' and a 'cerebellum', it achieves real-time inference on consumer-grade GPUs, addressing the sluggish actions caused by inference latency in existing VLA models and improving the efficiency and generalization ability of robot control.
The Beijing Humanoid Robot Innovation Center has recently completed a funding round exceeding 700 million yuan, with strategic investments from several top-tier institutions and industry players, including Baidu, the Beijing Artificial Intelligence Industry Investment Fund, and Yizhuang Guotou. This marks strong recognition of the national-level platform by the capital market.
Alipay and Xiaomi have collaborated to launch a parking fee payment function on smart glasses. Users can pay through voice or eye interaction without using their phones. The feature is based on Ant Group's GPASS and AHA technologies, simplifying the parking process.
Xiaomi's first large reasoning model, MiMo, is open-sourced; it is designed specifically for reasoning tasks and delivers excellent performance.
Integrates Xiaomi's AIoT speaker with ChatGPT to create a personalized smart home voice assistant.
A large-scale pre-trained language model developed by Xiaomi with a parameter scale of 6.4 billion.
Enterprise Intelligent Service Solution
XiaomiMiMo
The MiMo Embodied Model (MiMo-Embodied) is a powerful cross-embodiment vision-language model that performs strongly on both autonomous driving and embodied AI tasks. It is the first open-source vision-language model to cover these two key domains, significantly enhancing understanding and reasoning in dynamic physical environments.
MiMo Audio is an audio language model developed by Xiaomi, which demonstrates strong few-shot learning ability through large-scale pre-training. This model breaks through the limitations of traditional models that rely on fine-tuning for specific tasks and performs excellently in tasks such as speech intelligence and audio understanding, reaching an advanced level among open-source models.
MiMo Audio is an audio language model based on large-scale pre-training, achieving SOTA performance among open-source models in speech intelligence and audio understanding benchmark tests. This model demonstrates strong few-shot learning ability and can generalize to tasks not included in the training data, supporting various audio tasks such as speech conversion, style transfer, and speech editing.
bartowski
This is a quantized version of XiaomiMiMo's MiMo-VL-7B-SFT-2508 model, optimized using llama.cpp to improve the model's running performance on specific hardware. This model is a vision-language model with 7 billion parameters, supporting image-to-text generation tasks.
This is the GGUF quantized version of the Xiaomi MiMo-VL-7B-RL-2508 model, quantized using the imatrix option of llama.cpp. It supports multiple quantization levels and is suitable for different hardware configurations and performance requirements.
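The two entries above describe GGUF quantizations produced with llama.cpp at various quantization levels. As a minimal sketch of how such a quant can be loaded locally, the snippet below uses llama-cpp-python; the file name, context size, and GPU settings are assumptions to adapt to whichever quantization level and hardware you actually have.

```python
# Minimal sketch: loading a GGUF quant with llama-cpp-python (pip install llama-cpp-python).
# The file name below is an assumption -- substitute whichever quantization level
# (e.g. Q4_K_M, Q6_K) you downloaded from the quantized repositories mentioned above.
from llama_cpp import Llama

llm = Llama(
    model_path="MiMo-VL-7B-RL-2508-Q4_K_M.gguf",  # hypothetical local file name
    n_ctx=4096,        # context window; adjust to your memory budget
    n_gpu_layers=-1,   # offload all layers to GPU if one is available, else set 0
)

out = llm(
    "Briefly explain what GGUF quantization is.",
    max_tokens=128,
    temperature=0.7,
)
print(out["choices"][0]["text"])
```

Note that this text-only call only verifies the quantized weights load; using the vision side of a VL model in llama.cpp generally also requires the separate multimodal projector (mmproj) file and a matching chat handler.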
mispeech
MiDashengLM-7B-0804 is a multimodal speech language model with 7B parameters released by Xiaomi, which supports audio understanding and text generation tasks and is suitable for inference and fine-tuning in general scenarios.
allura-forge
MiMo is a series of large language models trained from scratch by Xiaomi specifically for reasoning tasks. Through optimized pre-training and post-training strategies, it achieves excellent performance on mathematical and code reasoning tasks. The project has open-sourced multiple 7B-parameter versions, including the base, SFT, and RL models.
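For orientation, here is a hedged sketch of loading one of these checkpoints with Hugging Face transformers. The repo ID "XiaomiMiMo/MiMo-7B-RL" is inferred from the org and model names mentioned in this listing and should be verified on the Hub; `trust_remote_code` is included because the MiMo architecture may not ship with older transformers releases.

```python
# Minimal sketch, assuming the checkpoint is published as "XiaomiMiMo/MiMo-7B-RL"
# (repo ID inferred from the names above -- verify on the Hugging Face Hub).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "XiaomiMiMo/MiMo-7B-RL"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # custom architecture may require remote code
)

prompt = "Compute the sum of all even numbers between 1 and 100, step by step."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```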
benxh
This is the GGUF quantized version of the XiaomiMiMo/MiMo-VL-7B-RL-2508 model, using the Q6_K quantization level. This model is a multimodal visual language model with a scale of 7B parameters, supporting joint understanding and generation tasks of images and text.
MiMo-VL is a compact yet powerful vision-language model that combines a native-resolution ViT encoder, an MLP projector, and the MiMo-7B language model. It performs strongly on multimodal reasoning tasks, shows outstanding results across multiple benchmarks, supports chain-of-thought control, and significantly improves the user experience.
MiMo-VL is a compact and powerful vision-language model that combines a native resolution ViT encoder, an MLP projector, and the MiMo-7B language model optimized for complex reasoning tasks. Through multi-stage pre-training and post-training, it has achieved excellent results in multiple vision-language tasks.
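The two entries above describe the same three-part structure: a native-resolution ViT encoder, an MLP projector, and the MiMo-7B language model. The sketch below illustrates, in schematic PyTorch, how such a projector bridges vision features into the language model's embedding space; all dimensions and module names are assumptions for illustration, not the released MiMo-VL implementation.

```python
# Illustrative sketch of the wiring described above (ViT encoder -> MLP projector -> LM).
# All dimensions and module names here are assumptions, not MiMo-VL's actual code.
import torch
import torch.nn as nn

class VisionToLanguageProjector(nn.Module):
    """MLP that maps ViT patch features into the language model's embedding space."""

    def __init__(self, vit_dim: int = 1024, lm_dim: int = 4096):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(vit_dim, lm_dim),
            nn.GELU(),
            nn.Linear(lm_dim, lm_dim),
        )

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # patch_features: (batch, num_patches, vit_dim) from the ViT encoder
        return self.mlp(patch_features)  # (batch, num_patches, lm_dim)

# The projected visual tokens are concatenated with the text token embeddings
# and fed to the language model as a single sequence.
projector = VisionToLanguageProjector()
visual_tokens = projector(torch.randn(1, 256, 1024))       # encoded image patches
text_embeds = torch.randn(1, 32, 4096)                      # embedded text prompt
lm_input = torch.cat([visual_tokens, text_embeds], dim=1)   # (1, 288, 4096)
print(lm_input.shape)
```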
MiMo is a series of 7B-parameter models trained from scratch for reasoning tasks. Through optimized pre-training and post-training strategies, it performs excellently on mathematical and code reasoning tasks.
MiMo-7B is a language model series launched by Xiaomi, specifically designed for reasoning tasks, including base models, SFT models, and RL models, excelling in mathematical and code reasoning tasks.
MiMo-7B-RL is a reinforcement learning model trained based on the MiMo-7B-SFT model, demonstrating outstanding performance in mathematical and code reasoning tasks, comparable to OpenAI o1-mini.
A 7B-parameter specialized inference language model series launched by Xiaomi, significantly enhancing mathematical and code reasoning capabilities through optimized pre-training and post-training strategies
MiMo-7B-RL is a reinforcement learning model trained based on the MiMo-7B-SFT model, achieving performance comparable to OpenAI o1-mini in mathematical and code reasoning tasks.
Tonic
The GemmaX2-28-2B GGUF quantized model is a collection of quantized versions of the GemmaX2-28-2B-v0.1 translation large language model developed by Xiaomi, supporting machine translation tasks in 28 languages.
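Since GemmaX2-28-2B is described as a translation LLM covering 28 languages, a hedged usage sketch with the non-quantized checkpoint follows. The repo ID "xiaomi/GemmaX2-28-2B-v0.1" and the instruction-style prompt are assumptions; check the model card for the exact repo name and the prompt template the model was trained with.

```python
# Minimal sketch, assuming the base (non-quantized) checkpoint "xiaomi/GemmaX2-28-2B-v0.1"
# and a plain instruction-style translation prompt -- verify both against the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "xiaomi/GemmaX2-28-2B-v0.1"  # assumed repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Translate this from Chinese to English:\nChinese: 我爱机器翻译\nEnglish:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```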
Implementation of the Xiaomi Cloud Notes MCP Server, supporting full note management (reading, searching, creation, editing, and deletion), conversion between Markdown and XML formats, and image uploads.
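As a rough sketch of how a client would talk to an MCP server like this one, the snippet below uses the official `mcp` Python SDK to launch a server over stdio and list its tools. The launch command and script name are placeholders, not this project's actual entry point; consult its README for the real command and the exact tool names it exposes.

```python
# Sketch of connecting to an MCP server over stdio with the official `mcp` Python SDK
# (pip install mcp) and listing its tools. The launch command below is a placeholder --
# use whatever command this Xiaomi Cloud Notes server's documentation specifies.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    server = StdioServerParameters(
        command="python",
        args=["xiaomi_cloud_notes_mcp_server.py"],  # hypothetical entry point
    )
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            # Expect tools covering note reading, search, creation, editing, deletion, etc.
            for tool in tools.tools:
                print(tool.name, "-", tool.description)

asyncio.run(main())
```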