Recently, the German technology consulting firm TNG (TNG Technology Consulting) released an enhanced DeepSeek variant, DeepSeek-TNG-R1T2-Chimera, marking another notable step forward in the inference efficiency and performance of large models. The new version not only improves inference efficiency by 200%, but also significantly reduces inference costs, thanks to the innovative Assembly-of-Experts (AoE) approach.
The Innovative AoE Approach
The Chimera version is a hybrid built from three DeepSeek models, R1-0528, R1, and V3-0324, constructed with the new Assembly-of-Experts (AoE) method. AoE works at the fine-grained level of the mixture-of-experts (MoE) weights, making efficient use of the parent models' parameters and thereby improving reasoning performance while reducing token output.
On mainstream benchmarks such as MT-Bench and AIME-2024, the Chimera version outperforms the standard R1, combining strong reasoning capability with markedly better cost-effectiveness.
Advantages of MoE Architecture
Before looking at AoE, it helps to understand the mixture-of-experts (MoE) architecture. MoE splits a Transformer's feed-forward layers into multiple "experts," and each input token is routed to only a small subset of them, so only a fraction of the parameters is active per token. This is what makes MoE models efficient relative to their total parameter count; a minimal sketch of the routing is shown below.
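To make the routing idea concrete, here is a minimal, self-contained PyTorch sketch of an MoE feed-forward layer with top-k routing. The layer sizes, expert count, and class name are illustrative choices, not DeepSeek's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    """Minimal MoE feed-forward layer: a gating network scores the experts
    and each token is processed only by its top-k experts."""

    def __init__(self, d_model=64, d_hidden=256, n_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )
        self.gate = nn.Linear(d_model, n_experts)  # router: one score per expert
        self.top_k = top_k

    def forward(self, x):                               # x: (n_tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)        # (n_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # per-token expert choices
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                   # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(4, 64)
print(TinyMoELayer()(tokens).shape)   # torch.Size([4, 64])
```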
For example, the Mixtral-8x7B model released by Mistral in 2023 activates only about 13 billion parameters per token, yet rivals the 70-billion-parameter Llama-2-70B while delivering roughly six times faster inference.
The AoE method exploits the fine-grained structure of MoE models, allowing researchers to build child models with specific capabilities out of existing mixture-of-experts models. By interpolating and selectively merging the weight tensors of the parent models, the resulting child retains their strongest traits while its behavior can be flexibly tuned to actual needs; a minimal interpolation sketch follows.
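In its simplest form, the merge interpolates each pair of corresponding parent tensors with a single coefficient. The function below is a minimal sketch of that idea; the tensor shapes and the coefficient value are placeholders, not the settings used for the Chimera models.

```python
import torch

def interpolate_tensor(w_v3: torch.Tensor, w_r1: torch.Tensor, lam: float) -> torch.Tensor:
    """Linearly interpolate corresponding parent tensors:
    lam = 0.0 returns the V3-0324 tensor, lam = 1.0 the R1 tensor."""
    return (1.0 - lam) * w_v3 + lam * w_r1

# Toy tensors standing in for one weight tensor from each parent.
a, b = torch.randn(4, 4), torch.randn(4, 4)
child_tensor = interpolate_tensor(a, b, lam=0.6)
```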
The researchers selected DeepSeek-V3-0324 and DeepSeek-R1 as the parent models; because the two were fine-tuned in different ways, they bring complementary strengths in reasoning ability and instruction following.
Weight Merging and Optimization
To build a new child model, the researchers first prepare the parent models' weight tensors, parsing the weight files so the tensors can be manipulated directly. Then, by defining merge coefficients, they can smoothly interpolate between and merge the parents' weights, producing new model variants.
During merging, the researchers introduced threshold control and difference filtering so that only tensors that differ significantly between the parents are included in the merge, which reduces complexity and computational cost; the filtering idea is sketched below.
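Here is a minimal sketch of that filtering logic, assuming both parents' weights have already been loaded into ordinary state dictionaries. The threshold value and the relative-difference criterion are illustrative assumptions, not TNG's published settings.

```python
import torch

def merge_with_threshold(base: dict, donor: dict,
                         lam: float = 0.6, threshold: float = 1e-3) -> dict:
    """Merge two parent state dicts tensor by tensor, interpolating only
    the tensors that differ significantly between the parents."""
    merged = {}
    for name, w_base in base.items():
        w_donor = donor[name]
        # Relative difference between the parents for this tensor.
        rel_diff = (w_donor - w_base).abs().mean() / (w_base.abs().mean() + 1e-12)
        if rel_diff < threshold:
            merged[name] = w_base.clone()                        # parents agree: keep the base tensor
        else:
            merged[name] = (1.0 - lam) * w_base + lam * w_donor  # merge significant differences
    return merged
```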
In the MoE architecture, the routed expert tensors are a crucial component: they hold the expert weights that process the tokens the router assigns to them during inference. The AoE method pays special attention to how these tensors are merged, and the researchers found that handling the routed expert tensors well significantly improves the child model's reasoning capability; a selective-merge sketch follows.
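The sketch below restricts the merge to expert tensors, keeping everything else from the base parent. The name pattern used to identify expert tensors is an assumption for illustration and would need to match the actual tensor names in the checkpoints being merged.

```python
import torch

def merge_expert_tensors_only(base: dict, donor: dict, lam: float = 1.0) -> dict:
    """Take non-expert tensors from the base parent and merge only the
    routed-expert tensors from the donor (reasoning) parent."""
    merged = {}
    for name, w_base in base.items():
        if ".mlp.experts." in name:                      # assumed name pattern for routed experts
            merged[name] = (1.0 - lam) * w_base + lam * donor[name]
        else:                                            # attention, router, embeddings, etc.
            merged[name] = w_base.clone()
    return merged
```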
Finally, the researchers implemented the merge in PyTorch and saved the merged weights to new weight files, producing a child model that is both efficient and flexible; a sketch of this final step follows.
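Below is a minimal end-to-end sketch of that last step, assuming the parent checkpoints are available locally as .safetensors shards. The file paths are hypothetical, and a real merge of models this size would process shards incrementally rather than loading everything into memory at once.

```python
import glob
from safetensors.torch import load_file, save_file

def load_checkpoint(pattern: str) -> dict:
    """Load every safetensors shard matching `pattern` into one state dict."""
    state = {}
    for shard in sorted(glob.glob(pattern)):
        state.update(load_file(shard))
    return state

# Hypothetical local paths to the parent checkpoints.
base  = load_checkpoint("DeepSeek-V3-0324/*.safetensors")
donor = load_checkpoint("DeepSeek-R1/*.safetensors")

# Simple uniform interpolation; in practice the threshold and
# expert-only selection sketched above would be applied per tensor.
merged = {name: 0.4 * base[name] + 0.6 * donor[name] for name in base}

save_file(merged, "chimera-child.safetensors")   # write the new child weight file
```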
Open source address: https://huggingface.co/tngtech/DeepSeek-TNG-R1T2-Chimera