Traditional Transformer models often seem "wasteful" when handling repetitive knowledge: they must recompute the same patterns every time, which consumes network depth and burns compute. To break through this bottleneck, the DeepSeek research team recently unveiled an innovative module called Engram, which adds an efficient "conditional memory axis" to sparse large language models (LLMs).

Unlike existing Mixture of Experts (MoE) designs, Engram is not meant to replace MoE but to complement it, modernizing the classic N-gram embedding technique into a scalable lookup store with $O(1)$ query complexity. In simple terms, Engram acts like a "quick memory book" for the model, dedicated to storing common phrases, entities, and other static patterns, so the model's core network can focus on more complex reasoning and long-range interactions.
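To make the $O(1)$ lookup idea concrete, here is a minimal sketch of a hashed N-gram embedding table. It is an illustration only, not DeepSeek's actual implementation; the names (`engram_bank`, `hash_ngram`, `engram_lookup`) and the sizes (`NUM_BUCKETS`, `EMBED_DIM`, `NGRAM_ORDER`) are assumptions chosen for the demo.

```python
import numpy as np

# Sketch of a hashed N-gram lookup bank: each N-gram of recent token ids is
# hashed into a fixed-size embedding table, so retrieval cost is O(1) per query
# no matter how many static patterns are stored.

NUM_BUCKETS = 1 << 16   # size of the embedding bank (kept small for the demo)
EMBED_DIM = 256         # embedding width (assumption)
NGRAM_ORDER = 3         # look up trigrams of token ids (assumption)

rng = np.random.default_rng(0)
engram_bank = rng.normal(scale=0.02, size=(NUM_BUCKETS, EMBED_DIM))

def hash_ngram(token_ids: tuple[int, ...]) -> int:
    """Deterministically map an N-gram of token ids to a bucket index (FNV-style mixing)."""
    h = 14695981039346656037  # 64-bit FNV offset basis
    for t in token_ids:
        h ^= t
        h *= 1099511628211     # 64-bit FNV prime
        h &= (1 << 64) - 1
    return h % NUM_BUCKETS

def engram_lookup(context_ids: list[int]) -> np.ndarray:
    """Return the stored embedding for the most recent N-gram in O(1)."""
    ngram = tuple(context_ids[-NGRAM_ORDER:])
    return engram_bank[hash_ngram(ngram)]

# Usage: retrieve a static-pattern embedding for the last 3 tokens of a context.
vec = engram_lookup([101, 2054, 2003, 1996])
print(vec.shape)  # (256,)
```

In a trained system the bank entries would be learned rather than random, but the key property shown here is that the cost of a query does not grow with the number of stored patterns.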
In practical applications, the gains are measurable: under the same compute budget, 27B and 40B models incorporating Engram outperformed comparable MoE baselines on benchmarks such as MMLU, math, and code.
Engram also performs well on long texts. With the context window extended to 32,768 tokens, the Engram model showed stronger accuracy on tasks such as multi-query "needle-in-a-haystack" (NIAH) and variable tracking. By offloading static reconstruction to the lookup module, the design not only enlarges the model's knowledge store but also frees up effective network depth, making the model both smarter and more efficient.
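The "freeing up depth" intuition can be illustrated with a small, hypothetical sketch: a retrieved memory vector is blended into a layer's hidden state through a learned gate, so later layers need not rebuild the static pattern themselves. This is not DeepSeek's published integration scheme; the gate matrix `W_gate` and the `fuse` function are illustrative assumptions.

```python
import numpy as np

# Hypothetical fusion of a retrieved Engram vector into a transformer hidden
# state via an elementwise learned gate. Later layers can then spend their
# capacity on reasoning instead of re-deriving the static pattern.

EMBED_DIM = 256
rng = np.random.default_rng(1)
W_gate = rng.normal(scale=0.02, size=(2 * EMBED_DIM, EMBED_DIM))  # learned in practice

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def fuse(hidden: np.ndarray, memory: np.ndarray) -> np.ndarray:
    """Blend the hidden state with the retrieved memory using an elementwise gate."""
    gate = sigmoid(np.concatenate([hidden, memory]) @ W_gate)
    return gate * memory + (1.0 - gate) * hidden

hidden = rng.normal(size=EMBED_DIM)   # current layer activation (toy data)
memory = rng.normal(size=EMBED_DIM)   # vector retrieved from the Engram bank
print(fuse(hidden, memory).shape)     # (256,)
```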
Key points:
🧠 Innovative Architecture: DeepSeek introduced the Engram module, which efficiently retrieves static knowledge through an $O(1)$ hash lookup, allowing the model's core to focus on logical reasoning.
📈 Performance Leap: With the same computing resources, the 27B and 40B models incorporating Engram outperformed traditional MoE architectures on key benchmarks such as MMLU, math, and code.
📑 Enhanced Long-Context Processing: The technique significantly improves the model's recall in long-context settings, performing well in 32k-length tests while lowering the layer-wise loss incurred during prediction.



