Recently, the team of Li Guoqi and Xu Bo from the Institute of Automation, Chinese Academy of Sciences, jointly released SpikingBrain 1.0, described as the world's first large-scale brain-inspired spiking model. The model processes ultra-long texts with remarkable speed: on sequences of 4 million tokens it runs more than 100 times faster than current mainstream Transformer models, while requiring only about 2% of the training data.

Mainstream large language models, such as the GPT series, are generally built on the Transformer architecture. Transformer is well known for its powerful self-attention mechanism, but its computational complexity is a serious weakness: as the text grows longer, the computational load grows quadratically, making long-text processing extremely time-consuming and energy-intensive. This is why such models struggle when analyzing long novels or legal documents.
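To make the quadratic growth concrete, here is a quick back-of-the-envelope sketch (the sequence lengths are chosen purely for illustration):

```python
# Illustration of quadratic scaling in self-attention (numbers are for intuition only).
short_doc = 4_000        # tokens in a short document
long_doc = 4_000_000     # tokens in an ultra-long document, as in the article's 4M-token example

length_ratio = long_doc / short_doc        # 1,000x more tokens...
cost_ratio = length_ratio ** 2             # ...but roughly 1,000,000x more attention computation

print(f"{length_ratio:,.0f}x longer input -> ~{cost_ratio:,.0f}x more attention compute")
```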

To find a new solution, the research team turned to the most efficient intelligent system in nature: the human brain. The human brain contains tens of billions of neurons yet consumes only about 20 watts of power. Inspired by this, the team proposed the concept of "intrinsic complexity", which aims to make the model's internal units themselves more efficient and capable.

The SpikingBrain model mimics the way neurons in the human brain work through a new architecture, and comes in two versions: SpikingBrain-7B (7 billion parameters) and SpikingBrain-76B (76 billion parameters). First, the model discards the quadratically scaling self-attention mechanism of traditional Transformers in favor of a "hybrid linear attention architecture", reducing the computational complexity to linear (O(n)) and greatly improving the efficiency of long-text processing.
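The article does not spell out the exact formulation, but the core idea behind linear attention can be sketched as follows: instead of building an n × n score matrix, the keys and values are first compressed into a small fixed-size summary, so the cost grows linearly with sequence length. The feature map and shapes below are assumptions for illustration, not the team's implementation:

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Minimal (non-causal) linear attention sketch. Q, K, V: (n, d) arrays; returns (n, d)."""
    phi = lambda x: np.maximum(x, 0.0) + 1.0   # simple positive feature map (an assumed choice)
    Qf, Kf = phi(Q), phi(K)
    kv = Kf.T @ V                              # (d, d) summary of keys and values -- size independent of n
    z = Kf.sum(axis=0)                         # (d,) normalizer
    return (Qf @ kv) / ((Qf @ z)[:, None] + eps)

# Usage: cost is O(n * d^2) rather than O(n^2 * d), so doubling the text only doubles the work.
n, d = 1024, 64
Q, K, V = np.random.default_rng(0).standard_normal((3, n, d))
out = linear_attention(Q, K, V)                # shape (n, d)
```

In a causal language model, the (d, d) summary is maintained as a running state that is updated token by token, which is what allows long sequences to be generated with constant memory per step.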

Second, SpikingBrain introduces "adaptive-threshold spiking neurons", in which a neuron fires only when the signal it receives is strong enough. By dynamically adjusting this threshold, the model keeps its neurons operating efficiently. This event-driven mechanism substantially reduces energy consumption and achieves a computational sparsity of 69.15%.
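The article does not give the neuron equations, so the following is only a toy sketch of the general idea of an adaptive-threshold, event-driven spiking unit; all constants and the update rule are hypothetical:

```python
import numpy as np

def adaptive_spiking_neuron(inputs, v_decay=0.9, th_base=1.0, th_jump=0.5, th_decay=0.95):
    """Toy adaptive-threshold spiking neuron: emits 1 only when the integrated input crosses a threshold."""
    v, th = 0.0, th_base
    spikes = []
    for x in inputs:
        v = v_decay * v + x                        # leaky integration of the incoming signal
        if v >= th:                                # event-driven: output only on threshold crossing
            spikes.append(1)
            v = 0.0                                # reset membrane potential after a spike
            th = th + th_jump                      # adapt: raise the threshold after firing
        else:
            spikes.append(0)
        th = th_base + (th - th_base) * th_decay   # threshold relaxes back toward its baseline
    return spikes

spikes = adaptive_spiking_neuron(np.random.default_rng(0).random(100))
print(f"sparsity: {1 - sum(spikes) / len(spikes):.2%}")   # most steps produce no spike, hence sparse compute
```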

In addition, the team has developed an efficient model-conversion technique that can convert existing Transformer models directly into the SpikingBrain architecture, reducing training costs. All technical details and code have been open-sourced on the GitHub and ModelScope platforms for researchers worldwide.
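The conversion procedure itself is not described in the article. Purely as an illustration of how such conversions are commonly approached, one option is to initialize the new linear-attention layers from a pretrained Transformer's projection weights and then continue training briefly instead of training from scratch; the class and function below are hypothetical and not the team's code:

```python
import torch
import torch.nn as nn

class LinearAttentionLayer(nn.Module):
    """Stand-in for a SpikingBrain-style linear attention layer (hypothetical structure)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        self.k_proj = nn.Linear(d_model, d_model, bias=False)
        self.v_proj = nn.Linear(d_model, d_model, bias=False)

def init_from_pretrained(layer: LinearAttentionLayer, q_w, k_w, v_w):
    """Reuse a pretrained Transformer layer's Q/K/V projection weights as the starting point,
    so only a short continued-training phase is needed rather than full pretraining."""
    with torch.no_grad():
        layer.q_proj.weight.copy_(q_w)
        layer.k_proj.weight.copy_(k_w)
        layer.v_proj.weight.copy_(v_w)
```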

The release of SpikingBrain not only marks a major breakthrough in computational efficiency, but also offers a new approach toward future general artificial intelligence.

GitHub: https://github.com/BICLab/SpikingBrain-7B

Key Points:

🌟 The SpikingBrain model launched by the research team processes ultra-long texts (4 million tokens) more than 100 times faster than mainstream models and requires only about 2% of the training data.

🧠 The model uses a hybrid linear attention architecture, reducing computational complexity from quadratic to linear, thus improving processing efficiency.

💡 The adaptive threshold spiking neuron mechanism of SpikingBrain significantly reduces energy consumption and achieves high computational sparsity.