Artificial intelligence has recently seen an intriguing new development. Tokyo-based Sakana AI published a paper titled "Continuous Thought Machines," proposing a novel model that aims to let machines emulate the temporally rich neural activity of biological brains and their capacity for "continuous thought." The core idea challenges the simplified treatment of temporal dynamics in current deep learning, reintroducing temporal processing and synchronization mechanisms at the neuron level and making "neural timing" a foundation of the model.
Although mainstream neural networks are inspired by biological brains, they handle temporal information very differently. Neural activity in biological brains is highly complex and dynamic over time, and this temporal structure is crucial for information processing and cognition. Many modern neural networks, however, simplify these dynamics away for computational efficiency, reducing a neuron's activation to a static output. This simplification has been successful on specific tasks, but it also limits performance in areas such as common-sense reasoning and flexible adaptation.
Researchers at Sakana AI believe that the time dimension is critical for achieving higher-level artificial intelligence. Their proposed "Continuous Thought Machine" (CTM) model is built on this idea, making neuron-level dynamics its core representation.
CTM's Two "Game-Changers": Bringing Neurons to Life
So, how does CTM achieve this grand goal? The paper mentions two core innovations:
Neuron-level temporal processing: Each neuron is given its own private weight parameters and processes a history of incoming signals rather than only the instantaneous input. In traditional models a neuron responds immediately to its current input, whereas a CTM neuron computes its activation from the recent history of signals it has received. This makes neuron activation patterns more complex and diverse, closer to how biological neurons actually behave.
Neural synchronization as a latent representation: This is CTM's other key innovation. Instead of relying only on a snapshot of neuron activations at a single moment, CTM uses the synchronization of neuron activity over a window of time, that is, how different neurons' activity patterns co-vary, as its core internal representation. This synchronization information is used to interpret the input, make predictions, and steer the model's attention. A minimal code sketch of the neuron-level processing follows below.
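To make the first innovation more concrete, here is a minimal PyTorch-style sketch of neuron-level temporal processing: every neuron owns private weights and maps a short history of its incoming pre-activations to a new post-activation. The class name, the history length `M`, and the hidden width are illustrative assumptions, not the paper's exact design; the official implementation is in the repository linked at the end.

```python
import torch
import torch.nn as nn

class NeuronLevelModels(nn.Module):
    """Sketch: each of the D neurons has its own tiny MLP (private weights)
    mapping a length-M history of pre-activations to one post-activation.
    Shapes and the hidden width are illustrative assumptions."""

    def __init__(self, num_neurons: int, history_len: int, hidden: int = 16):
        super().__init__()
        D, M, H = num_neurons, history_len, hidden
        # Private parameters per neuron: one (M -> H -> 1) MLP each.
        self.w1 = nn.Parameter(torch.randn(D, M, H) * 0.02)
        self.b1 = nn.Parameter(torch.zeros(D, H))
        self.w2 = nn.Parameter(torch.randn(D, H, 1) * 0.02)
        self.b2 = nn.Parameter(torch.zeros(D, 1))

    def forward(self, pre_act_history: torch.Tensor) -> torch.Tensor:
        # pre_act_history: (batch, D, M) -- the last M pre-activations per neuron.
        h = torch.relu(torch.einsum('bdm,dmh->bdh', pre_act_history, self.w1) + self.b1)
        z = torch.einsum('bdh,dho->bdo', h, self.w2) + self.b2
        return z.squeeze(-1)  # (batch, D): one post-activation per neuron for this tick
```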
CTM's "Inner Monologue": A Decoupled "Thinking Dimension"
To enable this time-based "thinking," CTM introduces an important concept: an internal sequence dimension, which the researchers call "internal ticks." This dimension is decoupled from the dimensions of the input data, so the model can iterate and extract information internally at its own pace, whether the input is a static image or a complex maze.
This internal "thinking" process can be simplified into the following steps (a code sketch follows the list):
Information exchange (synapse model): The synapse model handles information transfer between neurons. It takes the neurons' "post-activation" states from the previous internal tick, together with features extracted from the external input via attention, and computes the "pre-activation" states for the current tick.
Neuron-level personalized processing: Each neuron has its own neuron-level model that computes its next "post-activation" state from the history of "pre-activation" states it has received.
"Synchronous" mind reading (neural synchronization): CTM records the "post-activation" history of all neurons over a window of time and computes their synchronization matrix, which captures how correlated the different neurons' activity patterns are.
Decision and action (output and attention): Based on this synchronization matrix, CTM produces outputs (such as an image classification) or adjusts its attention over the input (for example, deciding which region of an image to focus on).
Cyclical repetition, continuous "thinking": The attention output and the current "post-activation" states feed into the next internal tick, and the cycle repeats until the model finishes its processing.
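Putting the five steps together, the following is a heavily simplified, hypothetical sketch of the internal-tick loop in PyTorch. It reuses the `NeuronLevelModels` sketch from above; the module names (`synapse`, `q_proj`, `readout`), the use of standard multi-head attention, and the way the synchronization matrix is flattened to drive both the attention query and the output head are assumptions made for illustration and differ in detail from the official code.

```python
import torch
import torch.nn as nn

class TinyCTM(nn.Module):
    """Highly simplified sketch of CTM's internal-tick loop (not the official code).
    D = number of neurons, M = pre-activation history length, T = internal ticks."""

    def __init__(self, feat_dim=64, D=64, M=8, T=10, n_classes=10):
        super().__init__()
        self.D, self.M, self.T = D, M, T
        sync_dim = D * (D + 1) // 2
        # Step 1: synapse model mixes previous post-activations with attended input features.
        self.synapse = nn.Sequential(
            nn.Linear(D + feat_dim, 2 * D), nn.GELU(), nn.Linear(2 * D, D))
        # Step 2: per-neuron models over pre-activation history (sketched earlier).
        self.neurons = NeuronLevelModels(D, M)
        # Steps 3-4: synchronization drives both attention queries and the output head.
        self.q_proj = nn.Linear(sync_dim, feat_dim)
        self.attn = nn.MultiheadAttention(feat_dim, num_heads=4, batch_first=True)
        self.readout = nn.Linear(sync_dim, n_classes)

    def synchronization(self, post_hist):
        # post_hist: (B, D, t) post-activation traces; return flattened pairwise sync matrix.
        z = post_hist - post_hist.mean(dim=-1, keepdim=True)
        s = torch.einsum('bdt,bet->bde', z, z) / post_hist.shape[-1]
        iu = torch.triu_indices(self.D, self.D)
        return s[:, iu[0], iu[1]]                        # (B, D*(D+1)/2)

    def forward(self, feats):
        # feats: (B, N, feat_dim) features of the input, e.g. image patches.
        B = feats.shape[0]
        pre_hist = torch.zeros(B, self.D, self.M)
        post = torch.zeros(B, self.D)
        post_hist = [post]
        logits = None
        for _ in range(self.T):                          # step 5: iterate over internal ticks
            sync = self.synchronization(torch.stack(post_hist, dim=-1))
            q = self.q_proj(sync).unsqueeze(1)           # attention query from synchronization
            attended, _ = self.attn(q, feats, feats)     # step 1: gather input features
            pre = self.synapse(torch.cat([post, attended.squeeze(1)], dim=-1))
            pre_hist = torch.cat([pre_hist[:, :, 1:], pre.unsqueeze(-1)], dim=-1)
            post = self.neurons(pre_hist)                # step 2: neuron-level update
            post_hist.append(post)
            sync = self.synchronization(torch.stack(post_hist, dim=-1))
            logits = self.readout(sync)                  # step 4: prediction from synchronization
        return logits
```

In the paper, only subsets of the neuron pairs are actually used for the synchronization representation, and predictions from different ticks are aggregated rather than simply taking the last one; those details are omitted from this sketch.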
CTM's "Superpowers" Demonstrated: From Image Recognition to Maze Solving, All Kinds of Tasks!
Enough theory: how does CTM perform in practice? In the paper, the researchers put CTM through a series of challenging tasks, and the results are quite promising:
ImageNet-1K image classification: Although the authors state that their goal is not to break records, CTM achieved robust performance on this classic image-classification task. More importantly, it exhibited an interesting internal "thinking" process: over the internal ticks, CTM's attention moves smoothly across different regions of the image, sometimes focusing on key features and sometimes covering broader areas, as if it were carefully observing and making sense of the image.
In addition, CTM showed good calibration, meaning its confidence in its predictions was reliable, something that usually requires extra training techniques to achieve. Another interesting finding was that CTM's neuron activity exhibited complex multi-scale patterns; even without external driving signals, low-frequency traveling waves resembling those observed in biological cortex could be seen.
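Calibration here means that predicted confidence tracks actual accuracy. One common way to quantify it is the expected calibration error (ECE); the generic snippet below is not taken from the paper or its codebase, it just makes the notion concrete.

```python
import torch

def expected_calibration_error(probs, labels, n_bins=10):
    """Generic ECE: average |accuracy - confidence| over confidence bins,
    weighted by the fraction of predictions falling in each bin."""
    conf, pred = probs.max(dim=-1)              # top-class confidence and prediction
    correct = (pred == labels).float()
    bins = torch.linspace(0, 1, n_bins + 1)
    ece = torch.zeros(())
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.float().mean() * (correct[mask].mean() - conf[mask].mean()).abs()
    return ece
```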
2D maze challenge: To test CTM's capacity for complex sequential reasoning and planning, the researchers designed a challenging 2D maze task. The model had to output the complete path from start to goal directly, and positional encoding was removed from the attention mechanism, forcing the model to build an internal "world representation" of the maze.
The results showed that CTM performs very well on this task, significantly surpassing baselines such as LSTMs and demonstrating its ability to build and use an internal world model. More interestingly, even on mazes larger than those seen during training, with longer paths, CTM could solve the problem by "re-applying" itself, using the endpoint of the previous prediction as the starting point of the next, showing a degree of generalization. The researchers liken this ability to human "episodic future thinking," in which imagined future states guide current actions.
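The re-application procedure is simple to state: predict a path segment, move to its endpoint, and treat that endpoint as a new start. The sketch below assumes a hypothetical `model.predict_path(maze, start, goal)` interface purely for illustration; it is not the paper's API.

```python
def solve_long_maze(model, maze, start, goal, max_rounds=10):
    """Sketch of iterative re-application: each round predicts a partial path
    of bounded length, and its endpoint becomes the next round's start."""
    position, full_path = start, []
    for _ in range(max_rounds):
        segment = model.predict_path(maze, position, goal)   # hypothetical interface
        if not segment:
            break
        full_path.extend(segment)
        position = segment[-1]
        if position == goal:
            break
    return full_path
```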
Sorting, parity check, Q&A MNIST: CTM also performed well on tasks that require understanding algorithmic procedures, memory, and logical operations. In the sorting task, for example, the "wait time" (the number of internal ticks taken before emitting each element of the output sequence) was related to the numeric gap between consecutive values, suggesting that the model internally formed an algorithm that depends on how the data is arranged.
In the parity-check task, CTM learned to compute the cumulative parity of the input sequence step by step, and CTMs given more "thinking time" (more internal ticks) performed better, even developing distinct strategies such as processing the sequence forward or in reverse. In the Q&A MNIST task, CTM first observes a series of MNIST digit images and then, given index and operator instructions, recalls the previously seen digits and performs modular arithmetic on them. Even when the observed digits fell outside the neuron models' direct "memory window," CTM could recall them through the organization and synchronization of its neurons, demonstrating the potential of neural synchronization as a memory and retrieval mechanism.
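For context, the cumulative-parity objective can be written in a couple of lines. The toy data generator below assumes the task is posed over ±1 sequences, with the target at each position being the parity of all values seen so far; the exact task format in the paper may differ.

```python
import torch

def make_parity_batch(batch_size=32, seq_len=64):
    """Toy cumulative-parity data: inputs are random ±1 sequences and the
    target at position t is the product (parity) of inputs[0..t]."""
    x = torch.randint(0, 2, (batch_size, seq_len)) * 2 - 1   # values in {-1, +1}
    targets = torch.cumprod(x, dim=1)                        # cumulative parity
    return x.float(), targets
```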
Reinforcement learning tasks: CTM can also be applied to reinforcement learning tasks that require continuous interaction with an environment. In the partially observable CartPole (pole balancing), Acrobot (double pendulum), and MiniGrid Four Rooms (four-room navigation) environments, CTM learned effective policies, performing comparably to LSTM baselines while exhibiting richer and more complex internal neural activity. This suggests that CTM can use neural dynamics as an ongoing computational substrate, continually adjusting and learning as it interacts with the environment.
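In the reinforcement-learning setting the model simply sits inside a standard interaction loop, carrying its internal neural state across environment steps. The schematic below uses the Gymnasium API; the `policy` object and its `initial_state`/`act` methods are hypothetical placeholders, not the paper's interface.

```python
import gymnasium as gym

def run_episode(policy, env_name="CartPole-v1"):
    """Schematic interaction loop: a recurrent policy (for CTM, one carrying its
    neuron histories) keeps internal state across environment steps."""
    env = gym.make(env_name)
    obs, _ = env.reset()
    state = policy.initial_state()               # hypothetical: internal neuron histories
    total_reward, done = 0.0, False
    while not done:
        action, state = policy.act(obs, state)   # hypothetical interface
        obs, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward
        done = terminated or truncated
    env.close()
    return total_reward
```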
CTM's "Weak Points" and Future Prospects: The Path Is Long, But Progress Will Be Made
Of course, CTM is not without room for improvement. The paper also points out some of its current limitations:
Computational cost: Because of its sequential processing, CTM takes longer to train than standard feedforward models, and the neuron-level models add parameter overhead. The researchers argue, however, that the benefits justify further exploration despite this cost.
"Black box" challenge: Although CTM's internal process provides some clues for interpretability, fully understanding how its complex neural dynamics generate intelligent behavior still requires further research.
Even so, CTM brings a new perspective to the field of artificial intelligence. It challenges existing modeling paradigms and highlights the potential value of "neural timing" and "neural synchronization" for building AI systems closer to biological intelligence. The researchers also outline several future directions for CTM:
Exploring larger-scale and more complex synchronization representations: CTM currently uses the synchronization of only a subset of neuron pairs. Future work could explore the full, high-dimensional synchronization matrix, which may be advantageous for multimodal modeling.
Application to sequence data and language modeling: CTM's "continuous thinking" capability makes it potentially capable of processing sequence data such as videos and texts, and even constructing a contextual "world model" for language without position encoding.
Moving towards a more "natural" training approach: Current CTM evaluations are still conducted within traditional datasets and training frameworks. Future research can explore training methods closer to real-world data generation scenarios, such as data arranged in chronological order.
Borrowing more biological mechanisms: For example, exploring the combination of biological plasticity mechanisms (such as Hebbian learning) with CTM for application in cutting-edge research areas like lifelong learning or gradient-free optimization.
The Exploration of AI "Thought" Continues
In summary, the "Continuous Thought Machine" (CTM) proposed by Sakana AI is an innovative and thought-provoking research work. It encourages us to revisit the simplifications of time dynamics in current deep learning models and draw inspiration from biological neural computation to explore new paths for building stronger and more flexible AI systems. Although the goal of truly achieving human-like "thinking" for AI systems remains a long way off, the advent of CTM provides new ideas and tools for this direction of research.
This study is also a reminder that drawing on principles of biological intelligence may be a promising path for the development of artificial intelligence. Some of CTM's emergent properties, such as its good calibration, were not designed in but arose naturally from simulating biological mechanisms, which is intriguing in itself. Going forward, how to strike a better balance between computational efficiency and biological plausibility, and how to bring more principles of biological intelligence into AI models, will be important topics worth continued exploration.
Paper link: https://arxiv.org/abs/2505.05522
Project link: https://github.com/SakanaAI/continuous-thought-machines/