The application of artificial intelligence in scientific research is reaching a new turning point. On June 18, Alibaba ATH-Token Foundry and the Gaojie Institute of Artificial Intelligence at Renmin University officially announced that they had open-sourced LOGOS, a multi-domain generative foundation model for science.

For a long time, deep "language barriers" have separated the different branches of science. To an AI model, proteins, small molecules, and complex materials have looked like isolated data islands: their structures differ so much that they are hard to handle within one framework. To make these scientific objects "communicate," earlier work typically relied on explicit 3D coordinates or purpose-built geometric neural networks. These approaches were computationally expensive and generalized poorly, so each new research task often meant building a model from scratch.

LOGOS's core innovation is removing this barrier. It defines a shared vocabulary that encodes heterogeneous objects such as proteins, antibodies, small molecules, and metal-organic framework (MOF) materials as sequences of discrete tokens drawn from a single token set. As a result, the model does not depend on expensive explicit 3D spatial inputs; instead, it learns complex 3D interaction patterns through sequence prediction, much as a language model learns by reading text. This shared "scientific grammar" lets data from different disciplines share knowledge at the representation level.
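
To make the shared-vocabulary idea concrete, here is a minimal Python sketch under stated assumptions: protein residues become one token each, SMILES strings are split with a simplified regular expression, and hypothetical boundary tags such as `<protein>` and `<mol>` mark where each object starts and ends. None of these names or rules come from LOGOS; its real vocabulary and tokenizer are described in the team's technical report. The sketch only shows how two very different objects can end up as one integer sequence that a standard next-token objective can train on.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical modality names and boundary tags; the real LOGOS special tokens may differ.
MODALITY_TAGS = ("protein", "mol")

# Simplified SMILES tokenizer (an assumption): bracket atoms, two-letter halogens, else one char.
SMILES_RE = re.compile(r"\[[^\]]+\]|Br|Cl|.")


class SharedVocab:
    """Toy shared vocabulary: tokens from every modality live in one integer ID space."""

    def __init__(self) -> None:
        self.token_to_id: Dict[str, int] = {}
        for m in MODALITY_TAGS:
            self._id(f"<{m}>")
            self._id(f"</{m}>")

    def _id(self, token: str) -> int:
        # Assign a fresh ID the first time a token is seen.
        if token not in self.token_to_id:
            self.token_to_id[token] = len(self.token_to_id)
        return self.token_to_id[token]

    def _split(self, modality: str, text: str) -> List[str]:
        if modality == "protein":
            pieces = list(text)  # one token per amino-acid residue
        elif modality == "mol":
            pieces = SMILES_RE.findall(text)
        else:
            raise ValueError(f"unsupported modality: {modality}")
        # Prefix with the modality so residue 'C' (cysteine) and atom 'C' (carbon) stay distinct.
        return [f"{modality}:{p}" for p in pieces]

    def encode(self, modality: str, text: str) -> List[int]:
        tokens = [f"<{modality}>"] + self._split(modality, text) + [f"</{modality}>"]
        return [self._id(t) for t in tokens]


def next_token_pairs(ids: List[int]) -> Tuple[List[int], List[int]]:
    """Shift by one: the model learns to predict ids[i + 1] from ids[: i + 1]."""
    return ids[:-1], ids[1:]


vocab = SharedVocab()
# A protein fragment followed by a small molecule (aspirin as SMILES) in ONE token sequence.
sequence = vocab.encode("protein", "MKTAYIAK") + vocab.encode("mol", "CC(=O)Oc1ccccc1C(=O)O")
inputs, targets = next_token_pairs(sequence)
print(len(sequence), len(vocab.token_to_id))  # prints: 33 17
```

Prefixing each token with its modality keeps, for example, cysteine and a carbon atom apart while still sharing one ID space. A production system would more likely use a curated or learned vocabulary, and could add structure-aware tokens, rather than this character-level split.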

LOGOS is also notably parameter-efficient. According to the team, LOGOS-1B outperforms Microsoft's NatureLM on several representative scientific tasks while using only 1/56 as many parameters. LOGOS is also designed to eliminate the "objective discrepancy" between pre-training and downstream tasks, so its generation capabilities can be used directly without cumbersome fine-tuning, which substantially lowers the barrier to entry for researchers.

The LOGOS team has built a large-scale pre-training corpus spanning 7 modalities and 44.87 billion tokens in total. The team has fully open-sourced the model weights, inference code, and a detailed technical report. Developers can access them through

This work not only provides a powerful engine for automating scientific research but also sets a new technical paradigm for future multimodal scientific foundation models. With LOGOS now open source, the "language" of science may become more unified and efficient than ever before.



