Meta has released OMol25, its largest open dataset for AI-driven chemistry to date, and simultaneously introduced the Universal Model for Atoms (UMA), a general-purpose AI model for predicting the chemical properties of molecules and materials. The two releases aim to accelerate work in areas such as drug discovery, battery material development, and catalyst research.

According to Meta, the OMol25 dataset contains more than 100 million high-accuracy molecular calculations, which the company says far exceeds any other publicly available dataset of its kind. Generating it took more than 6 billion core-hours of compute. The dataset covers a wide range of molecular types, including small organic compounds, biomolecules (such as proteins and DNA fragments), metal complexes, and electrolytes. It also spans different charge states, spin states, spatial arrangements (conformations), and chemical reactions, and it provides detailed properties for each structure, such as energies, forces, charge distributions, and orbital information. OMol25 is currently available on Hugging Face.

UMA, released alongside OMol25, is a new AI model that Meta trained on OMol25 and other datasets. It predicts chemical properties at the atomic level and runs significantly faster than traditional computational methods. Where previous approaches required a specialized model for each task, UMA is general-purpose and can handle applications ranging from molecular simulation (for drug discovery) to materials and catalysis research. Built on graph neural networks with a "Mixture of Linear Experts" (MoLE) architecture, UMA is designed to balance computational speed against prediction accuracy. In benchmark tests, Meta reports that UMA matches performance previously achievable only by fine-tuned specialized models.
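The article gives no architectural detail, so the following is only a minimal sketch of the general idea behind mixing linear experts, not Meta's implementation. It assumes the mixing coefficients are computed once per input system and used to blend several expert weight matrices into a single matrix. All names (`merge_experts`, `routing_scores`, and so on) are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw routing scores into mixing coefficients that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def merge_experts(expert_weights, coefficients):
    """Blend K expert matrices into one: W = sum_k a_k * W_k."""
    rows = len(expert_weights[0])
    cols = len(expert_weights[0][0])
    return [
        [
            sum(a * w[i][j] for a, w in zip(coefficients, expert_weights))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def linear(weight, x):
    """Apply a plain linear layer y = W x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in weight]


# Two toy 2x2 experts (assumed values, for illustration only).
experts = [
    [[1.0, 0.0], [0.0, 1.0]],  # expert 0: identity
    [[0.0, 1.0], [1.0, 0.0]],  # expert 1: swaps the two inputs
]

# Hypothetical routing scores; a real model would derive these from
# properties of the input system.
routing_scores = [0.0, 0.0]
coeffs = softmax(routing_scores)

# Merge once, then run a single ordinary linear layer.
merged = merge_experts(experts, coeffs)
x = [2.0, 4.0]
y = linear(merged, x)
```

Because every expert is linear, applying the merged matrix gives the same result as running each expert separately and averaging the outputs, at the cost of a single layer. That is one plausible way such a design could add capacity without slowing inference.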

Meta emphasized that with UMA, molecular simulations and calculations that used to take days can now be completed in seconds. This lets researchers screen thousands of candidate molecules before laboratory synthesis and evaluate their potential as drugs or battery materials far more efficiently. The UMA model is also available on Hugging Face.
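The screening workflow described above can be sketched as a simple ranking loop. This is an illustrative outline only: `toy_predictor` is a placeholder standing in for a fast property model, not the UMA API, and it assumes candidates are given as SMILES strings and that lower scores are better.

```python
import heapq


def screen_candidates(candidates, predict_score, keep):
    """Return the `keep` lowest-scoring candidates as (score, candidate) pairs."""
    scored = ((predict_score(c), c) for c in candidates)
    return heapq.nsmallest(keep, scored)


def toy_predictor(smiles):
    # Placeholder: string length stands in for a predicted property.
    # A real workflow would call a trained model here.
    return float(len(smiles))


# Hypothetical candidate library written as SMILES strings.
library = ["CCO", "C", "CCCCCC", "CC(=O)O", "O"]

shortlist = screen_candidates(library, toy_predictor, keep=2)
for score, smiles in shortlist:
    print(f"{smiles}: {score}")
```

Only the shortlisted candidates would then go on to more expensive calculations or laboratory synthesis.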

Meta has also introduced a new AI method for generating molecular structures called "Adjoint Sampling." Traditional generative models typically need large amounts of real-world data to produce new molecular structures, whereas Adjoint Sampling can learn to propose new structures even when no real samples are available. The technique draws on concepts from stochastic optimal control and diffusion processes, which Meta's team considers particularly well suited to modeling molecules. According to Meta, experiments show that Adjoint Sampling can explore many variants of a molecular structure with relatively little computation, and that the generated conformations match the results of traditional software and even surpass them for molecules with multiple flexible components. Related models, code, and additional information are available on Hugging Face and GitHub.

Despite this progress, Meta acknowledged that challenges remain. Coverage of certain chemical domains, such as polymers, some metals, and complex protonation states, is still incomplete. The model's handling of charges, spins, and long-range interactions also has room for improvement.