In the current field of artificial intelligence, Yann LeCun's JEPA (Joint Embedding Predictive Architecture) is reshaping how large language models (LLMs) are trained. Rather than merely criticizing existing LLMs, the Turing Award laureate is taking matters into his own hands to improve them. Traditional LLM training relies mainly on reconstruction and generation in the input space, such as predicting the next token, an approach whose limitations have already been demonstrated in the visual domain.
LeCun and his team believe that techniques proven in the computer vision (CV) field can be used to enhance the performance of language models. The core idea of JEPA is to learn about the world efficiently by predicting missing features in an abstract representation space rather than reconstructing the raw input. The Meta AI team has successfully applied JEPA to image and video processing, and now hopes to extend the concept to language models.
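To make the idea concrete, the minimal PyTorch sketch below shows what a JEPA-style training objective generally looks like: the loss is computed between predicted and actual embeddings of a target view, never against raw pixels or tokens. The module names (`encoder`, `target_encoder`, `predictor`) and the MSE distance are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def jepa_loss(context_view, target_view, encoder, target_encoder, predictor):
    """Sketch of a generic JEPA objective: predict the *embedding* of the
    target view from the context view, instead of reconstructing the input."""
    z_ctx = encoder(context_view)            # embed the visible / context view
    with torch.no_grad():
        z_tgt = target_encoder(target_view)  # embed the target view (no gradient)
    z_pred = predictor(z_ctx)                # predict the target embedding
    return F.mse_loss(z_pred, z_tgt)         # distance measured in embedding space
```

In practice the target encoder is often a slowly updated (e.g. exponential-moving-average) copy of the main encoder, which is why its output is detached here.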
To fill this gap, researchers Hai Huang, Yann LeCun, and Randall Balestriero jointly proposed LLM-JEPA. The new model treats text and code as different views of the same underlying concept and, for the first time, successfully brings JEPA's self-supervised learning architecture to LLMs. By combining JEPA's strength of learning in the embedding space with the standard generative objective, LLM-JEPA retains the strong generative capabilities of LLMs while gaining in both performance and robustness.
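A rough sketch of how such a combined objective could look in code is shown below. It assumes a Hugging Face-style causal language model, keeps the usual next-token loss, and adds a JEPA term that aligns the embeddings of the text view and the code view. The `embed_view` pooling, the cosine distance, and the `lam` weight are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def embed_view(model, tokens):
    """Illustrative pooling: take the last hidden state of the final token
    (assumes a Hugging Face-style model with output_hidden_states support)."""
    hidden = model(tokens, output_hidden_states=True).hidden_states[-1]
    return hidden[:, -1, :]

def llm_jepa_loss(model, text_tokens, code_tokens, lam=1.0):
    """Sketch of a combined LLM-JEPA-style objective: generative loss plus
    an embedding-space alignment term between the two views."""
    # 1) Standard autoregressive next-token loss preserves generative ability.
    out = model(code_tokens[:, :-1])
    gen_loss = F.cross_entropy(
        out.logits.reshape(-1, out.logits.size(-1)),
        code_tokens[:, 1:].reshape(-1),
    )

    # 2) JEPA term: bring the text-view and code-view embeddings together.
    z_text = embed_view(model, text_tokens)
    z_code = embed_view(model, code_tokens)
    jepa_term = 1 - F.cosine_similarity(z_text, z_code, dim=-1).mean()

    return gen_loss + lam * jepa_term
```

The key design choice is that the extra supervision happens in embedding space, so it can regularize the representations without interfering with the token-level generative training.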
Experiments show that LLM-JEPA performs well across multiple mainstream models (such as Llama3, OpenELM, and Gemma2) and diverse datasets (such as GSM8K and Spider), significantly outperforming the traditional LLM training objective. It also shows strong robustness against overfitting, pointing to a new direction for the future development of language models.
Although the current research focuses mainly on the fine-tuning stage, preliminary pre-training results already show great potential. The team plans to further explore applying LLM-JEPA during pre-training in future work, in the hope of delivering further gains in language model performance.