Tencent recently released WeChat-YATT (Yet Another Transformer Trainer), a large model training library built on Megatron-Core and SGLang/vLLM and internally code-named gCore. The library focuses on reinforcement learning and multi-modal model training, aiming to give developers a scalable, simple, efficient, and reliable large model training solution.
Through customized parallel computing strategies, WeChat-YATT handles complex scenarios such as large-scale models, long sequence inputs, and large datasets, resolving key pain points in several practical business scenarios within WeChat and markedly improving training efficiency. The tool offers researchers and developers a flexible, scalable technical solution and is expected to drive innovation in the fields of multi-modal and reinforcement learning.
WeChat-YATT focuses on solving two core technical bottlenecks encountered in the distributed training of large models.
The first is the scalability bottleneck in multi-modal scenarios. As the volume of multi-modal data such as images and videos keeps growing, traditional architectures that route all data management through a single controller tend to become communication and memory bottlenecks, limiting system throughput and sometimes causing training runs to fail unexpectedly. WeChat-YATT addresses this with a parallel-controller management mechanism that distributes the load across controllers, significantly improving scalability and stability and better handling multi-modal, data-heavy workloads; the sketch below illustrates the idea.
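The following is a minimal sketch of the parallel-controller idea, not WeChat-YATT's actual API: instead of funneling every multimodal sample through one controller process, each controller owns only a shard of the data, so per-process memory and communication stay bounded. All names here (`controller_worker`, the sample strings) are hypothetical.

```python
# Sketch of parallel controllers: N processes, each managing a data shard.
import multiprocessing as mp

def controller_worker(shard_id: int, num_shards: int, samples: list, out: mp.Queue):
    """Each controller manages only its own shard of the dataset."""
    shard = samples[shard_id::num_shards]                 # stride-partition the data
    batches = [shard[i:i + 2] for i in range(0, len(shard), 2)]
    out.put((shard_id, len(batches)))                     # report work done

if __name__ == "__main__":
    # Hypothetical multimodal samples; real ones would be image/video tensors.
    samples = [f"sample_{i}" for i in range(32)]
    num_shards = 4
    out = mp.Queue()
    procs = [mp.Process(target=controller_worker, args=(s, num_shards, samples, out))
             for s in range(num_shards)]
    for p in procs:
        p.start()
    results = [out.get() for _ in range(num_shards)]      # drain results before join
    for p in procs:
        p.join()
    for shard_id, n in sorted(results):
        print(f"controller {shard_id} dispatched {n} batches")
```

With a single controller, the full 32-sample set (and in practice, its image and video payloads) would sit in one process; here each controller holds a quarter of it.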
The second is the efficiency gap under dynamic sampling and generative reward calculation. In training workflows that require frequent dynamic sampling or generative reward computation, repeated model switching and "long-tail" tasks incur substantial extra overhead, leaving GPUs underutilized and dragging down overall training efficiency. WeChat-YATT mitigates the cost of model switching and the impact of long-tail tasks through partial coexistence strategies and asynchronous interaction mechanisms, sustaining high throughput and resource utilization during training and better supporting the rapid iteration of large-scale RLHF tasks.
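To see why long-tail tasks hurt, consider this toy back-of-the-envelope illustration with assumed (not measured) rollout times: under synchronous scheduling, the whole batch waits for the slowest sample before reward scoring can begin.

```python
# Toy illustration of the long-tail problem with assumed numbers.
rollout_times = [3, 4, 3, 5, 4, 20]   # per-sample generation time in seconds

# Synchronous: wall time is gated on the slowest sample in the batch.
sync_wall_time = max(rollout_times)                          # 20 s
busy = sum(rollout_times)                                    # 39 GPU-seconds of work
utilization = busy / (len(rollout_times) * sync_wall_time)
print(f"wall time {sync_wall_time}s, utilization {utilization:.1%}")  # ~32.5%

# An asynchronous hand-off lets finished rollouts flow straight into reward
# scoring, so only the tail sample itself is delayed, not the entire batch.
```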
Depending on business requirements, WeChat-YATT supports two resource placement modes, full coexistence and partial coexistence, to maximize cluster resource utilization.
In the full coexistence mode, a serial scheduling mechanism executes Actor Rollouts, GenRM (Generative Reward Model), and Train in sequence. After finishing its task, each role releases its computing resources, and the system then loads the model required by the next stage. This strategy suits most conventional training scenarios. Notably, during each phase the active component has exclusive use of all GPU resources, which greatly reduces idle-resource "bubble" time and significantly improves overall training throughput; a minimal sketch of this schedule follows.
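The sketch below uses hypothetical names (`Phase`, `train_step`), not WeChat-YATT's real API, to show the shape of the full-coexistence schedule: each phase loads its model, uses the whole GPU pool exclusively, then offloads before the next phase starts.

```python
# Sketch of the full-coexistence serial schedule.
class Phase:
    def __init__(self, name: str):
        self.name = name
    def load(self):
        print(f"[{self.name}] loading weights onto all GPUs")
    def run(self, data):
        print(f"[{self.name}] running with exclusive GPU access")
        return data
    def offload(self):
        print(f"[{self.name}] releasing GPU memory")

def train_step(batch):
    # Actor Rollouts -> GenRM scoring -> Train, strictly in sequence.
    for phase in (Phase("actor_rollout"), Phase("genrm"), Phase("train")):
        phase.load()
        batch = phase.run(batch)
        phase.offload()   # free all GPUs so the next phase can occupy them

train_step(batch=["prompt_0", "prompt_1"])
```

The serial hand-off trades model load/offload time for the guarantee that no phase ever competes for GPUs with another.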
In the partial coexistence mode, Actor Rollouts and GenRM are deployed independently and interact through asynchronous mechanisms. During the Actor training phase, the Actor occupies all GPU resources; once training finishes, it releases them, and the Actor Rollouts and GenRM components run together during the generation phase, with the system dynamically evaluating load to allocate and balance resources between them. Once the Rollouts are generated, these components release their resources and the Actor is reloaded onto the GPUs for the next training cycle. Partial coexistence is particularly suitable for complex tasks with frequent interaction and dynamic sampling between Rollouts and GenRM, as in the sketch below.
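Here is a minimal asyncio sketch of that asynchronous interaction, with an assumed structure rather than the library's implementation: rollout workers and a separately deployed GenRM exchange work through a queue, so finished rollouts are scored immediately instead of waiting for the whole batch, which hides long-tail latency.

```python
# Sketch of asynchronous rollout-to-GenRM hand-off via a shared queue.
import asyncio
import random

async def rollout_worker(wid: int, queue: asyncio.Queue):
    for i in range(3):
        await asyncio.sleep(random.uniform(0.01, 0.05))  # simulated generation
        await queue.put((wid, f"rollout_{wid}_{i}"))     # hand off as soon as done
    await queue.put((wid, None))                         # signal this worker is done

async def genrm_scorer(queue: asyncio.Queue, num_workers: int):
    done = 0
    while done < num_workers:
        wid, rollout = await queue.get()
        if rollout is None:
            done += 1
        else:
            print(f"GenRM scored {rollout} from worker {wid}")

async def main():
    queue: asyncio.Queue = asyncio.Queue()
    workers = [rollout_worker(w, queue) for w in range(2)]
    await asyncio.gather(genrm_scorer(queue, num_workers=2), *workers)

asyncio.run(main())
```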
WeChat-YATT also has several technical strengths. On memory utilization, the parallel controller architecture reduces per-node memory consumption, making the system better suited to large model training in multi-modal scenarios and improving scalability and stability. For GenRM support, different resource placement strategies are implemented for generative reward model scenarios, letting users choose the optimal training setup for their specific workload.
The intelligent checkpoint strategy is another highlight. WeChat-YATT supports asynchronous checkpoint saving and, in line with WeChat's business characteristics, automatically saves checkpoints at appropriate points in the schedule, further safeguarding training safety and availability. The system also balances load across data parallel groups during training, reducing idle time and significantly improving overall throughput. A minimal sketch of asynchronous checkpointing follows.
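This sketch shows the general pattern behind asynchronous checkpointing, not the library's implementation: snapshot the state on the training thread, then write it to disk in the background so the next training step is not blocked on I/O. The function name `save_async` and the state layout are hypothetical.

```python
# Sketch of asynchronous checkpoint saving via a background writer thread.
import copy
import pickle
import threading

def save_async(state: dict, path: str) -> threading.Thread:
    snapshot = copy.deepcopy(state)          # CPU-side copy; real training would
                                             # first move GPU tensors to host memory
    def _write():
        with open(path, "wb") as f:
            pickle.dump(snapshot, f)         # slow disk I/O happens off the hot path
    t = threading.Thread(target=_write, daemon=True)
    t.start()
    return t                                 # training continues immediately

state = {"step": 100, "weights": [0.1, 0.2, 0.3]}
writer = save_async(state, "ckpt_step100.pkl")
# ... the next training step runs here while the checkpoint is written ...
writer.join()                                # only needed before shutdown
```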
The release of this training library marks important progress in Tencent's large model technology infrastructure and offers the industry an effective solution for complex multi-modal training scenarios.