Tencent Proposes a Training-Free Optimization Method: Matching the Results of 70,000-Yuan Fine-Tuning for Only 120 Yuan
Tencent has released Training-Free GRPO, a method that replaces parameter fine-tuning with an external knowledge base, so that performance improves while the model's parameters stay frozen. The method converts accumulated experience into token-level prior information, which sharply reduces training cost. On the DeepSeek-V3.1-Terminus model, it delivers performance gains comparable to those of expensive fine-tuning.
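
To make the idea concrete, here is a minimal Python sketch of how an external knowledge base could stand in for weight updates. It is an illustration under stated assumptions, not Tencent's implementation: the function names (`build_prompt`, `update_experiences`), the prompt wording, the group size, and the `generate`, `reward`, and `summarize` callables are all hypothetical. The sketch assumes that a group of answers is sampled from the frozen model, scored, and compared, and that the lesson drawn from the comparison is stored as text and prepended to later prompts.

```python
from typing import Callable, List


def build_prompt(question: str, experiences: List[str]) -> str:
    """Prepend the stored lessons to the question. The model is never modified."""
    if not experiences:
        return question
    notes = "\n".join(f"- {e}" for e in experiences)
    return f"Lessons from earlier attempts:\n{notes}\n\nQuestion: {question}"


def update_experiences(
    question: str,
    experiences: List[str],
    generate: Callable[[str], str],              # hypothetical: calls the frozen model
    reward: Callable[[str, str], float],         # hypothetical: scores one answer
    summarize: Callable[[str, str, str], str],   # hypothetical: writes a text lesson
    group_size: int = 4,                         # assumed value, not from the article
) -> List[str]:
    """One update step: sample a group, compare answers, store a lesson as text."""
    prompt = build_prompt(question, experiences)
    group = [generate(prompt) for _ in range(group_size)]
    scores = [reward(question, answer) for answer in group]

    # If every answer scores the same, there is no relative signal to learn from.
    if max(scores) == min(scores):
        return experiences

    best = group[scores.index(max(scores))]
    worst = group[scores.index(min(scores))]
    lesson = summarize(question, best, worst)

    # The "update" is a new list of strings; no model parameter changes.
    return experiences + [lesson]
```

In this sketch the only thing that changes between steps is the list of lessons, so the cost is that of inference calls rather than gradient-based training, which is consistent with the cost gap the article describes.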