As artificial intelligence develops rapidly, the financial industry is placing higher demands on data accuracy and security. Tencent recently announced the open-source release of finLLM-Eval, an evaluation tool designed specifically for applying large models in financial scenarios. According to the announcement, it is the first tool in the industry to offer a method for evaluating financial data accuracy without ground truth (GroundTruth), filling a gap in evaluation tooling for large models in finance. It aims to promote the safe deployment of AI in high-risk, high-demand financial settings.

finLLM-Eval consists of several modules, the most prominent being the logical consistency and factual accuracy evaluation module. This module ships with complete engineering code and example evaluation sets, lets users build their own evaluation sets, and automatically outputs detailed information about model performance. Users receive a full evaluation report that includes the total score, the distribution of errors, and the hallucination rate per thousand characters, giving them a detailed picture of how the model actually performs.
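To make the three report fields concrete, here is a minimal Python sketch of how such a summary could be computed for a single answer. It is not taken from the finLLM-Eval code: the function name `build_report`, the field names, and the scoring formula (share of checked points that contain no error) are all assumptions made for illustration.

```python
from collections import Counter


def build_report(answer_text, error_types, checked_points):
    """Summarize one model answer (hypothetical report layout).

    answer_text:    the model's full answer
    error_types:    one label per detected error, e.g. "factual" or "logic"
    checked_points: number of factual/logical points that were checked
    """
    n_errors = len(error_types)
    n_chars = len(answer_text)
    return {
        # Assumed scoring rule: percentage of checked points without an error.
        "total_score": (
            100.0 * (checked_points - n_errors) / checked_points
            if checked_points else 0.0
        ),
        # How many errors of each type were found.
        "error_distribution": dict(Counter(error_types)),
        # Errors normalized by answer length, per 1,000 characters.
        "hallucinations_per_1k_chars": (
            1000.0 * n_errors / n_chars if n_chars else 0.0
        ),
    }


# A 2,000-character answer with 10 checked points and 3 detected errors.
report = build_report("x" * 2000, ["factual", "factual", "logic"], checked_points=10)
print(report)
```

Normalizing by length keeps long and short answers comparable: three errors in a 2,000-character answer count the same as six errors in a 4,000-character one.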

finLLM-Eval also includes an end-to-end module for checking financial data accuracy. Its main highlight is that it works without ground truth: starting from real user questions, it automatically extracts the three elements of a financial fact ("subject × time × indicator") and verifies them against an internal financial database, eliminating the need for manual annotation.
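The verification step can be pictured with a short Python sketch. Everything in it is hypothetical: the `FactKey` type, the in-memory `REFERENCE_DB`, the sample company and figure, and the 1% tolerance are stand-ins for illustration, and the extraction of the three elements from the model's answer is assumed to have already happened.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactKey:
    """The three elements of a financial fact (hypothetical type)."""
    subject: str    # e.g. a company
    period: str     # e.g. a fiscal year
    indicator: str  # e.g. "revenue"


# Stand-in for the internal financial database; the figure is made up.
REFERENCE_DB = {
    FactKey("ExampleCorp", "2023", "revenue"): 120.0,
}


def verify(key, claimed_value, rel_tol=0.01):
    """Compare a value claimed by the model with the reference value."""
    reference = REFERENCE_DB.get(key)
    if reference is None:
        return "unverifiable"  # the database has no entry for this key
    if abs(claimed_value - reference) <= rel_tol * abs(reference):
        return "match"
    return "mismatch"


key = FactKey("ExampleCorp", "2023", "revenue")
print(verify(key, 120.5))   # within 1% of the reference value
print(verify(key, 150.0))   # far from the reference value
print(verify(FactKey("OtherCorp", "2023", "revenue"), 10.0))
```

Because the database lookup supplies the reference value, no hand-labeled answer is needed for each question, which is what makes the approach work without ground truth.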

finLLM-Eval also introduces AgentAsJudger, an automated evaluation mechanism. The entire evaluation process requires no human intervention: an AI agent automatically extracts factual points and logical chains and compares them with the relevant content or the financial database, with a reported accuracy of over 96%. This improves evaluation efficiency and helps ensure the reliability of the results.

Looking ahead, the project team plans to keep iterating on finLLM-Eval. Future versions are expected to support data verification and result attribution for non-financial indicators, contributing to the further development of financial technology.