The AI lab nof1, which focuses on financial market research, has announced the launch of Alpha Arena, a trading test for large models that examines the trading decision-making and risk control capabilities of mainstream large models in real financial environments. The test was conducted on the decentralized exchange Hyperliquid: all models ran with the same prompt and the same data inputs, and each was given $10,000 in real funds to trade independently.

Six leading AI models participated in the test: GPT-5, Gemini 2.5 Pro, Grok-4, Claude Sonnet 4.5, DeepSeek V3.1, and Qwen3 Max. At the end of the test period, DeepSeek V3.1 and Grok-4 performed best, both achieving returns exceeding 14% and tying for the top spot. Gemini 2.5 Pro performed poorly, suffering a loss of as much as 42.57%, the most unexpected result of the test.

nof1 stated that the goal of Alpha Arena is not merely to rank the models, but to verify how stable their strategies are and how different architectures respond to risk in highly volatile markets, providing technical and methodological references for future AI-based autonomous quantitative trading. The experiment also reflects how quickly large models are expanding from text understanding and reasoning tasks into real financial decision-making and asset management.