Mozilla engineers recently published a behind-the-scenes blog post describing how they identified 271 security vulnerabilities in the Firefox browser using Claude Mythos, an advanced AI model from Anthropic. According to earlier reports, the Mozilla team found and fixed these vulnerabilities in Firefox 150 using the Mythos Preview model.

Of these 271 vulnerabilities, 180 were rated "high-risk," meaning users could be affected while browsing normally; another 80 were rated medium-risk and 11 low-risk. In response to outside skepticism about AI-driven bug finding, Mozilla released 12 complete Bugzilla reports to show that the results are not just AI hype.
Mozilla engineers said that to overcome the "hallucination" problem common in AI code analysis, they built a specialized Agent Harness (agent toolkit). In earlier attempts, AI code analysis produced large numbers of plausible-looking but fictional reports, which sharply increased the cost of manual review. The success this time came from two things: improvements in the model's own capabilities and the use of this custom tool.
The toolkit gives the model a specific instruction, such as "Find a bug in this file," provides it with tools for reading and writing files and for evaluating test cases, and repeats the process until the task is complete. In practice, the toolkit is pointed at a specific source file, and Mythos generates its own test cases, such as a specific piece of HTML, which are then run through existing fuzzing tools. If a test case triggers a memory crash, that confirms a vulnerability is present. To filter out false positives further, Mozilla also introduced a second large model that scores the output of the first, and only high-scoring reports are submitted to developers.
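The loop described above can be sketched roughly as follows. This is a minimal illustration and not Mozilla's actual harness: the function `hunt_for_bug`, the `Report` type, the three callables standing in for the first model, the crash check, and the second scoring model, and the `max_attempts` and `min_score` values are all hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Report:
    source_file: str
    test_case: str  # e.g. an HTML snippet produced by the first model
    score: float    # rating assigned by the second (reviewer) model


def hunt_for_bug(
    source_file: str,
    generate_test_case: Callable[[str, int], str],  # stand-in for the first model
    triggers_crash: Callable[[str], bool],          # stand-in for the fuzzing/crash check
    score_report: Callable[[str, str], float],      # stand-in for the second model
    max_attempts: int = 5,                          # hypothetical retry budget
    min_score: float = 0.8,                         # hypothetical score threshold
) -> Optional[Report]:
    """Return a report only if a crash is reproduced AND the reviewer scores it highly."""
    for attempt in range(max_attempts):
        test_case = generate_test_case(source_file, attempt)
        if not triggers_crash(test_case):
            continue  # no crash means no evidence of a bug, so try again
        score = score_report(source_file, test_case)
        if score >= min_score:
            return Report(source_file, test_case, score)
    return None  # nothing survived both checks


if __name__ == "__main__":
    # Stub callables so the sketch runs without any model or browser.
    report = hunt_for_bug(
        "example.cpp",
        generate_test_case=lambda path, n: f"<div id='case-{n}'></div>",
        triggers_crash=lambda case: "case-2" in case,
        score_report=lambda path, case: 0.9,
    )
    print(report)
```

The point of the two gates is that a report reaches a developer only when there is a reproducible crash and an independent reviewer rates it highly, which is what keeps fictional findings out of the review queue.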
Brian Grinstead, a senior engineer at Mozilla, said that after this dual verification the final vulnerability reports contained almost no false positives. Each report gives engineers a clear signal: the problem really exists, the fix has been completed, and once the test case is added to the repository, the bug will not reappear.
Key Points:
🌟 Among the 271 security vulnerabilities, 180 were rated as "high-risk" and may affect users during normal use.
🤖 Mozilla found and fixed the vulnerabilities by pairing an AI model with a custom agent toolkit.
🔍 After dual verification, the final reports had almost no false positives, giving engineers reliable confirmation of each vulnerability.




