Microsoft CEO Satya Nadella recently announced on a social platform that Microsoft has officially launched MAI-DxO, a medical AI system for diagnosis. The system stands out for its **"model-agnostic" design**: it works with language models from different vendors and of different capability levels, and it improves the diagnostic performance of each. MAI-DxO not only simulates the diagnostic process of real doctors; in Microsoft's tests it also achieved significantly higher diagnostic accuracy than professional doctors while greatly reducing the cost of diagnosis.

According to Microsoft's test data, on 56 hidden test cases from the New England Journal of Medicine, 21 professional doctors with more than ten years of experience achieved an average diagnostic accuracy of only 19.9%. By contrast, MAI-DxO running on the OpenAI o3 model reached 81.9% accuracy in No Budget mode and 85.5% in Ensemble mode, which is more than **four times** the accuracy of the professional doctors.
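The "four times" comparison can be checked from the reported figures alone. The short sketch below uses only the three accuracies quoted above; the variable and function names are illustrative.

```python
# Accuracies quoted in the article, in percent.
physician_accuracy = 19.9
no_budget_accuracy = 81.9
ensemble_accuracy = 85.5


def ratio(system: float, baseline: float) -> float:
    """Return how many times larger `system` is than `baseline`."""
    return system / baseline


no_budget_ratio = ratio(no_budget_accuracy, physician_accuracy)
ensemble_ratio = ratio(ensemble_accuracy, physician_accuracy)

print(f"No Budget vs. physicians: {no_budget_ratio:.2f}x")
print(f"Ensemble vs. physicians:  {ensemble_ratio:.2f}x")
```

Both ratios come out above four, so the claim holds for either configuration.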

The core innovation of MAI-DxO is that it simulates the way real medical teams collaborate: a panel of virtual doctors with different roles works together on each diagnostic problem, which improves both diagnostic accuracy and cost-effectiveness. The panel includes Dr. Hypothesis, who maintains and updates the differential diagnosis list; Dr. Test-Chooser, who selects the most discriminative tests in each round; Dr. Challenger, who acts as a supervisor, identifies biases, and challenges the current reasoning; Dr. Stewardship, who advocates for cost awareness and optimizes the examination plan; and Dr. Checklist, who handles behind-the-scenes quality control and ensures that the reasoning stays consistent.
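To make the panel idea concrete, here is a minimal sketch of one discussion round, assuming each role is simply a differently instructed call to the same underlying model. It is not Microsoft's implementation: `run_panel_round`, `AskModel`, and `echo_model` are hypothetical names, and the toy model calls no real API. Only the role names and duties come from the description above.

```python
from typing import Callable

# Role names and responsibilities as described in the article.
PANEL_ROLES: dict[str, str] = {
    "Dr. Hypothesis": "maintain and update the differential diagnosis list",
    "Dr. Test-Chooser": "select the most discriminative tests for this round",
    "Dr. Challenger": "look for bias and challenge the current reasoning",
    "Dr. Stewardship": "advocate for cost awareness and trim the test plan",
    "Dr. Checklist": "check the quality and consistency of the reasoning",
}

# Hypothetical stand-in for any language model:
# (role instruction, transcript so far) -> reply.
AskModel = Callable[[str, str], str]


def run_panel_round(case_summary: str, ask_model: AskModel) -> list[tuple[str, str]]:
    """Let each role comment once, in order, seeing what earlier roles said."""
    transcript = f"Case: {case_summary}"
    contributions: list[tuple[str, str]] = []
    for role, duty in PANEL_ROLES.items():
        reply = ask_model(f"You are {role}. Your job: {duty}.", transcript)
        contributions.append((role, reply))
        transcript += f"\n{role}: {reply}"
    return contributions


def echo_model(instruction: str, transcript: str) -> str:
    """Toy model that only makes the sketch runnable; it calls no real API."""
    return f"(reply given {len(transcript.splitlines())} transcript lines)"


for role, reply in run_panel_round("illustrative case summary", echo_model):
    print(f"{role}: {reply}")
```

Because the model is passed in as a plain callable, swapping one language model for another does not change the panel logic, which is one way to read the "model-agnostic" claim.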

To suit medical scenarios with different cost, efficiency, and accuracy requirements, MAI-DxO provides five operating modes. The Instant Answer mode relies only on the initial case summary to give a rapid preliminary diagnosis, which suits emergencies or resource-limited settings. The Question Only mode diagnoses through questioning alone and simulates primary care. The Budgeted mode introduces a dynamic budget control mechanism. The No Budget mode aims to maximize diagnostic accuracy and is intended for complex, difficult cases. The Ensemble mode simulates multiple doctor teams working in parallel to improve diagnostic accuracy further.
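The five modes can be pictured as a small configuration table. The sketch below is one possible reading of the descriptions above, not a documented MAI-DxO interface: the `ModeConfig` fields are hypothetical, the capability flags are inferred from the text, and the number of parallel panels in Ensemble mode is an assumed placeholder.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    INSTANT_ANSWER = "Instant Answer"
    QUESTION_ONLY = "Question Only"
    BUDGETED = "Budgeted"
    NO_BUDGET = "No Budget"
    ENSEMBLE = "Ensemble"


@dataclass(frozen=True)
class ModeConfig:
    """Hypothetical capability flags inferred from the article's descriptions."""
    may_ask_questions: bool
    may_order_tests: bool
    tracks_budget: bool
    parallel_panels: int


MODE_CONFIGS: dict[Mode, ModeConfig] = {
    # Uses only the initial case summary.
    Mode.INSTANT_ANSWER: ModeConfig(False, False, False, 1),
    # Diagnoses by questioning; assumed to order no tests.
    Mode.QUESTION_ONLY: ModeConfig(True, False, False, 1),
    # Full workup under a budget control mechanism.
    Mode.BUDGETED: ModeConfig(True, True, True, 1),
    # Full workup aimed at maximum accuracy.
    Mode.NO_BUDGET: ModeConfig(True, True, False, 1),
    # Several panels in parallel; 3 is an assumed placeholder, not a reported value.
    Mode.ENSEMBLE: ModeConfig(True, True, False, 3),
}

for mode, config in MODE_CONFIGS.items():
    print(f"{mode.value}: {config}")
```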

Alongside MAI-DxO, Microsoft also introduced SDBench, a benchmark for sequential medical diagnosis. This interactive evaluation framework turns 304 challenging diagnostic cases from the New England Journal of Medicine into step-by-step diagnostic scenarios, providing realistic material for evaluating the sequential diagnostic capabilities of both human doctors and AI. In SDBench, a "gatekeeper" agent simulates the information acquisition process, and a "judge" agent conducts a multidimensional assessment of each diagnosis based on its clinical substance. Cost is also part of the evaluation metrics, with the aim of setting a new standard for evaluating medical AI diagnosis.
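The gatekeeper and judge roles can be sketched as follows. This is a toy illustration under stated assumptions, not SDBench itself: the class and function names are hypothetical, the case data are placeholders rather than real clinical content, and the judge here is a simple string match, whereas the article describes an agent that grades on clinical substance.

```python
from dataclasses import dataclass


@dataclass
class Gatekeeper:
    """Reveals case details only when asked and tallies the cost of requests."""
    hidden_findings: dict[str, str]
    test_costs: dict[str, float]
    spent: float = 0.0

    def request(self, item: str) -> str:
        # Unknown or free items (such as history questions) cost nothing here.
        self.spent += self.test_costs.get(item, 0.0)
        return self.hidden_findings.get(item, "not available")


def judge(final_diagnosis: str, reference: str) -> bool:
    """Toy judge: case-insensitive exact match (an assumption for this sketch)."""
    return final_diagnosis.strip().lower() == reference.strip().lower()


# Placeholder case; names and costs are invented for illustration only.
gatekeeper = Gatekeeper(
    hidden_findings={"history": "finding H", "test A": "result A"},
    test_costs={"test A": 100.0},
)

print(gatekeeper.request("history"))
print(gatekeeper.request("test A"))
print("Cost so far:", gatekeeper.spent)
print("Correct:", judge("Diagnosis X", "diagnosis x"))
```

Reporting accuracy and accumulated cost side by side is what allows a benchmark of this shape to compare diagnosticians on both dimensions at once.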