Baichuan Intelligence and a Tsinghua team released the medical large language model Baichuan-M4, which took first place on three sub-leaderboards of the HealthBench evaluation, outperforming GPT-5.5. Its core advance is a reworked interaction mode that brings its diagnostic behavior closer to real clinical practice.
Cao Cao Mobility launched autonomous taxi services in Hong Kong at the Auto Show, unveiling its RoboX strategy and a full pivot to AI, with the aim of building a world-leading physical AI mobility platform. Hong Kong is the first benchmark city for an international intelligent transport system. Eva Cab, China's first native robotaxi, debuted, marking the implementation of RoboX...
The South Korean Ministry of Science and ICT signed a memorandum of understanding with OpenAI, making South Korea the fourth country to establish AI safety cooperation with the company. The two parties will work with South Korea's Artificial Intelligence Safety Institute to jointly build a scientific, standardized global framework for AI safety evaluation.
On June 17, Zhipu AI open-sourced GLM-5.2, a large model focused on code generation and long-horizon tasks. It ranked 2nd globally and 1st among open-source models in Code Arena's front-end evaluation. Since early 2025, Zhipu has built up its coding foundation with GLM-4.5/4.7, and GLM-5.2 extends this to complex engineering tasks that span days to months...
A student-driven platform that provides professor ratings and course evaluations to help select high-quality courses.
A professional AI design platform for architecture, interior, and home design, offering multiple design tools.
An AI-driven FAANG-style mock coding interview platform that evaluates communication, code quality, etc.
Respan is an engineering platform for unified observability, evaluation, prompt optimization, and LLM gateways.
jayhuang92
Qwen-Image is a text-to-image generation model built on the Qwen series. It accepts both Chinese and English prompts, scores well on multiple evaluation metrics, and is particularly suited to image generation that aims for realistic results.
Shawon16
This is a video understanding model based on the VideoMAE architecture. It was pre-trained on the Kinetics dataset and fine-tuned on an unknown dataset that may be related to sign language recognition. The model achieved an accuracy of 78.11% on the evaluation set and is suitable for video classification tasks.
drbaph
Z-Image is an efficient foundation image generation model with 6 billion parameters, designed to address efficiency and quality issues in image generation. Its distilled version, Z-Image-Turbo, can match or exceed leading competitors with only 8 function evaluations. It achieves sub-second inference latency on enterprise-grade H800 GPUs and can run on consumer-grade devices with 16 GB of VRAM.
This is a video understanding model based on the VideoMAE architecture. It was pre-trained on the Kinetics dataset and then fine-tuned for sign language recognition tasks. Its performance on the evaluation set needs improvement, with an accuracy of 0.0010.
This is a video understanding model based on the VideoMAE-base architecture, which has been fine-tuned for 20 epochs on an unknown dataset. The model's performance on the evaluation set is limited, with an accuracy of 0.0041 and a loss value of 7.7839.
KonradBRG
This is a joke rating model fine-tuned from FacebookAI/xlm-roberta-large on multilingual text. It is designed to rate the quality and humor level of jokes. It achieved an accuracy of 0.4005 and a root mean square error of 5.0327 on the evaluation set.
This is a video understanding model fine-tuned from MCG-NJU/videomae-base on an unknown dataset. After 20 epochs of training, it achieved an accuracy of 13.31% on the evaluation set. The model targets video analysis tasks.
advy
This is a large language model fine-tuned from meta-llama/Llama-3.1-70B-Instruct on a specific dataset. It is designed for text generation tasks and achieved a loss of 0.6542 on the evaluation set.
Foshie
This is an English-Spanish model fine-tuned from Google's mT5-small on the Amazon dataset, designed for text summarization tasks. It achieved ROUGE-1 of 16.44 and ROUGE-2 of 8.04 on the evaluation set.
Maxlegrec
The BT4 model is the neural network model behind the LeelaChessZero engine, specifically designed for chess games. This model is based on the Transformer architecture and can predict the best next move based on historical moves, evaluate the game situation, and generate move probabilities.
This is a video action recognition model fine-tuned on the WLASL dataset based on the VideoMAE-base architecture, specifically optimized for sign language recognition tasks, achieving an accuracy of 48.22% on the evaluation set.
This is a video action recognition model based on the VideoMAE-Base architecture and fine-tuned on the WLASL dataset for sign language action recognition. After 200 epochs of training, it achieved a top-1 accuracy of 52.96% and a top-5 accuracy of 79.88% on the evaluation set.
yueqis
This is a code generation model fine-tuned from Qwen2.5-Coder-32B-Instruct on the swe_only_sweagent dataset. It achieved a loss of 0.1210 on the evaluation set and is optimized for software engineering tasks.
EpistemeAI
metatune-gpt20b is a prototype large language model with self-improvement ability. It can generate new training data for itself, evaluate its own performance, and adjust hyperparameters according to improvement metrics. It is described as performing at a postdoctoral level in scientific and mathematical understanding and can also be used for coding tasks.
RedHatAI
This is a quantized version of unsloth/Mistral-Small-3.2-24B-Instruct-2506. Quantizing the weights and activations to the FP4 data type reduces disk size and GPU memory requirements, and the model supports vLLM inference. It has been evaluated on multiple tasks to compare its quality with the non-quantized model.
qthuan2604
This is a Vietnamese text correction model fine-tuned from vinai/bartpho-syllable. It achieved 69.12% character accuracy on the evaluation set and is intended for automatic correction of Vietnamese text.
ivan-kleshnin
This is a classifier model fine-tuned based on the jhu-clsp/mmBERT-small model, achieving an accuracy of 91.07% on the evaluation set. It is mainly used for text classification tasks.
yujieouo
G²RPO is a novel reinforcement learning framework specifically designed for preference alignment in flow models, significantly improving the generation quality through a granular reward evaluation mechanism.
Qwen
Qwen3-4B-SafeRL is a safety-aligned version of the Qwen3-4B model. It is trained with reinforcement learning using reward signals from Qwen3Guard-Gen, which improves the model's robustness against harmful or adversarial prompts. While maintaining safety, it avoids overly simplistic or evasive refusals.
This is a text classification model fine-tuned based on the mmBERT-small architecture, specifically designed for the message type classification task. It achieved an accuracy of 93.94% on the evaluation set and has efficient text classification capabilities.
Opik is an open-source LLM evaluation framework that supports tracking, evaluating, and monitoring LLM applications, helping developers build more efficient and cost-effective LLM systems.
The Node.js Debugger MCP server provides complete debugging capabilities based on the Chrome DevTools protocol, including breakpoint setting, stepping execution, variable inspection, and expression evaluation.
MCPBench is a framework for evaluating the performance of MCP servers. It supports two types of tasks, web search and database query, works with both local and remote servers, and mainly measures accuracy, latency, and token consumption.
mcp-chat is an open-source and general-purpose MCP client tool for testing and evaluating MCP servers and proxies. It supports command-line interaction and web mode, can connect to various MCP servers (JS/Python/Docker), and provides functions such as chat history recording, model selection, and system prompt customization to help developers debug MCP services.
This project is a Model Context Protocol (MCP) adapter used to connect large language models (LLMs) with the Lisp development environment, supporting interaction through the lightweight Lisply protocol. The main functions include Lisp code evaluation, HTTP requests, and debugging support, suitable for scenarios such as AI-assisted symbolic programming and CAD design automation.
The Linear Regression MCP project demonstrates an end-to-end machine learning workflow using Claude and the Model Context Protocol (MCP), including data preprocessing, model training, and evaluation.
An AI mentor server based on the Model Context Protocol, providing second-opinion services such as code review, design evaluation, writing feedback, and creative brainstorming through Deepseek-Reasoning.
NPM Sentinel MCP is an AI-based NPM package analysis server that provides real-time security scanning, dependency analysis, performance evaluation, etc. It supports integration with Claude and Anthropic AI to optimize NPM ecosystem management.
An AI-based NPM package analysis MCP server that provides real-time security scanning, dependency analysis, performance evaluation, and other functions. It integrates Claude and Anthropic AI technologies to optimize the management of the npm ecosystem.
An MCP server based on the YouTube Data API v3 that provides 14 functions to obtain real-time data on YouTube videos, channels, playlists, etc., supporting advanced functions such as content evaluation and caption extraction, suitable for AI assistant integration.
A comprehensive chess analysis MCP server that integrates Stockfish engine evaluation, thematic analysis, opening databases, puzzle training, and game visualization, providing advanced chess analysis and game improvement functions.
An MCP server that provides a persistent Playwright evaluation environment, enabling state retention across calls through a JavaScript programming interface.
ClaudeSmalltalk is a project that connects Claude Desktop to a running Smalltalk programming environment. It provides 14 tools such as code evaluation, class browsing, and method definition through the MCP protocol, supports local or cloud LLM drivers, and ensures that the code is secure and does not leave the local machine.
GEO Analyzer is a tool for analyzing the visibility of content in AI searches. By evaluating key indicators such as statement density, information density, answer pre-positioning, and semantic triples in the content, it helps optimize the content to increase the probability of being cited by AI systems such as ChatGPT and Claude.
The ChuckNorris MCP Server is an enhanced prompt tool designed for large language models. It uses dynamic mode adaptation technology to bypass security restrictions and is mainly used for security research and evaluation purposes.
The OpenFeature MCP Server is a local tool that connects AI programming assistants to OpenFeature capabilities through a standardized protocol, providing SDK installation guidance and feature flag evaluation capabilities, and supporting multiple AI development environments.
QualisMcp is a Brazilian academic journal evaluation system based on the Model Context Protocol (MCP) framework, used for efficiently retrieving and managing event classification information from 2017 to 2020.
The AST MCP Server is a code analysis service based on Abstract Syntax Trees (AST) and Abstract Semantic Graphs (ASG). It supports multiple programming languages, provides functions such as code structure parsing, semantic analysis, and complexity evaluation, and can be integrated with MCP clients such as Claude Desktop.
A Model Context Protocol server for Common Lisp, providing JSON-RPC 2.0 communication, REPL evaluation tools, and TCP/stdio transport support.
SecureAnnex MCP Server is a tool for analyzing the security of browser extensions. It provides functions for querying, analyzing, and evaluating extensions, including vulnerability detection, signature checks, and code review.