Zhihu's 2025 AI Product Rankings, based on user feedback and expert evaluation, offer authoritative market insights. ByteDance's Doubao topped the list as 'Zhihu Users' Favorite of the Year,' highlighting its market leadership.
On December 22, Zhipu Huazhang released and open-sourced its new-generation large model, GLM-4.7. The model performed strongly across multiple international benchmarks, particularly in coding, where its overall performance surpassed GPT-5.2; it also ranked first among both open-source and domestic models on Code Arena, an authoritative coding evaluation platform focused on programming scenarios.
Robotics company Pickle Robot has hired Evanson, a former Tesla executive, as its first CFO at a critical point in its collaboration with UPS. Evanson joined full time after advising the company since last September; at Tesla, he was responsible for investor relations and strategy.
AI models have made significant progress on evaluations of scientific reasoning, performing strongly in international mathematics and informatics olympiads. With advanced models such as GPT-5, AI is now meaningfully accelerating real scientific research, showing strong capabilities in hypothesis generation, testing, refinement, and cross-domain integration.
An open-source platform that provides tools for prompt management, evaluation, and observability of LLM applications.
Vancit streamlines developer hiring through active talent discovery and code evaluation, enabling teams to recruit quickly.
A data-driven assignment evaluation system serving educators and students.
Test your vibe-coding skills and evaluate how effectively you use AI; the results are used for recruiting AI talent.
[Model pricing table fragment: providers include 01-ai and Baichuan; columns include input tokens/M, output tokens/M, and context length.]
jayhuang92
Qwen-Image is a text-to-image generation model built on the Qwen series. It supports both Chinese and English prompts, scores well on multiple evaluation metrics, and is particularly well suited to image generation that aims for photorealistic results.
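As a rough illustration, a text-to-image model like this is typically driven from Python via diffusers; the sketch below assumes the checkpoint is published under the repo id Qwen/Qwen-Image and is loadable through the generic DiffusionPipeline interface, with all generation parameters illustrative rather than taken from the listing.

```python
# Hedged sketch: assumes the model is available as "Qwen/Qwen-Image" on Hugging Face
# and can be loaded via diffusers' generic DiffusionPipeline interface.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image",          # assumed repo id
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# The listing notes bilingual support, so a Chinese prompt should work as well.
image = pipe(
    prompt="A photorealistic street scene at dusk, neon signs reflecting on wet asphalt",
    num_inference_steps=30,
).images[0]
image.save("qwen_image_sample.png")
```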
Shawon16
This is a video understanding model based on the VideoMAE architecture. It was pre-trained on the Kinetics dataset and fine-tuned on an unknown dataset that may be related to sign language recognition. The model achieved an accuracy of 78.11% on the evaluation set and is suitable for video classification tasks.
drbaph
Z-Image is an efficient base image generation model with 6 billion parameters, designed to address efficiency and quality issues in image generation. Its distilled version, Z-Image-Turbo, can match or exceed leading competitors with only 8 function evaluations, achieves sub-second inference latency on enterprise-grade H800 GPUs, and runs on consumer devices with 16 GB of VRAM.
This is a video understanding model based on the VideoMAE architecture. It has been fine-tuned on the basis of pre-training on the Kinetics dataset and is specifically designed for sign language recognition tasks. The model's performance on the evaluation set needs improvement, with an accuracy of 0.0010.
This is a video understanding model based on the VideoMAE-base architecture, which has been fine-tuned for 20 epochs on an unknown dataset. The model's performance on the evaluation set is limited, with an accuracy of 0.0041 and a loss value of 7.7839.
KonradBRG
This model is a joke rating model fine-tuned on multilingual text based on FacebookAI/xlm-roberta-large. It is specifically designed to evaluate the quality and humor level of jokes. It achieved an accuracy of 0.4005 and a root mean square error of 5.0327 on the evaluation set.
This is a video understanding model fine-tuned on an unknown dataset based on the MCG-NJU/videomae-base model. After 20 epochs of training, it achieved an accuracy of 13.31% on the evaluation set. This model is specifically optimized for video analysis tasks.
advy
This model is a large language model fine-tuned on a specific dataset based on meta-llama/Llama-3.1-70B-Instruct. It is specifically designed for text generation tasks and achieved a loss value of 0.6542 on the evaluation set.
Foshie
This is an English-Spanish summarization model fine-tuned from Google's mT5-small on the Amazon dataset, designed for abstractive summary generation. It achieved Rouge1 16.44 and Rouge2 8.04 on the evaluation set.
Maxlegrec
The BT4 model is the neural network model behind the LeelaChessZero engine, specifically designed for chess games. This model is based on the Transformer architecture and can predict the best next move based on historical moves, evaluate the game situation, and generate move probabilities.
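For context, a network like BT4 is normally consumed through the lc0 engine over the standard UCI protocol rather than called directly; the sketch below uses the python-chess engine wrapper and assumes a local lc0 binary already configured with the BT4 weights, with paths and search limits purely illustrative.

```python
# Hedged sketch: assumes an lc0 binary on PATH, loaded with the BT4 weights.
import chess
import chess.engine

board = chess.Board()  # starting position
with chess.engine.SimpleEngine.popen_uci("lc0") as engine:
    # Ask the engine (backed by the BT4 network) to evaluate the position
    info = engine.analyse(board, chess.engine.Limit(nodes=10_000))
    print("Score:", info["score"])

    # And to suggest the best next move
    result = engine.play(board, chess.engine.Limit(time=1.0))
    print("Best move:", result.move)
```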
This is a video action recognition model fine-tuned on the WLASL dataset based on the VideoMAE-base architecture, specifically optimized for sign language recognition tasks, achieving an accuracy of 48.22% on the evaluation set.
This is a video action recognition model fine-tuned on the WLASL dataset based on the VideoMAE-Base architecture. After 200 epochs of training, it achieved a top-1 accuracy of 52.96% and a top-5 accuracy of 79.88% on the evaluation set, specifically designed for sign language action recognition tasks.
yueqis
This model is a professional code generation model fine-tuned on the swe_only_sweagent dataset based on Qwen2.5-Coder-32B-Instruct. It achieved a loss value of 0.1210 on the evaluation set and is specifically optimized for software engineering-related tasks.
EpistemeAI
metatune-gpt20b is a prototype large language model with self-improvement capabilities: it can generate new training data for itself, evaluate its own performance, and adjust hyperparameters according to improvement metrics. The model shows strong postdoctoral-level scientific and mathematical understanding and can also be used for coding tasks.
RedHatAI
This is a quantized version of unsloth/Mistral-Small-3.2-24B-Instruct-2506. Quantizing the weights and activations to the FP4 data type reduces disk size and GPU memory requirements while remaining compatible with vLLM inference. It has been evaluated on multiple tasks to compare quality against the non-quantized model.
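A minimal sketch of serving such a quantized checkpoint with vLLM's offline API follows; the exact repo id of the FP4 build is an assumption (a placeholder is used), and whether additional quantization flags are required depends on the vLLM version.

```python
# Hedged sketch: the repo id below is a placeholder for the actual FP4 checkpoint name.
from vllm import LLM, SamplingParams

llm = LLM(model="RedHatAI/Mistral-Small-3.2-24B-Instruct-2506-FP4")  # assumed repo id

params = SamplingParams(temperature=0.2, max_tokens=256)
outputs = llm.generate(["Summarize the benefits of FP4 weight quantization."], params)
print(outputs[0].outputs[0].text)
```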
qthuan2604
This is a Vietnamese text correction model fine-tuned from vinai/bartpho-syllable. It achieves 69.12% character accuracy on the evaluation set and is designed for automatic correction of Vietnamese text.
ivan-kleshnin
This is a classifier model fine-tuned based on the jhu-clsp/mmBERT-small model, achieving an accuracy of 91.07% on the evaluation set. It is mainly used for text classification tasks.
yujieouo
G²RPO is a novel reinforcement learning framework specifically designed for preference alignment in flow models, significantly improving the generation quality through a granular reward evaluation mechanism.
Qwen
Qwen3-4B-SafeRL is a safety-aligned version based on the Qwen3-4B model. It is trained through reinforcement learning and combined with the reward signals of Qwen3Guard-Gen, enhancing the model's robustness against harmful or adversarial prompts. While ensuring safety, it avoids overly simple or evasive rejection behaviors.
This is a text classification model fine-tuned based on the mmBERT-small architecture, specifically designed for the message type classification task. It achieved an accuracy of 93.94% on the evaluation set and has efficient text classification capabilities.
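Classifiers fine-tuned like the two mmBERT-small models above are usually consumed through the transformers text-classification pipeline; in the sketch below the repo id is a hypothetical placeholder, since the listing does not give one.

```python
# Hedged sketch: "user/mmbert-small-message-type" is a hypothetical repo id.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="user/mmbert-small-message-type",  # hypothetical placeholder
)
print(classifier("Your package has shipped and will arrive on Friday."))
# -> [{'label': '...', 'score': ...}] with labels defined by the fine-tuning dataset
```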
Opik is an open-source LLM evaluation framework that supports tracking, evaluating, and monitoring LLM applications, helping developers build more efficient and cost-effective LLM systems.
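As a rough sketch of what such instrumentation looks like, the snippet below assumes Opik's Python SDK exposes a track decorator for tracing; the wrapped function and its contents are illustrative, not taken from the listing.

```python
# Hedged sketch: assumes Opik's Python SDK provides a `track` decorator for tracing calls.
from opik import track


@track  # records inputs, outputs, and timing of this call as a trace in Opik
def answer_question(question: str) -> str:
    # Placeholder for an actual LLM call (OpenAI, vLLM, etc.)
    return f"(model answer to: {question})"


if __name__ == "__main__":
    print(answer_question("What does an LLM evaluation framework monitor?"))
```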
The Node.js Debugger MCP server provides complete debugging capabilities based on the Chrome DevTools protocol, including breakpoint setting, step execution, variable inspection, and expression evaluation.
MCPBench is a framework for evaluating the performance of MCP servers. It supports the evaluation of two types of tasks: web search and database query, is compatible with local and remote servers, and mainly evaluates accuracy, latency, and token consumption.
mcp-chat is an open-source and general-purpose MCP client tool for testing and evaluating MCP servers and proxies. It supports command-line interaction and web mode, can connect to various MCP servers (JS/Python/Docker), and provides functions such as chat history recording, model selection, and system prompt customization to help developers debug MCP services.
This project is a Model Context Protocol (MCP) adapter used to connect large language models (LLMs) with the Lisp development environment, supporting interaction through the lightweight Lisply protocol. The main functions include Lisp code evaluation, HTTP requests, and debugging support, suitable for scenarios such as AI-assisted symbolic programming and CAD design automation.
A comprehensive chess analysis MCP server that integrates Stockfish engine evaluation, thematic analysis, opening databases, puzzle training, and game visualization, providing advanced chess analysis and game improvement functions.
NPM Sentinel MCP is an AI-based NPM package analysis server that provides real-time security scanning, dependency analysis, performance evaluation, etc. It supports integration with Claude and Anthropic AI to optimize NPM ecosystem management.
An MCP server based on the YouTube Data API v3 that provides 14 functions to obtain real-time data on YouTube videos, channels, playlists, etc., supporting advanced functions such as content evaluation and caption extraction, suitable for AI assistant integration.
An AI mentor server based on the Model Context Protocol, providing second-opinion services such as code review, design evaluation, writing feedback, and creative brainstorming through Deepseek-Reasoning.
An AI-based NPM package analysis MCP server that provides real-time security scanning, dependency analysis, performance evaluation, and other functions. It integrates Claude and Anthropic AI technologies to optimize the management of the npm ecosystem.
The Linear Regression MCP project demonstrates an end-to-end machine learning workflow using Claude and the Model Context Protocol (MCP), including data preprocessing, model training, and evaluation.
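Independent of the MCP plumbing, the workflow the project demonstrates looks roughly like the scikit-learn sketch below; the data and model settings are synthetic and illustrative, not the project's own code.

```python
# Hedged sketch of the preprocessing -> training -> evaluation workflow
# the project wraps behind MCP tools; the data here is synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

scaler = StandardScaler().fit(X_train)                               # preprocessing
model = LinearRegression().fit(scaler.transform(X_train), y_train)   # training
print("R^2:", r2_score(y_test, model.predict(scaler.transform(X_test))))  # evaluation
```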
An MCP server that provides a persistent Playwright evaluation environment, enabling state retention across calls through a JavaScript programming interface
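The cross-call state retention the server advertises amounts to keeping one browser page alive between evaluate calls; the plain Playwright sketch below illustrates that idea and is not the server's own code.

```python
# Hedged sketch: illustrates cross-call state on a single long-lived page,
# which is the behavior the MCP server exposes; this is not the server's code.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("about:blank")

    # A first call stores state on the page's window object...
    page.evaluate("() => { window.counter = (window.counter || 0) + 1; }")
    # ...and a later call can still read it, because the page was not torn down.
    print(page.evaluate("() => window.counter"))  # -> 1

    browser.close()
```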
The ChuckNorris MCP Server is an enhanced prompt tool designed for large language models. It uses dynamic mode adaptation technology to bypass security restrictions and is mainly used for security research and evaluation purposes.
SecureAnnex MCP Server is a tool for analyzing the security of browser extensions, providing functions for querying, analyzing, and evaluating the security of extensions, including vulnerability detection, signature check, code review, etc.
A Model Context Protocol server for Common Lisp, providing JSON-RPC 2.0 communication, REPL evaluation tools, and TCP/stdio transport support
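Since the server speaks JSON-RPC 2.0, a client request is just a JSON envelope; the sketch below sends one over TCP, with the method name, parameter shape, and port all hypothetical.

```python
# Hedged sketch: method name "eval", params shape, and port 4005 are all hypothetical;
# only the JSON-RPC 2.0 envelope itself is standard.
import json
import socket

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "eval",                  # hypothetical method name
    "params": {"code": "(+ 1 2 3)"},   # Lisp form to evaluate
}

with socket.create_connection(("127.0.0.1", 4005)) as sock:  # hypothetical port
    sock.sendall((json.dumps(request) + "\n").encode("utf-8"))
    print(sock.recv(65536).decode("utf-8"))  # e.g. a JSON-RPC result object
```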
QualisMcp is a Brazilian academic journal evaluation system based on the Model Context Protocol (MCP) framework, used for efficiently retrieving and managing event classification information from 2017 to 2020.
This project is an MCP server for finding Agoda hotel reviews, which helps users integrate positive and negative evaluations to assist in decision-making.
The AST MCP Server is a code analysis service based on Abstract Syntax Trees (AST) and Abstract Semantic Graphs (ASG). It supports multiple programming languages, provides functions such as code structure parsing, semantic analysis, and complexity evaluation, and can be integrated with MCP clients such as Claude Desktop.
The Root Signals MCP Server is a bridging project that exposes the Root Signals evaluation tools to AI assistants and agents through the Model Context Protocol (MCP), supporting standard evaluation and RAG evaluation with context.
This is an MCP server that provides a standardized interface for Scikit-learn models, supporting functions such as model training, evaluation, data preprocessing, and persistence.