Best MT-Bench AI Tools & Models - Premium MT-Bench News

AI News

Musk's New AI Favorite! Grok 4.1 Makes a Stunning Debut, Chat Experience Significantly Improved

xAI has launched Grok-4.1, reporting 42% lower latency, 18% higher intent accuracy, and improved dialogue coherence. Built on Grok-4MoE, it adds real-time feedback and personalized caching for instant responses. X Premium+ users get unlimited access, and the API costs $5 per million tokens. It achieves record scores: MT-Bench 8.97, HumanEval 87.1%, and multi-turn consistency 91.4%.

17.9k yesterday

DeepSeek Updates! DeepSeek V2.5 Achieves Leap in Chat Model Coding Capabilities with Comprehensive Performance Improvements

DeepSeek V2.5 performs strongly in code generation and chat. In comparative testing with GPT-4, it achieved significant improvements across multiple metrics, including win rates, MT-Bench, and AlignBench scores. In code generation, DeepSeek V2.5 achieved a HumanEval score of 89% and a LiveCodeBench score of 41%, showing that it can generate high-quality, executable code.

28.2k 6 days ago

Arcee Spark: Model Based on Qwen2 Outperforms GPT-3.5 Across Multiple Tasks

Arcee Spark, a recently released model based on Qwen2, was fine-tuned on 1.8 million samples and has a 128k-token context. The release has attracted widespread attention, particularly among AI professionals. In benchmarks such as MT-Bench, it achieved the highest scores among similar models and even surpassed GPT-3.5 on multiple tasks. It is reported that Arcee

14k yesterday

Models

Qwen2.5 Bakeneko 32b Instruct V2

rinna

An instruction-tuned variant of Qwen2.5 Bakeneko 32B, enhanced with Chat Vector and ORPO optimization for better instruction following. It excels on Japanese MT-Bench.

Natural Language Processing

Transformers, Japanese

rinna

140

Llama 3 8b Gpt 4o Ru1.0

ruslandev

A language model fine-tuned from Meta-Llama-3-8B-Instruct. It uses GPT-4o to improve data quality and focuses on stronger Russian language capabilities. In the MT-Bench evaluation, its Russian score exceeds that of GPT-3.5-turbo.

Natural Language Processing

Transformers

ruslandev

1.2k

Sonya 7B

SanjiWatsuki

Sonya-7B is a 7B-parameter large language model that excels on the MT-Bench benchmark, surpassing GPT-4 on the first turn and ranking second overall.

Natural Language Processing

Transformers, English

SanjiWatsuki

4.4k

Qarasu 14B Chat Plus Unleashed

lightblue

Qarasu is a Japanese and English dialogue model fine-tuned from Qwen-14B-Chat that performs well on the MT-Bench benchmark.

Natural Language Processing

Transformers, Multiple Languages

lightblue

14B DPO Alpha

CausalLM

CausalLM/14B-DPO-α is a large causal language model that supports Chinese and English text generation and performs strongly in MT-Bench evaluations.

Natural Language Processing

Transformers, Multiple Languages

CausalLM

172

118
