This is an MXFP4-quantized version of Huihui-gpt-oss-20b-BF16-abliterated-v2, intended for text generation tasks. MXFP4 quantization significantly reduces model size and inference cost while maintaining output quality. The model supports multiple deployment methods, including QAT, Ollama, and GGUF formats.
Tags: Natural Language Processing, Transformers
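As a quick reference, the snippet below sketches how such a checkpoint could be loaded through the Transformers library. The repository ID, chat template usage, and generation settings are illustrative assumptions, not details taken from this model card; substitute the actual MXFP4 checkpoint path when using it.

```python
# Minimal sketch: loading the quantized checkpoint with Transformers.
# The model ID below is a placeholder -- replace it with the real repo ID.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/mxfp4-quantized-checkpoint"  # hypothetical placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # let Transformers pick the stored (quantized) dtype
    device_map="auto",    # spread layers across available GPU/CPU memory
)

# Simple chat-style generation example.
messages = [{"role": "user", "content": "Write a short haiku about autumn."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

For the Ollama and GGUF deployment paths mentioned above, the corresponding GGUF file would instead be run through a llama.cpp-based tool; the exact file names depend on the published quantized artifacts.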