AIBase

AI News


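Apple Research Reveals a New Paradigm for LLM Alignment: Checklist-Based Reinforcement Learning Outperforms Traditional Reward Models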

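Apple researchers have proposed a new "checklist"-based reinforcement learning approach (RLCF) that significantly improves the performance of open-source large language models by having the model check its own work against a checklist. On complex instruction-following tasks the method outperforms traditional reward models. It addresses limitations of RLHF and is emerging as an important post-training optimization technique.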

10.2k 3 days ago

Models


Gemma 3 1B (Google)
Input price per 1M tokens: -
Output price per 1M tokens: -
Context length: -

ERNIE X1.1 Preview (Baidu)
Input price per 1M tokens: $1
Output price per 1M tokens: $4
Context length: 64

AIBase
Empowering the future, your artificial intelligence solution think tank
© 2026 AIBase