
Popular GitHub AI Repositories Related to Delayed Rewards

Discover the most popular open-source projects and tools related to Delayed Rewards, and follow the latest development trends and innovations.

Awesome Exploration Rl

A continuously updated, curated list of resources on exploration in reinforcement learning.

Depined Network Bot

automation

Professional VPN node automation for Depined Network - the decentralized privacy platform that rewards users for sharing bandwidth as secure exit nodes. Earn crypto while enhancing global internet privacy.

180

6 months ago

-2 today

ImplicitPRM

prm

Repo of paper "Free Process Rewards without Process Labels"

168

1 month ago
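The repository's title refers to obtaining process (step-level) rewards without step-level labels. As we understand the underlying paper, an outcome reward model parameterized as a log-likelihood ratio, r(y) = β·log(π_θ(y) / π_ref(y)), yields per-step rewards by summing the token-level log ratios inside each reasoning step. The sketch below assumes you already have per-token log-probabilities from the policy and from a reference model, plus (start, end) token spans marking the steps; the helper names and inputs are hypothetical and are not the repository's API.

```python
from typing import List, Sequence, Tuple


def implicit_step_rewards(
    policy_logprobs: Sequence[float],
    ref_logprobs: Sequence[float],
    step_spans: Sequence[Tuple[int, int]],
    beta: float = 1.0,
) -> List[float]:
    """Per-step rewards from token log-ratios (hypothetical helper, illustrative only).

    Each step's reward is beta * sum(log pi(y_i) - log pi_ref(y_i)) over the
    tokens in [start, end), i.e. the difference q_t - q_{t-1} of the cumulative
    implicit reward at consecutive step boundaries.
    """
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    log_ratios = [p - r for p, r in zip(policy_logprobs, ref_logprobs)]
    return [beta * sum(log_ratios[start:end]) for start, end in step_spans]


def cumulative_rewards(step_rewards: Sequence[float]) -> List[float]:
    """Running sums q_1, q_2, ...; if the spans cover the whole response,
    the last entry equals the sequence-level (outcome) reward."""
    total, out = 0.0, []
    for r in step_rewards:
        total += r
        out.append(total)
    return out
```

For example, token log-probs of [-1.0, -2.0, -1.0] under the policy and [-1.5, -2.0, -0.5] under the reference, split into steps (0, 2) and (2, 3), give step rewards of 0.5 and -0.5.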

PyRates

code-generation

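An open-source, graph-based Python toolbox for code generation and analysis of dynamical systems, covering both pre-implemented and custom models. Most of the pre-implemented models belong to the family of neural population models.

1 month ago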

Learning From Rewards Llm Papers

guided-decoding

A comprehensive collection of work on learning from rewards in the post-training and test-time scaling of LLMs, focusing on both reward models and learning strategies across the training, inference, and post-inference stages.

1 month ago

+1 today

Awesome Agent RL

agent

A curated list of awesome resources on reward construction for AI agents. The repository covers cutting-edge research and practical guides on defining and collecting rewards to build more intelligent and better-aligned AI agents.

2 months ago

Reinforcement Learning For Autonomous Navigation Using Deep Q Network And Twin Delayed DDPG

autonomous-driving

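This project implements autonomous navigation with reinforcement learning, focusing on the Deep Q-Network (DQN) and Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithms. We train a TurtleBot3 robot to navigate its environment autonomously while intelligently avoiding moving obstacles.

2 months ago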
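Two of TD3's core ideas, clipped double-Q learning and target policy smoothing, both appear in how the critic's training target is computed. The following is a minimal, framework-free sketch of that target for a single transition with a scalar action, assuming the target actor and the two target critics are plain Python callables. The hyperparameter defaults follow commonly used values but are assumptions here, and this is not code from the repository (TD3's third idea, delayed actor updates, lives in the training loop and is omitted).

```python
import random
from typing import Any, Callable, Optional


def td3_target(
    reward: float,
    next_state: Any,
    done: bool,
    actor_target: Callable[[Any], float],
    critic1_target: Callable[[Any, float], float],
    critic2_target: Callable[[Any, float], float],
    gamma: float = 0.99,
    policy_noise: float = 0.2,
    noise_clip: float = 0.5,
    action_low: float = -1.0,
    action_high: float = 1.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Critic target y = r + gamma * (1 - done) * min(Q1'(s', a~), Q2'(s', a~))."""
    if rng is None:
        rng = random.Random()
    # Target policy smoothing: add clipped Gaussian noise to the target action,
    # then clip the result back into the valid action range.
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, policy_noise)))
    next_action = max(action_low, min(action_high, actor_target(next_state) + noise))
    # Clipped double-Q: use the smaller of the two target critics' estimates
    # to counter overestimation bias.
    q_next = min(
        critic1_target(next_state, next_action),
        critic2_target(next_state, next_action),
    )
    # Do not bootstrap past a terminal transition.
    return reward + gamma * (1.0 - float(done)) * q_next
```

Taking the minimum of two critics deliberately biases the target low, which in practice is less harmful than the overestimation a single critic accumulates.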

Awesome Agent Reward

agent

4 months ago

Nabla R2D3

3d-generative

[Official] Nabla-R2D3: Effective and Efficient 3D Diffusion Alignment with 2D Rewards

2 months ago

R1

deepseek-r1

Enhanced GRPO with more verifiable rewards and real-time evaluators

1 month ago
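A verifiable reward is one that a program can check directly, for example by exact match against a known answer, rather than one predicted by a learned reward model. The sketch below shows a single rule-based reward of the kind GRPO-style training consumes. The `<answer>...</answer>` output convention and the binary 1.0/0.0 scoring are assumptions for illustration, not the repository's actual evaluators.

```python
import re

# Hypothetical output convention: the model wraps its final answer in <answer> tags.
ANSWER_PATTERN = re.compile(r"<answer>\s*(.*?)\s*</answer>", re.DOTALL)


def exact_match_reward(completion: str, reference: str) -> float:
    """Return 1.0 if the tagged final answer matches the reference exactly, else 0.0."""
    match = ANSWER_PATTERN.search(completion)
    if match is None:
        # A missing or malformed answer tag earns no reward.
        return 0.0
    return 1.0 if match.group(1).strip() == reference.strip() else 0.0
```

Real setups usually normalize answers further (numeric parsing, symbolic equivalence) before comparing; plain string equality is the simplest possible checker.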

ICM PPO Implementation

deep-reinforcement-learning

Proximal Policy Optimization (PPO) with an Intrinsic Curiosity Module (ICM), applied to the Pyramid environment of Unity ML.

2 months ago
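In ICM, the intrinsic reward is the forward model's prediction error in a learned feature space, r_i = (η/2)·||phi_hat(s_{t+1}) − phi(s_{t+1})||², which is added to the sparse or delayed extrinsic reward; PPO then maximizes the clipped surrogate min(ρ·A, clip(ρ, 1−ε, 1+ε)·A). The pure-Python sketch below assumes the feature vectors and the probability ratio ρ are already produced by your networks; η = 0.01 and ε = 0.2 are illustrative defaults, not values taken from the repository.

```python
from typing import Sequence


def curiosity_reward(
    phi_next: Sequence[float], phi_next_pred: Sequence[float], eta: float = 0.01
) -> float:
    """ICM intrinsic reward: (eta / 2) * squared error of the forward model in feature space."""
    squared_error = sum((p - t) ** 2 for p, t in zip(phi_next_pred, phi_next))
    return 0.5 * eta * squared_error


def ppo_clipped_objective(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """Per-sample PPO surrogate (to be maximized); ratio = pi_new(a|s) / pi_old(a|s)."""
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    # Taking the minimum makes the objective a pessimistic bound on the unclipped one.
    return min(ratio * advantage, clipped_ratio * advantage)
```

During rollouts, the reward passed to PPO's advantage estimation would be the extrinsic reward plus `curiosity_reward(...)`, so the agent is paid for reaching states its forward model predicts poorly even when the environment gives no signal.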

Deep RL Project Maximize Total Profits Earned By Cab Driver

actions

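The goal of this project is to build a reinforcement-learning-based algorithm that helps cab drivers optimize their decision-making and thereby maximize their earnings. With long-term profit maximization as the objective, we propose a reinforcement learning approach to optimizing the driving strategy, formulating the optimization problem as a Markov decision process (MDP).

5 months ago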
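Once the problem is cast as an MDP, a value-based learner improves its action-value estimates with the Bellman update Q(s, a) ← Q(s, a) + α·[r + γ·max_a' Q(s', a') − Q(s, a)]. The tabular Q-learning sketch below illustrates that update. The state encoding (location, hour), the (pickup, drop) action tuples, and the reward values are hypothetical, and the repository may well use a neural approximator such as a DQN instead of a table.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Iterable, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_learning_update(
    q: QTable,
    state: Hashable,
    action: Hashable,
    reward: float,
    next_state: Hashable,
    next_actions: Iterable[Hashable],
    alpha: float = 0.1,
    gamma: float = 0.95,
    terminal: bool = False,
) -> float:
    """One tabular Q-learning step; unseen (state, action) pairs default to 0.0."""
    next_actions = list(next_actions)
    if terminal or not next_actions:
        best_next = 0.0  # no future value after the episode (e.g. the shift) ends
    else:
        best_next = max(q[(next_state, a)] for a in next_actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]
```

Usage: start from `q = defaultdict(float)` and call `q_learning_update` after every ride or idle step; the discount γ is what lets a low-paying ride that ends in a busy area outscore a higher-paying one that strands the driver.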

MSc_Curiosity_MARL

curiosity

MSc in Informatics thesis project at the University of Edinburgh: curiosity in multi-agent reinforcement learning.

2 months ago

Epiphany Cli AwaitFunding

anthropic

Our MAGNUM OPUS, delayed for funding: Epiphany CLI is purpose-built to overcome the fundamental limitations of stochastic language models, specifically their non-deterministic outputs, inadequate context management, and fragile error handling.

1 month ago

BPQL

bpql

(NeurIPS 2023) Source code of the paper: Belief Projection-Based Reinforcement Learning for Environments with Delayed Feedback

8 months ago
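With a constant observation delay of d steps, the agent at time t only sees s_{t−d}, so a common way to restore the Markov property is to augment the state with the d actions taken since then: (s_{t−d}, a_{t−d}, …, a_{t−1}). As we understand it, BPQL is motivated by the fact that this augmented space grows with d. The sketch below shows only that baseline augmentation, not the paper's belief-projection method; the class name and the no-op padding are illustrative assumptions.

```python
from collections import deque
from typing import Any, Deque, Tuple


class ActionBufferAugmenter:
    """Builds augmented states (s_{t-d}, a_{t-d}, ..., a_{t-1}) under a fixed delay d."""

    def __init__(self, delay: int, noop_action: Any) -> None:
        if delay < 1:
            raise ValueError("delay must be at least 1")
        # Before any real actions exist, pad the history with a placeholder no-op.
        self.actions: Deque[Any] = deque([noop_action] * delay, maxlen=delay)

    def record_action(self, action: Any) -> None:
        """Append the action just taken; maxlen automatically drops the oldest one."""
        self.actions.append(action)

    def augment(self, delayed_state: Any) -> Tuple[Any, Tuple[Any, ...]]:
        """Pair the latest (delayed) observation with the pending action history."""
        return (delayed_state, tuple(self.actions))
```

The augmented tuple has 1 + d components, so for a continuous action space of dimension k its input size grows by k per step of delay, which is the scaling problem that motivates alternatives such as belief projection.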

Minimal GRPO

evolution-strategies

Implementation of Group Relative Policy Optimization (GRPO) and Evolution Strategies (ES) to fine-tune open language models (such as Llama-3.2 and Qwen2.5) on tasks with verifiable rewards.

2 months ago
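GRPO replaces a learned value baseline with a group statistic: for each prompt it samples a group of completions, scores them (for example with verifiable rewards), and uses each completion's reward standardized within its group as the advantage, A_i = (r_i − mean(r)) / (std(r) + ε). A minimal sketch of that step follows. It uses the population standard deviation and ε = 1e-8, both assumptions (some implementations use the sample standard deviation), and it omits the clipped policy loss and KL penalty that full GRPO also includes.

```python
import statistics
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Standardize rewards within one group of completions sampled for the same prompt."""
    if not rewards:
        return []
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # If every completion scored the same, all advantages are 0 and the group gives no signal.
    return [(r - mean) / (std + eps) for r in rewards]
```

With binary verifiable rewards, a group in which half the completions are correct yields advantages of about +1 for the correct ones and -1 for the rest, while a group that is all correct or all wrong contributes no gradient.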

Rl For Llms

fine-tuning

Context and a guide for reinforcement learning with verifiable rewards on large language models.

1 month ago

Inquire

active-learning

INQUIRE: interactive querying for user-aware informative reasoning

1 year ago

OnlineRLHF

pbrl

A repository implementing online preference-based reward learning under human irrationality and delayed feedback.

2 months ago
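Preference-based reward learning typically fits a reward model r so that the probability a human prefers segment A over segment B follows a Bradley-Terry (Boltzmann-rational) model, P(A ≻ B) = σ(β·(r_A − r_B)), trained by minimizing −log P on observed preferences. One simple way to represent the human irrationality named in the repository description is the rationality coefficient β: as β → 0 choices become random, while a large β approaches a perfectly rational labeler. The sketch below is a generic illustration of that model, not the repository's method; the function names and scalar interface are assumptions.

```python
import math


def preference_probability(r_a: float, r_b: float, beta: float = 1.0) -> float:
    """P(A preferred over B) under a Bradley-Terry model with rationality beta."""
    x = beta * (r_a - r_b)
    # Numerically stable logistic sigmoid.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def preference_loss(r_preferred: float, r_other: float, beta: float = 1.0) -> float:
    """Negative log-likelihood of one observed preference: -log sigmoid(beta * diff)."""
    x = beta * (r_preferred - r_other)
    # -log sigmoid(x) = softplus(-x), computed stably for large |x|.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))
```

Summing `preference_loss` over a batch of labeled pairs and minimizing it with respect to the reward model's parameters is the standard fitting objective; a smaller β makes each individual (possibly noisy) label count for less.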

Ros Gazebo Deep Rl

deep-reinforcement-learning

Cluster management using reinforcement learning

2 months ago
