Google
$0.7
Input tokens/M
$2.8
Output tokens/M
1k
Context Length
Bytedance
$0.8
$2
128
-
Openai
$8.75
$70
400
$4
$16
$3
$9
32
Stepfun
Huawei
Baidu
$1
64
Sensetime
Alibaba
$1.05
$4.2
$0.1
$0.4
Chatglm
$100
$17.5
2.1k
yujieouo
G²RPO is a novel reinforcement learning framework specifically designed for preference alignment in flow models, significantly improving the generation quality through a granular reward evaluation mechanism.