Recent research from a Google team suggests that using a large language model in place of humans for preference labeling can achieve results comparable to RLHF (reinforcement learning from human feedback). When the researchers compared RLAIF (reinforcement learning from AI feedback) and RLHF head to head, the two were preferred equally often, with a win rate of 50% each. The study indicates that RLAIF can produce improvements comparable to RLHF without relying on human labelers.
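
To make the win-rate figure concrete, here is a minimal sketch of how such a number could be computed. It assumes each head-to-head comparison yields a single label naming the preferred policy. The function name `win_rate` and the sample labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def win_rate(preferences, policy):
    """Return the fraction of comparisons in which `policy` was preferred."""
    if not preferences:
        raise ValueError("need at least one comparison")
    counts = Counter(preferences)
    return counts[policy] / len(preferences)


# Hypothetical judgments (illustrative only, not data from the study):
# one label per comparison, naming the policy whose output was preferred.
judgments = ["rlaif", "rlhf", "rlhf", "rlaif", "rlaif", "rlhf"]

print(win_rate(judgments, "rlaif"))  # 3 of 6 comparisons -> 0.5
print(win_rate(judgments, "rlhf"))   # 3 of 6 comparisons -> 0.5
```

With no ties, the two win rates sum to 1, so a 50% win rate for each policy means neither was preferred over the other.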