alpaca-rlhf
Public
Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat
Created: 2023-04-12T16:19:46
Updated: 2025-03-24T17:19:53
https://88aeeb3aef5040507e.gradio.live/
Stars: 114
Stars Increase: 0