Starling-7B
Enhancing the usability and safety of LLMs
Common Product · Chatting · Language Model · Reinforcement Learning
Starling-7B is an open-weights large language model (LLM) trained with Reinforcement Learning from AI Feedback (RLAIF). The model leverages our new GPT-4-labeled ranking dataset, Nectar, together with a new reward-model training and policy-optimization pipeline. Starling-7B scores 8.09 on MT-Bench with GPT-4 as the judge, outperforming every existing model except OpenAI's GPT-4 and GPT-4 Turbo. We have released the ranking dataset Nectar, the reward model Starling-RM-7B-alpha, and the language model Starling-LM-7B-alpha on HuggingFace, along with an online demo on LMSYS Chatbot Arena. Stay tuned for the upcoming release of our code and paper, which will describe the full process in more detail.
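For readers who want to try the released checkpoint, the sketch below loads it with the Hugging Face `transformers` library. The repository id `berkeley-nest/Starling-LM-7B-alpha` and the OpenChat-style prompt format are assumptions about the public release, not details stated on this page.

```python
# Minimal sketch: generate a reply from the released Starling-7B checkpoint.
# Assumptions: repo id "berkeley-nest/Starling-LM-7B-alpha" and the
# OpenChat-style "GPT4 Correct ..." prompt format used by the public release.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "berkeley-nest/Starling-LM-7B-alpha"  # assumed HuggingFace repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Single-turn prompt in the assumed chat format.
prompt = (
    "GPT4 Correct User: What is reinforcement learning?<|end_of_turn|>"
    "GPT4 Correct Assistant:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs, max_new_tokens=256, pad_token_id=tokenizer.eos_token_id
)

# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```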