cap-rlvr
CAP RLVR: Reinforcement Learning with Verifiable Rewards for Legal Reasoning using Caselaw Access Project data. Complete GRPO training pipeline with OpenAI Gym environments, deterministic reward functions, and multi-stage curriculum learning for legal LLM development.
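As a rough illustration of how a deterministic reward function feeds GRPO, the sketch below scores a group of sampled completions against a reference answer and converts the rewards into group-relative advantages. It is a minimal sketch and not this repository's code: the names `exact_match_reward` and `group_relative_advantages`, the exact-match scoring rule, and the sample completions are all assumptions made for the example.

```python
from statistics import mean, pstdev


def exact_match_reward(prediction: str, reference: str) -> float:
    """Hypothetical deterministic reward: 1.0 on a normalized exact match, else 0.0."""
    def normalize(text: str) -> str:
        # Lowercase and collapse whitespace so formatting differences do not matter.
        return " ".join(text.lower().split())

    return 1.0 if normalize(prediction) == normalize(reference) else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against the mean and std of its own sampling group.

    GRPO-style training uses this group baseline in place of a learned value
    function. Population std is used here; some implementations use sample std.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four illustrative completions sampled for the same prompt (made-up data).
completions = ["Affirmed", "reversed", " affirmed ", "Remanded"]
rewards = [exact_match_reward(c, "affirmed") for c in completions]
advantages = group_relative_advantages(rewards)
```

Because the reward is a pure function of the completion and the reference, the same inputs always produce the same score, so training runs can be reproduced and audited without a learned reward model.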