Qwen3-4B-SafeRL is a safety-aligned version of the Qwen3-4B model. It is trained with reinforcement learning, using reward signals from Qwen3Guard-Gen, to improve robustness against harmful or adversarial prompts. While remaining safe, it avoids overly simplistic or evasive refusals.
Natural Language Processing
Transformers
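To make the training idea in the description concrete, the following is a minimal sketch of how a guard model's judgment could be turned into a scalar reward that favors safe answers while discouraging blanket refusals. It is not the actual Qwen3-4B-SafeRL reward. The `GuardVerdict` type, the `shaped_reward` function, the label names, and all numeric weights are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuardVerdict:
    """Hypothetical guard-model judgment of a single response."""
    safety: str       # assumed labels: "safe", "controversial", "unsafe"
    is_refusal: bool  # True if the response declines to answer


def shaped_reward(verdict: GuardVerdict, refusal_penalty: float = 0.5) -> float:
    """Map a guard verdict to a scalar reward (illustrative weights only).

    Safe responses score highest, unsafe ones lowest, and any refusal
    is penalized so that refusing is never the best-scoring behavior.
    """
    base_scores = {"safe": 1.0, "controversial": 0.0, "unsafe": -1.0}
    reward = base_scores[verdict.safety]
    if verdict.is_refusal:
        reward -= refusal_penalty
    return reward


if __name__ == "__main__":
    helpful = GuardVerdict(safety="safe", is_refusal=False)
    evasive = GuardVerdict(safety="safe", is_refusal=True)
    harmful = GuardVerdict(safety="unsafe", is_refusal=False)
    for name, verdict in [("helpful", helpful), ("evasive", evasive), ("harmful", harmful)]:
        print(name, shaped_reward(verdict))
```

Under these assumed weights, a safe and helpful answer scores above a safe refusal, which in turn scores above an unsafe answer. That ordering is the property the description attributes to the model: safety without falling back on evasive rejections.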