Antidote
This is an unofficial re-implementation of "Antidote: Post-fine-tuning Safety Alignment for Large Language Models against Harmful Fine-tuning Attack" (ICML 2025).
Created: 2024-04-11T08:31:45
Updated: 2025-07-27T15:57:11
https://openreview.net/pdf?id=Arepl4R86m
Stars: 2
Stars Increase: 0