steering-vectors-from-finetuning
Public
Exploration of an alternative approach to extracting steering vectors. Instead of using the classical contrastive method, we investigate whether comparing activations between a base model and its deceptive fine-tuned version reveals a more meaningful latent direction.
Created: 2025-02-02T03:25:21
Updated: 2025-02-18T01:51:07
Stars: 1
Stars Increase: 0
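
The idea in the description can be illustrated with a minimal sketch. This is not the repository's implementation: it assumes a simple difference of mean activations, uses synthetic NumPy arrays in place of real model activations, and the names `mean_difference_direction`, `cosine`, and the planted direction `true_dir` are hypothetical. The same helper covers both variants; what changes is which two activation sets are compared (one model on two prompt sets for the classical contrastive method, two models on the same prompts for the fine-tuning comparison).

```python
import numpy as np


def mean_difference_direction(acts_a, acts_b):
    """Unit vector pointing from the mean of acts_a to the mean of acts_b.

    Both arrays have shape (n_prompts, hidden_dim).
    """
    diff = acts_b.mean(axis=0) - acts_a.mean(axis=0)
    norm = np.linalg.norm(diff)
    return diff / norm if norm > 0 else diff


def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


# Synthetic stand-ins for real activations (assumption: no model is loaded here).
rng = np.random.default_rng(0)
hidden_dim, n_prompts = 16, 200
true_dir = np.zeros(hidden_dim)
true_dir[3] = 1.0  # hypothetical "deception" direction planted for the demo

# Fine-tuning variant: same prompts, two models (base vs. fine-tuned).
base_acts = rng.normal(size=(n_prompts, hidden_dim))
tuned_acts = base_acts + 2.0 * true_dir + 0.1 * rng.normal(size=(n_prompts, hidden_dim))
finetune_vec = mean_difference_direction(base_acts, tuned_acts)

# Classical contrastive variant: one model, two prompt sets (honest vs. deceptive).
honest_acts = rng.normal(size=(n_prompts, hidden_dim))
deceptive_acts = rng.normal(size=(n_prompts, hidden_dim)) + 2.0 * true_dir
contrastive_vec = mean_difference_direction(honest_acts, deceptive_acts)

print("cosine(finetune_vec, planted):", cosine(finetune_vec, true_dir))
print("cosine(finetune_vec, contrastive_vec):", cosine(finetune_vec, contrastive_vec))
```

With real models, the arrays would instead hold hidden states captured at a chosen layer for each prompt. Which layer and token position to read from is a separate design choice that this sketch does not address.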