This model is decensored using a technique I developed called DeLMAT: Decensoring Language Models through Activation Tuning. It's similar to the ablation / "abliteration" scripts that are out there, but instead works by training a LoRA adapter with a loss based on the distance from the mean refusal activation and the distance to the mean acceptance activation.

The training script is released under the MIT license: https://github.com/nkpz/DeLMAT

Rather than simply attempting to cancel out the refusal direction, DeLMAT guides the model toward acceptance. In other words, instead of merely forgetting how to refuse requests, the model learns to emphatically accept them.
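
To make the idea concrete, here is a minimal sketch of what such a loss might look like. The function name, tensor shapes, and the choice of mean-squared distance are illustrative assumptions on my part; see the repository linked above for the actual implementation:

```python
import torch
import torch.nn.functional as F

def delmat_loss(hidden_states: torch.Tensor,
                refusal_mean: torch.Tensor,
                acceptance_mean: torch.Tensor) -> torch.Tensor:
    """Hypothetical activation-tuning loss (illustrative, not the repo's code).

    hidden_states:   (batch, hidden_dim) activations at a chosen layer
    refusal_mean:    (hidden_dim,) mean activation over refusal examples
    acceptance_mean: (hidden_dim,) mean activation over acceptance examples
    """
    # Pull the current activations toward the mean acceptance activation...
    dist_accept = F.mse_loss(hidden_states, acceptance_mean.expand_as(hidden_states))
    # ...while pushing them away from the mean refusal activation.
    dist_refuse = F.mse_loss(hidden_states, refusal_mean.expand_as(hidden_states))
    return dist_accept - dist_refuse
```

Minimizing a loss of this shape rewards the LoRA adapter both for moving away from the refusal cluster and for moving toward the acceptance cluster, rather than only erasing the refusal direction.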
