glorgao
/

SelectiveDPO-Gemma2-9B-SFT-UFBinarized

Text Generation

text-generation-inference

Model card Files Files and versions

This model is fine-tuned from the tanliboy/zephyr-gemma-2-9b model using the SelectiveDPO on the Ultrafeedback_binarized dataset.

For the recipe to reproduce this model, please visit our GitHub page.

Downloads last month: 2

Safetensors

Model size

9B params

Tensor type

F32

·

Model tree for glorgao/SelectiveDPO-Gemma2-9B-SFT-UFBinarized

Base model

google/gemma-2-9b

Finetuned

tanliboy/zephyr-gemma-2-9b-sft

Finetuned

(2)

this model

Dataset used to train glorgao/SelectiveDPO-Gemma2-9B-SFT-UFBinarized

Collection including glorgao/SelectiveDPO-Gemma2-9B-SFT-UFBinarized

SelectiveDPO

Released models trained by Selective DPO. • 6 items • Updated May 15, 2025

Paper for glorgao/SelectiveDPO-Gemma2-9B-SFT-UFBinarized

Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples

Paper • 2502.09650 • Published Feb 11, 2025