RefAlign: RL with Similarity-based Rewards
Datasets and models are collected in: *Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data*.
GitHub repository: https://github.com/mzhaoshuai/RefAlign
This model is aligned with RefAlign, a versatile REINFORCE-style alignment algorithm that uses language generation evaluation metrics (such as BERTScore) between sampled generations and reference answers as surrogate rewards. This checkpoint is primarily aligned for safety.
The training data is available at https://huggingface.co/datasets/mzhaoshuai/Llama-3.3-70B-Inst-awq_SafeRLHF. During Reinforcement Learning with Similarity-based Rewards, BERTScore serves as the reward function.
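To make the surrogate-reward idea concrete, here is a minimal sketch of scoring K sampled generations against a single reference answer. It substitutes a simple token-overlap F1 for BERTScore (which matches tokens by contextual-embedding similarity rather than exact match); the function names are illustrative, not from the RefAlign codebase.

```python
from collections import Counter

def token_f1(candidate: str, reference: str) -> float:
    """Toy stand-in for BERTScore: F1 over exact token overlap.
    Real BERTScore aligns tokens via contextual embeddings instead."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def similarity_rewards(samples, reference):
    """Score each of K sampled generations against one reference answer."""
    return [token_f1(s, reference) for s in samples]

# K = 2 sampled generations for one prompt, one reference answer
rewards = similarity_rewards(
    ["the cat sat on the mat", "a dog ran away"],
    "the cat sat on a mat",
)
# The generation closer to the reference receives the higher reward.
```

In the actual method, the `bert-score` package (with the model listed in the table below) would replace `token_f1`; the surrounding loop over K samples is the same.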
| Hyper-parameter | Value |
|---|---|
| Learning Rate | 3e-6 |
| Batch Size | 512 |
| Epochs | 2 |
| Prompt Length | 192 |
| Generation Length | 384 |
| Sampled Generations (K) | 2 |
| BERTScore Model | bart-large-mnli |
| Harmless Advantage Weight | 4.0 |
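The table's K and harmless advantage weight can be illustrated with a REINFORCE-style advantage computation. This is a hypothetical sketch, not the RefAlign implementation: it assumes the harmless reward enters the combined reward scaled by the weight 4.0, and that the baseline is the mean reward over the K samples for the same prompt.

```python
def advantages(helpful_rewards, harmless_rewards, harmless_weight=4.0):
    """REINFORCE-style advantages over K sampled generations.

    Assumed combination (not confirmed by the card): harmless reward
    scaled by the 'Harmless Advantage Weight' from the table, with the
    mean over the K samples as the baseline.
    """
    combined = [h + harmless_weight * s
                for h, s in zip(helpful_rewards, harmless_rewards)]
    baseline = sum(combined) / len(combined)
    return [r - baseline for r in combined]

# K = 2 samples: per-sample helpfulness and harmlessness rewards
adv = advantages([0.8, 0.4], [0.9, 0.5])
# Advantages are centered, so they sum to zero across the K samples.
```

With a mean baseline over the same prompt's samples, the policy gradient pushes probability toward the above-average generation and away from the below-average one.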
Base model: PKU-Alignment/alpaca-7b-reproduced