AI-Response-Comparer-v1.5

AI-Response-Comparer-v1.5 is a fine-tuned version of microsoft/deberta-v3-large for preference classification and reward modeling tasks.

The model compares two AI-generated responses for the same prompt and predicts a probability distribution over three outcomes:

  • Response A preferred
  • Response B preferred
  • Tie

Predictions come from a 3-class classification head whose logits pass through a softmax, so the three probabilities always sum to 1.
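
The sketch below illustrates that last step. It turns three raw logits into probabilities and a predicted outcome using a numerically stable softmax. The label order is an assumption made for this example. Read the actual index-to-label mapping from the model config (for example its `id2label` field) rather than taking it from here.

```python
import math

# Hypothetical label order for illustration only; read the real mapping
# from the model config (e.g. id2label) before relying on it.
LABELS = ("A preferred", "B preferred", "Tie")

def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits):
    """Return the most likely outcome and the full probability distribution."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return LABELS[best], dict(zip(LABELS, probs))

label, probs = predict([2.0, 0.5, -1.0])
print(label)  # A preferred
```

Subtracting the largest logit does not change the result, because softmax is invariant to adding a constant to every logit. It does prevent `math.exp` from overflowing on large values.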


Model Details

Base Model

  • microsoft/deberta-v3-large

Fine-tuning Strategy

  • Full fine-tuning
  • Learning rate: 1e-5
  • Sequential multi-dataset training

Unlike later versions that used jointly shuffled datasets, this version was trained sequentially:

  1. Kaggle LLM Classification Finetuning
  2. Anthropic HH-RLHF

Each dataset was shuffled independently during its respective training phase.
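
The sketch below contrasts this sequential schedule with the jointly shuffled setup used in later versions. The helper names and toy data are hypothetical. Only the ordering logic reflects the description above: the Kaggle phase comes first, then HH-RLHF, and each is shuffled on its own.

```python
import random

def sequential_schedule(datasets, seed=0):
    """Yield examples phase by phase: each dataset is shuffled on its own,
    and the next phase starts only after the previous one is exhausted."""
    rng = random.Random(seed)
    for name, examples in datasets:
        order = list(examples)
        rng.shuffle(order)  # shuffle within this phase only
        for ex in order:
            yield name, ex

def joint_schedule(datasets, seed=0):
    """For contrast: pool all examples and shuffle them together."""
    rng = random.Random(seed)
    pooled = [(name, ex) for name, examples in datasets for ex in examples]
    rng.shuffle(pooled)
    return pooled

# Toy stand-ins for the two training phases (hypothetical data).
phases = [
    ("kaggle", range(5)),
    ("hh-rlhf", range(100, 105)),
]
seq = list(sequential_schedule(phases))
print([name for name, _ in seq])  # every "kaggle" entry precedes every "hh-rlhf" entry
```

Under the sequential schedule, every HH-RLHF example is seen after the entire Kaggle phase, so the final updates come only from HH-RLHF. This is why the ordering may bias the model toward the later distribution, as noted under Limitations.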


Preprocessing Strategy

To keep input lengths consistent and training compute manageable:

  • Conversations were limited to a maximum of 2 turns
  • Inputs were truncated to a maximum sequence length of 512 tokens

These preprocessing rules were applied consistently across training and evaluation data.
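
Below is a minimal sketch of these two rules. It rests on several assumptions:

  • A turn is one user/assistant exchange.
  • The most recent turns are the ones kept.
  • The "User:/Assistant:" template is hypothetical.
  • `tokenize` is a placeholder for the DeBERTa-v3 tokenizer (here `str.split`).

Truncation keeps the first 512 tokens, which matches the right-side default of Hugging Face tokenizers. The card does not say which side was truncated.

```python
MAX_TURNS = 2      # at most 2 turns per conversation
MAX_LENGTH = 512   # maximum sequence length in tokens

def limit_turns(conversation, max_turns=MAX_TURNS):
    """Keep at most `max_turns` (user, assistant) exchanges.

    Assumption: the most recent exchanges are kept, since the final
    assistant reply is the one being judged.
    """
    return conversation[max(len(conversation) - max_turns, 0):]

def build_input(conversation, tokenize, max_length=MAX_LENGTH):
    """Flatten the kept turns with a hypothetical template, then truncate."""
    text = "\n".join(
        f"User: {user}\nAssistant: {assistant}"
        for user, assistant in limit_turns(conversation)
    )
    tokens = tokenize(text)
    # Right-side truncation; a real tokenizer would also reserve room
    # for special tokens such as [CLS] and [SEP].
    return tokens[:max_length]

convo = [
    ("hi", "hello"),
    ("how are you?", "fine"),
    ("tell me a joke", "knock knock"),
]
tokens = build_input(convo, str.split)
print(tokens[:2])  # ['User:', 'how']
```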


Training Datasets

Included Datasets

  • Kaggle LLM Classification Finetuning
  • Anthropic HH-RLHF
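
The two datasets encode preferences differently. As published, the Kaggle data marks the winner with one-hot columns (`winner_model_a`, `winner_model_b`, `winner_tie`). HH-RLHF instead stores a `chosen` and a `rejected` transcript and has no ties.

The sketch below shows one way to map both onto the three classes. The class indices and the random A/B placement for HH-RLHF are assumptions for illustration, not the author's preprocessing code.

```python
import random

# Hypothetical class indices; the real order lives in the model config.
LABEL_A, LABEL_B, LABEL_TIE = 0, 1, 2

def kaggle_label(row):
    """Map a row with one-hot winner columns to a class index."""
    if row["winner_model_a"]:
        return LABEL_A
    if row["winner_model_b"]:
        return LABEL_B
    if row["winner_tie"]:
        return LABEL_TIE
    raise ValueError("row has no winner flag set")

def hh_rlhf_example(row, rng):
    """Turn a chosen/rejected pair into an (A, B, label) triple.

    HH-RLHF has no ties, so the label is always A or B. Placing the
    chosen transcript on a random side (an assumption) keeps the model
    from learning that one position is usually better.
    """
    if rng.random() < 0.5:
        return row["chosen"], row["rejected"], LABEL_A
    return row["rejected"], row["chosen"], LABEL_B

rng = random.Random(0)
print(kaggle_label({"winner_model_a": 0, "winner_model_b": 0, "winner_tie": 1}))  # 2
```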


Intended Use

This model is intended for:

  • Reward modeling
  • Preference classification
  • AI response ranking
  • RLHF experimentation
  • Human preference approximation
  • LLM response comparison
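
For response ranking, one option is to compare every pair of candidates and sum each candidate's expected wins, counting a tie as half a win. The sketch assumes a hypothetical `compare(prompt, a, b)` wrapper that returns the model's (P(A), P(B), P(Tie)). It also scores each pair in both orders, an extra step assumed here to dampen position bias. A toy length-based comparer stands in for the model.

```python
from itertools import combinations

def rank_responses(prompt, responses, compare):
    """Rank candidates by aggregating pairwise preference probabilities.

    `compare(prompt, a, b)` is a hypothetical model wrapper returning
    (p_a, p_b, p_tie), which must sum to 1.
    """
    scores = [0.0] * len(responses)
    for i, j in combinations(range(len(responses)), 2):
        pa, pb, pt = compare(prompt, responses[i], responses[j])
        qa, qb, qt = compare(prompt, responses[j], responses[i])  # swapped order
        # Expected win share for i, averaged over both orderings.
        win_i = ((pa + 0.5 * pt) + (qb + 0.5 * qt)) / 2
        scores[i] += win_i
        scores[j] += 1.0 - win_i  # valid because probabilities sum to 1
    order = sorted(range(len(responses)), key=lambda k: scores[k], reverse=True)
    return [(responses[k], scores[k]) for k in order]

def toy_compare(prompt, a, b):
    """Stand-in for the model: simply prefers the longer response."""
    if len(a) > len(b):
        return (0.8, 0.1, 0.1)
    if len(a) < len(b):
        return (0.1, 0.8, 0.1)
    return (0.2, 0.2, 0.6)

ranked = rank_responses("q", ["a", "abc", "ab"], toy_compare)
print([r for r, _ in ranked])  # ['abc', 'ab', 'a']
```

With n candidates this costs n(n - 1) model calls. For large candidate sets, a cheaper scheme such as a knockout bracket may be preferable.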

Limitations

  • Primarily trained on English conversational data
  • Limited to short conversational windows
  • Not optimized for long-context reasoning
  • Sequential dataset training may bias the model toward later training distributions
  • Preference labels may inherit annotator bias
  • Not calibrated for production moderation or safety-critical systems

License

Model Weights

The training data includes datasets with non-commercial licensing restrictions.

Therefore:

  • Model weights are licensed under:
    • CC BY-NC 4.0

Commercial usage of the trained weights is not permitted without ensuring compliance with upstream dataset licenses.

Source Code

  • Training scripts and source code are licensed under:
    • Apache-2.0

Attribution

Base Model

  • Microsoft DeBERTa-v3-large

Datasets

  • Anthropic HH-RLHF
  • Kaggle LLM Classification Finetuning

Citation

@misc{himanshu2026airesponsecomparerv15,
  title={AI-Response-Comparer-v1.5},
  author={Himanshu Bansal},
  year={2026},
  publisher={Hugging Face},
  howpublished={https://huggingface.co/Himanshu167/AI-Response-Comparer-v1.5}
}
Model size: 0.4B params (Safetensors; tensor types F32, F16)