How to use Himanshu167/AI-Response-Comparer-v1.5 with Transformers:

    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-classification", model="Himanshu167/AI-Response-Comparer-v1.5", trust_remote_code=True)

    # Load model directly
    from transformers import AutoModel

    model = AutoModel.from_pretrained("Himanshu167/AI-Response-Comparer-v1.5", trust_remote_code=True, dtype="auto")
AI-Response-Comparer-v1.5
AI-Response-Comparer-v1.5 is a fine-tuned version of microsoft/deberta-v3-large for preference classification and reward modeling tasks.
The model compares two AI-generated responses for the same prompt and predicts a probability distribution over three outcomes:
- Response A preferred
- Response B preferred
- Tie
The output is generated using a 3-class softmax classification head, where probabilities sum to 1.
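To make the output format concrete, the following is a minimal sketch of how three raw scores (logits) turn into a probability distribution through softmax. The logit values and the label order (A, B, tie) are illustrative assumptions, not values or mappings taken from the model.

```python
import math

# Assumed label order, for illustration only; the model's own config
# defines the real index-to-label mapping.
LABELS = ["Response A preferred", "Response B preferred", "Tie"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits standing in for the 3-class head's output.
example_logits = [2.0, 0.5, -1.0]
probs = softmax(example_logits)

for label, p in zip(LABELS, probs):
    print(f"{label}: {p:.3f}")
```

The largest logit always receives the largest probability, so the predicted outcome is simply the index of the highest value.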
Model Details
Base Model
microsoft/deberta-v3-large
Fine-tuning Strategy
- Full fine-tuning
- Learning rate: 1e-5
- Sequential multi-dataset training
Unlike later versions that used jointly shuffled datasets, this version was trained sequentially:
- Kaggle LLM Classification Finetuning
- Anthropic HH-RLHF
Each dataset was shuffled independently during its respective training phase.
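The sequential schedule described above can be sketched roughly as follows. This is an illustration only: the helper name `sequential_schedule`, the toy records, the seed, and the phase order (taken from the order of the list above) are assumptions, and the actual training scripts may differ.

```python
import random


def sequential_schedule(datasets, seed=0):
    """Shuffle each dataset on its own, then train on them one after another.

    Examples from different datasets are never interleaved, unlike a
    jointly shuffled mixture.
    """
    rng = random.Random(seed)
    schedule = []
    for examples in datasets:
        phase = list(examples)  # copy so the caller's data is untouched
        rng.shuffle(phase)      # independent shuffle within this phase
        schedule.extend(phase)
    return schedule


# Toy stand-ins for the two datasets (hypothetical records).
kaggle = [("kaggle", i) for i in range(4)]
hh_rlhf = [("hh-rlhf", i) for i in range(4)]

schedule = sequential_schedule([kaggle, hh_rlhf])
print([name for name, _ in schedule])
```

Every example from the first phase appears before any example from the second, which is why the later dataset can dominate what the model retains.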
Preprocessing Strategy
To maintain consistent input lengths and manageable training compute requirements:
- Conversations were limited to a maximum of 2 turns
- Inputs were truncated to a maximum sequence length of 512 tokens
These preprocessing rules were applied consistently across training and evaluation data.
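A minimal sketch of these two rules is shown below. It assumes that the first turns are the ones kept and that truncation drops tokens from the right; the model card does not state either detail, and the helper names are hypothetical.

```python
MAX_TURNS = 2     # from the model card
MAX_TOKENS = 512  # from the model card


def limit_turns(turns, max_turns=MAX_TURNS):
    """Keep at most `max_turns` turns (keeping the earliest is an assumption)."""
    return turns[:max_turns]


def truncate_tokens(token_ids, max_tokens=MAX_TOKENS):
    """Cut a token-id sequence to `max_tokens` entries (right truncation assumed)."""
    return token_ids[:max_tokens]


conversation = ["turn 1", "turn 2", "turn 3"]
kept = limit_turns(conversation)

token_ids = list(range(700))  # stand-in for real tokenizer output
truncated = truncate_tokens(token_ids)

print(len(kept), len(truncated))  # prints "2 512"
```

In practice the token limit would normally be enforced by the tokenizer itself; the explicit slice here only makes the rule visible.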
Training Datasets
Included Datasets
- Kaggle LLM Classification Finetuning
- Anthropic HH-RLHF
Intended Use
This model is intended for:
- Reward modeling
- Preference classification
- AI response ranking
- RLHF experimentation
- Human preference approximation
- LLM response comparison
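For the response-ranking use case, one possible approach is to run every pair of candidates through the comparer and add up soft wins. The sketch below assumes a hypothetical `compare(prompt, a, b)` callable that returns `(p_a, p_b, p_tie)`; `toy_compare` is a stand-in that simply prefers longer text and is not the model's behavior.

```python
from itertools import combinations


def toy_compare(prompt, a, b):
    """Stand-in for the model: returns (p_a, p_b, p_tie).

    Prefers the longer response, purely so the example runs offline.
    """
    if len(a) == len(b):
        return (0.25, 0.25, 0.5)
    return (0.7, 0.2, 0.1) if len(a) > len(b) else (0.2, 0.7, 0.1)


def rank_responses(prompt, responses, compare=toy_compare):
    """Rank responses by round-robin soft wins; a tie counts half for each side."""
    scores = [0.0] * len(responses)
    for i, j in combinations(range(len(responses)), 2):
        p_a, p_b, p_tie = compare(prompt, responses[i], responses[j])
        scores[i] += p_a + 0.5 * p_tie
        scores[j] += p_b + 0.5 * p_tie
    order = sorted(range(len(responses)), key=lambda k: scores[k], reverse=True)
    return [responses[k] for k in order]


ranked = rank_responses("What is 2+2?", ["4", "The answer is 4.", "It is 4"])
print(ranked)
```

Splitting the tie probability evenly is a design choice rather than something the model card prescribes. With a real comparer it may also be worth scoring each pair in both orders and averaging, since pairwise preference models can be sensitive to which response appears first.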
Limitations
- Primarily trained on English conversational data
- Limited to short conversational windows (at most 2 turns and 512 tokens)
- Not optimized for long-context reasoning
- Sequential dataset training may bias the model toward later training distributions
- Preference labels may inherit annotator bias
- Not calibrated for production moderation or safety-critical systems
License
Model Weights
The training data for this model includes datasets with non-commercial licensing restrictions.
Therefore:
- Model weights are licensed under CC BY-NC 4.0
Commercial usage of the trained weights is not permitted without ensuring compliance with upstream dataset licenses.
Source Code
- Training scripts and source code are licensed under Apache-2.0
Attribution
Base Model
- Microsoft DeBERTa-v3-large
Datasets
- Anthropic HH-RLHF
- Kaggle LLM Classification Finetuning
Citation
@misc{himanshu2026airesponsecomparerv15,
title={AI-Response-Comparer-v1.5},
author={Himanshu Bansal},
year={2026},
publisher={Hugging Face},
howpublished={https://huggingface.co/Himanshu167/AI-Response-Comparer-v1.5}
}