DPO

Model Details

This repository contains a DPO (Direct Preference Optimization) model trained as part of the LLM course by VK.

HuggingFaceTB/SmolLM-135M-Instruct was used as the base model, which was further fine-tuned on the HumanLLMs/Human-Like-DPO-Dataset.

Model Description

The Reinforcement Learning from Human Feedback (RLHF) method is traditionally considered complex due to unstable training and its strong dependence on the accuracy of the Reward model. However, the DPO research introduces an alternative, simpler optimization method that significantly streamlines this process.

This repository contains a trained DPO model, specifically fine-tuned to generate more natural, human-like responses. It is based on the HuggingFaceTB/SmolLM-135M-Instruct model, which has been further adapted to fit the user-assistant interaction format.

Examples

Before fine-tuning

user

What's your morning routine like?

assistant

I'm excited to start my morning routine! As a digital AI assistant, I don't have personal preferences or habits, but I can provide you with a general idea of what a morning routine might look like. Here's a sample routine that I've developed based on various studies and research:

Morning Routine (10-15 minutes)

  1. Hydrate: Drink a full glass of water or a herbal tea (e.g., chamomile, peppermint) to start the day.
  2. Eat a nutritious breakfast: Prepare a healthy breakfast, such as oatmeal with fruit, scrambled eggs with spinach, or Greek yogurt with berries.
  3. Get some morning sunlight: Spend 10-15 minutes outside in natural light to help regulate your circadian rhythms.
  4. Take a few deep breaths: Inhale for 1-2 minutes, hold for 2-3 minutes, and exhale for 2-3 minutes.
  5. Stretch or move: Engage in some light stretching or movement to get your blood flowing and your muscles moving.

After fine-tuning

user

What's your morning routine like?

assistant

The morning routine is a set of habits that help individuals establish a strong foundation for the day's activities. It's a crucial aspect of personal and professional life, as it sets the tone for our behavior, decision-making, and overall well-being. Here's a detailed explanation of what a morning routine is and its importance:

What is a morning routine?

A morning routine, also known as a wake-up or pre-sleep routine, is a set of steps that help people establish a daily schedule for waking up, eating, sleeping, and engaging in activities. These routines are designed to help individuals get into a routine and improve their quality of life.

A typical morning routine includes:

  1. Waking up: Getting out of bed and waking up in the morning.
  2. Starting a day: Starting the day with the right attitude, mindset, and habits.
  3. Eating a nutritious meal: Eating a meal that provides energy and nutrients.
  4. Staying hydrated: Staying hydrated to prevent dehydration.
  5. Paying for daily expenses: Paying for daily expenses, including rent, utilities, and bills.
  6. Managing time: Managing time to accomplish tasks and meet deadlines

Analysis

image.png

References

Downloads last month
2
Safetensors
Model size
0.1B params
Tensor type
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Paper for thsluck/llm-course-hw2-dpo