Knowledge Distillation

Models

- shayekh/aya8b-distillkit-hidden
- shayekh/aya8b-distillkit-logits
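The two aya8b-distillkit checkpoints reflect the two standard distillation signals their suffixes suggest: matching the teacher's output logits versus matching its intermediate hidden states. A minimal PyTorch sketch of both losses, assuming a shared tokenizer and a learned projection for mismatched hidden widths (the temperature and projection are illustrative placeholders, not the checkpoints' actual training settings):

import torch.nn.functional as F

def logit_distill_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions and match them with KL divergence; the
    # T^2 factor keeps gradient magnitudes comparable across temperatures.
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

def hidden_distill_loss(student_hidden, teacher_hidden, proj):
    # Match intermediate representations instead of outputs; proj is a
    # learned linear map from the student width to the teacher width.
    return F.mse_loss(proj(student_hidden), teacher_hidden)

Logit matching transfers the teacher's full output distribution; hidden-state matching supervises internal representations, the signal that layer-wise methods such as the task-aware distillation paper below build on.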
Papers

- Less is More: Task-aware Layer-wise Distillation for Language Model Compression (arXiv:2210.01351)
- A Survey on Knowledge Distillation of Large Language Models (arXiv:2402.13116)
- Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling (arXiv:2311.00430)
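Distil-Whisper's key data step is pseudo-labelling: the teacher transcribes a large audio corpus, and only examples whose pseudo-label stays close to the human reference are kept for student training. A hedged sketch of that filter, using the jiwer package for word error rate (the 10% threshold and field names are illustrative):

import jiwer

def filter_pseudo_labels(examples, wer_threshold=0.10):
    # Keep an example only if the teacher's transcript is close enough
    # to the reference; high-WER pseudo-labels are treated as unreliable.
    kept = []
    for ex in examples:
        wer = jiwer.wer(ex["reference_text"], ex["pseudo_label"])
        if wer <= wer_threshold:
            kept.append(ex)
    return kept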
- On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes (arXiv:2306.13649)
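On-policy distillation (GKD) trains the student on sequences it samples itself, with the teacher scoring those same tokens, so the training distribution matches what the student sees at inference. A simplified sketch of one update, assuming Hugging Face-style causal LMs and using reverse KL on the student's samples (the paper's full objective is a generalized Jensen-Shannon divergence with an on-policy/off-policy data mix):

import torch
import torch.nn.functional as F

def on_policy_distill_step(student, teacher, prompt_ids, optimizer):
    # 1) The student generates its own continuations (self-generated data).
    with torch.no_grad():
        seqs = student.generate(prompt_ids, max_new_tokens=64, do_sample=True)

    # 2) Score the same sequences under both models.
    student_logp = F.log_softmax(student(seqs).logits, dim=-1)
    with torch.no_grad():
        teacher_logp = F.log_softmax(teacher(seqs).logits, dim=-1)

    # 3) Reverse KL(student || teacher) on the student's own tokens pushes
    #    it toward outputs the teacher also rates as likely.
    loss = (student_logp.exp() * (student_logp - teacher_logp)).sum(-1).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()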
- Compact Language Models via Pruning and Knowledge Distillation (arXiv:2407.14679)
- LLM Pruning and Distillation in Practice: The Minitron Approach (arXiv:2408.11796)
- DistiLLM: Towards Streamlined Distillation for Large Language Models (arXiv:2402.03898)
- Relational Knowledge Distillation (arXiv:1904.05068)
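Relational knowledge distillation transfers the structure among examples rather than per-example outputs: pairwise distances (or angles) in the teacher's embedding space become regression targets for the student. A sketch of the distance-wise RKD loss for embedding batches of shape (batch, dim):

import torch
import torch.nn.functional as F

def normalized_pairwise_distances(embeddings):
    # All pairwise Euclidean distances in the batch, normalized by their
    # mean so teacher and student scales are directly comparable.
    d = torch.cdist(embeddings, embeddings, p=2)
    return d / d[d > 0].mean()

def rkd_distance_loss(student_emb, teacher_emb):
    # Huber loss between the two normalized distance matrices.
    return F.smooth_l1_loss(normalized_pairwise_distances(student_emb),
                            normalized_pairwise_distances(teacher_emb))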
- Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes (arXiv:2305.02301)
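Distilling step-by-step prompts the teacher for chain-of-thought rationales, then trains a small seq2seq student with a multi-task objective: predict the label, and separately generate the rationale; only the label task is used at inference. A sketch of the combined loss, assuming a T5-style student with pre-tokenized batches for each task (the weighting and batch field names are placeholders):

def step_by_step_loss(student, label_batch, rationale_batch, rationale_weight=1.0):
    # Task 1: input -> label, the task the deployed student actually serves.
    label_loss = student(input_ids=label_batch["input_ids"],
                         labels=label_batch["labels"]).loss
    # Task 2: input -> teacher rationale, an auxiliary signal that injects
    # the teacher's reasoning without requiring rationales at test time.
    rationale_loss = student(input_ids=rationale_batch["input_ids"],
                             labels=rationale_batch["labels"]).loss
    return label_loss + rationale_weight * rationale_loss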