License • Code • Technical Report • Benchmarks • Getting Started
Brief Introduction
Youtu-LLM is a new, small, yet powerful LLM: it contains only 1.96B parameters, supports a 128K long context, and has native agentic capabilities. On general evaluations, Youtu-LLM significantly outperforms SOTA LLMs of similar size on Commonsense, STEM, Coding, and Long Context capabilities; on agent-related benchmarks, it surpasses larger leading models and is genuinely capable of completing end-to-end agent tasks.
Youtu-LLM has the following features:
- Type: Autoregressive causal language model with dense MLA (Multi-head Latent Attention)
- Release versions: Base and Instruct
- Number of Parameters: 1.96B
- Number of Layers: 32
- Number of Attention Heads (MLA): 16 for Q/K/V
- MLA Rank: 1,536 for Q, 512 for K/V
- MLA Dim: 128 for QK Nope, 64 for QK Rope, and 128 for V
- Context Length: 131,072
- Vocabulary Size: 128,256
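For quick reference, the numbers above can be collected into a config-style summary. This is only an illustrative sketch: the key names below are hypothetical and do not necessarily match the model's actual configuration file.

```python
# Illustrative, config-style summary of the Youtu-LLM-2B numbers listed above.
# NOTE: key names are hypothetical; consult the official model config for the real ones.
YOUTU_LLM_2B_SPEC = {
    "num_parameters": 1_960_000_000,      # ~1.96B
    "num_hidden_layers": 32,
    "num_attention_heads": 16,            # MLA heads shared by Q/K/V
    "q_lora_rank": 1536,                  # MLA rank for Q
    "kv_lora_rank": 512,                  # MLA rank for K/V
    "qk_nope_head_dim": 128,
    "qk_rope_head_dim": 64,
    "v_head_dim": 128,
    "max_position_embeddings": 131_072,   # 128K context
    "vocab_size": 128_256,
}
```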
Model Download
| Model Name | Description | Download |
|---|---|---|
| Youtu-LLM-2B-Base | Base model of Youtu-LLM-2B | Model |
| Youtu-LLM-2B | Instruct model of Youtu-LLM-2B | Model |
| Youtu-LLM-2B-GGUF | Instruct model of Youtu-LLM-2B, in GGUF format | Model |
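For convenience, the checkpoints can also be fetched programmatically with the `huggingface_hub` library, as in the minimal sketch below; the `repo_id` shown is a placeholder, not the confirmed repository path.

```python
from huggingface_hub import snapshot_download

# Placeholder repo ID; replace it with the actual Hugging Face repository
# for the checkpoint you want (Base, Instruct, or GGUF).
local_dir = snapshot_download(repo_id="tencent/Youtu-LLM-2B")
print(f"Model files downloaded to: {local_dir}")
```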
Performance Comparisons
Instruct Model
General Benchmarks
| Benchmark | DeepSeek-R1-Distill-Qwen-1.5B | Qwen3-1.7B | SmolLM3-3B | Qwen3-4B | DeepSeek-R1-Distill-Llama-8B | Youtu-LLM-2B |
|---|---|---|---|---|---|---|
| Commonsense Knowledge Reasoning | ||||||
| MMLU-Redux | 53.0% | 74.1% | 75.6% | 83.8% | 78.1% | 75.8% |
| MMLU-Pro | 36.5% | 54.9% | 53.0% | 69.1% | 57.5% | 61.6% |
| Instruction Following & Text Reasoning | ||||||
| IFEval | 29.4% | 70.4% | 60.4% | 83.6% | 34.6% | 81.2% |
| DROP | 41.3% | 72.5% | 72.0% | 82.9% | 73.1% | 86.7% |
| MUSR | 43.8% | 56.6% | 54.1% | 60.5% | 59.7% | 57.4% |
| STEM | ||||||
| MATH-500 | 84.8% | 89.8% | 91.8% | 95.0% | 90.8% | 93.7% |
| AIME 24 | 30.2% | 44.2% | 46.7% | 73.3% | 52.5% | 65.4% |
| AIME 25 | 23.1% | 37.1% | 34.2% | 64.2% | 34.4% | 49.8% |
| GPQA-Diamond | 33.6% | 36.9% | 43.8% | 55.2% | 45.5% | 48.0% |
| BBH | 31.0% | 69.1% | 76.3% | 87.8% | 77.8% | 77.5% |
| Coding | ||||||
| HumanEval | 64.0% | 84.8% | 79.9% | 95.4% | 88.1% | 95.9% |
| HumanEval+ | 59.5% | 76.2% | 74.7% | 87.8% | 82.5% | 89.0% |
| MBPP | 51.5% | 80.5% | 66.7% | 92.3% | 73.9% | 85.0% |
| MBPP+ | 44.2% | 67.7% | 56.7% | 77.6% | 61.0% | 71.7% |
| LiveCodeBench v6 | 19.8% | 30.7% | 30.8% | 48.5% | 36.8% | 43.7% |
Agentic Benchmarks
| Benchmark | Qwen3-1.7B | SmolLM3-3B | Qwen3-4B | Youtu-LLM-2B |
|---|---|---|---|---|
| Deep Research | ||||
| GAIA | 11.4% | 11.7% | 25.5% | 33.9% |
| xbench | 11.7% | 13.9% | 18.4% | 19.5% |
| Code | ||||
| SWE-Bench-Verified | 0.6% | 7.2% | 5.7% | 17.7% |
| EnConda-Bench | 10.8% | 3.5% | 16.1% | 21.5% |
| Tool | ||||
| BFCL V3 | 55.5% | 31.5% | 61.7% | 58.0% |
| τ²-Bench | 2.6% | 9.7% | 10.9% | 15.0% |
Quick Start
This guide will help you quickly deploy and invoke the Youtu-LLM-2B model. This model supports "Reasoning Mode", enabling it to generate higher-quality responses through Chain of Thought (CoT).
Server Example
Enable Reasoning Mode (default):
```bash
./llama-server -m Youtu-LLM-2B-F16.gguf \
    --port 8080 \
    --host 0.0.0.0
```
Disable Reasoning Mode:
```bash
./llama-server -m Youtu-LLM-2B-F16.gguf \
    --port 8080 \
    --host 0.0.0.0 \
    --reasoning-budget 0
```
Key Configuration Details
Reasoning Mode Toggle
Controlled via the `--reasoning-budget` parameter:
- Default (no flag): Enables Chain of Thought; ideal for complex logic and reasoning tasks. The response includes a `reasoning_content` field (see the sketch below).
- `--reasoning-budget 0`: Disables reasoning; faster response time, suitable for simple conversations.
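The snippet below is a minimal sketch of how a client might separate the reasoning trace from the final answer in the server's OpenAI-compatible chat-completions response; the exact response layout is an assumption based on the description above, so verify it against your llama.cpp version.

```python
# Minimal sketch: split the optional reasoning trace from the final answer.
# Assumes the OpenAI-compatible response layout of llama.cpp's llama-server.
def split_reasoning(response: dict) -> tuple[str | None, str]:
    message = response["choices"][0]["message"]
    reasoning = message.get("reasoning_content")  # present only in Reasoning Mode
    answer = message.get("content", "")
    return reasoning, answer
```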
Recommended Decoding Parameters
| Parameter | Reasoning Mode | Normal Mode |
|---|---|---|
| `temperature` | 1.0 (Maintains creativity) | 0.7 (More stable results) |
| `top_p` | 0.95 | 0.8 |
| `top_k` | 20 | 20 |
| `repetition_penalty` | 1.05 | - |
Tip: When using Reasoning Mode, a higher `temperature` helps the model perform deeper, more divergent thinking.
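Putting the pieces together, here is a hedged end-to-end sketch that queries the local server with the recommended Reasoning Mode parameters. It assumes the llama-server from the commands above is running on port 8080 and exposes the standard OpenAI-compatible `/v1/chat/completions` endpoint; adjust the URL, model name, and parameter keys to match your setup.

```python
import requests

# Chat request using the recommended Reasoning Mode decoding parameters.
# Assumes a local llama-server (see the commands above) listening on port 8080.
payload = {
    "model": "Youtu-LLM-2B",
    "messages": [{"role": "user", "content": "Explain why the sky appears blue."}],
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 20,
    "repeat_penalty": 1.05,  # llama.cpp's name for the repetition penalty
}
resp = requests.post("http://localhost:8080/v1/chat/completions", json=payload, timeout=300)
resp.raise_for_status()
message = resp.json()["choices"][0]["message"]
print("Reasoning:", message.get("reasoning_content"))
print("Answer:", message.get("content"))
```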
Citation
If you find our work useful in your research, please consider citing the following paper:
```bibtex
@article{youtu-llm,
      title={Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models},
      author={Tencent Youtu Lab},
      year={2025},
      eprint={2512.24618},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.24618},
}
```