Youtu-LLM-2B-GGUF
Description
This repository contains GGUF format model files for Tencent's Youtu-LLM-2B.
Youtu-LLM-2B is a highly efficient 1.96B parameter model featuring a Dense MLA architecture and a native 128K context window. Despite its small size, it supports Agentic capabilities and "Reasoning Mode" (Chain of Thought), outperforming many larger models in STEM, coding, and agentic benchmarks.
Evaluation Results
Files & Quantization
For the full list of available quantizations, see the Files and versions tab.
How to Run (llama.cpp)
Note: This model uses the Dense MLA architecture. Please ensure you are using the latest version of llama.cpp to support this architecture correctly.
Recommended Parameters: This model supports two modes. Adjust your temperature accordingly:
- Reasoning Mode (CoT): Temperature 1.0 (recommended for complex logic and math).
- Normal Mode: Temperature 0.7 (recommended for chat and stability).
- Context: set with -c (supports up to 131072 tokens).
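The two modes above differ only in sampling temperature. If you script llama-cli invocations, a small helper (hypothetical, not part of llama.cpp) can assemble the matching arguments and guard the 128K context limit:

```python
# Hypothetical helper: build llama-cli argument lists for the two modes.
MODEL = "Youtu-LLM-2B.Q4_K_M.gguf"

def cli_args(mode: str, ctx: int = 8192) -> list[str]:
    """Return llama-cli arguments for 'reasoning' or 'normal' mode."""
    if ctx > 131072:
        raise ValueError("context exceeds the model's 128K window")
    temp = {"reasoning": "1.0", "normal": "0.7"}[mode]
    return ["./llama-cli", "-m", MODEL, "-c", str(ctx), "--temp", temp, "-cnv"]

print(" ".join(cli_args("reasoning")))
```

The helper only prints the command line; pass it to your process runner of choice to actually launch llama-cli.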
CLI Example
./llama-cli -m Youtu-LLM-2B.Q4_K_M.gguf \
-c 8192 \
--temp 1.0 \
--top-p 0.95 \
-p "User: Explain the theory of relativity.\nAssistant:" \
-cnv
Server Example
./llama-server -m Youtu-LLM-2B.Q4_K_M.gguf \
--port 8080 \
--host 0.0.0.0 \
-c 16384 \
-ngl 99
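Once llama-server is running, it exposes an OpenAI-compatible chat endpoint. A minimal client sketch, assuming the server above is listening on localhost:8080 (the request simply fails gracefully if it is not):

```python
import json
import urllib.request

# Assumed endpoint: llama-server's OpenAI-compatible chat completions API.
url = "http://localhost:8080/v1/chat/completions"

payload = {
    "messages": [
        {"role": "user", "content": "Explain the theory of relativity."}
    ],
    "temperature": 1.0,  # Reasoning Mode; use 0.7 for Normal Mode
    "max_tokens": 512,
}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

try:
    with urllib.request.urlopen(req, timeout=10) as resp:
        body = json.loads(resp.read())
        print(body["choices"][0]["message"]["content"])
except OSError as exc:
    # Server not reachable; the payload above shows the expected request shape.
    print(f"request failed: {exc}")
```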