We are recruiting high-level interns (Qingyun/青云) to explore scaling laws in agentic mid-training. If you are interested, feel free to contact me.
Junrulu
AI & ML interests
None yet
Recent Activity
new activity
about 1 hour ago
tencent/Youtu-LLM-2B: Update README.md for transformers' version annotation
new activity
about 1 hour ago
tencent/Youtu-LLM-2B: @check_model_inputs # <--- this is the line causing the problem
replied to their post
about 22 hours ago
We are pleased to introduce a brand-new lightweight LLM, Youtu-LLM:
(1) Youtu-LLM has 2B parameters in total, using a 32-layer dense MLA architecture equipped with an innovative STEM- and agentic-oriented vocabulary;
(2) Pre-trained on approximately 11T tokens, with native 128k long-context extension and native agentic mid-training, Youtu-LLM-2B is comparable to Qwen3-4B in general and agentic capabilities;
(3) We have open-sourced the Base and Instruct versions, along with the evaluation code for reproducing the reported metrics. In the technical report, we share our experience with native agentic pre-training in detail.
Youtu-LLM-2B is well suited as a starting point for exploring on-device agents. We are currently extending this paradigm to larger scales, and we welcome further discussion and collaboration!
🔗 Check the project here: https://github.com/TencentCloudADP/youtu-tip/tree/master/youtu-llm
🤗 Check the models here: https://huggingface.co/collections/tencent/youtu
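For anyone who wants to try the model quickly, here is a minimal sketch of loading it with Hugging Face transformers. The repo id tencent/Youtu-LLM-2B is taken from the activity feed above; the use of trust_remote_code and the chat-template call are assumptions based on standard transformers usage, not settings confirmed by the model card, so please defer to the README for the recommended setup.

```python
# Minimal sketch: loading and prompting Youtu-LLM-2B with transformers.
# Assumptions: the repo id below (from the activity feed), trust_remote_code
# for the custom MLA architecture, and a chat template in the Instruct repo.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tencent/Youtu-LLM-2B"  # assumed repo id; check the collection link above

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # keep the checkpoint's native dtype
    device_map="auto",       # requires `accelerate`; places weights on GPU/CPU
    trust_remote_code=True,  # in case the repo ships custom modeling code
)

# Build a chat-formatted prompt and generate a reply.
messages = [{"role": "user", "content": "Briefly explain what an on-device agent is."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```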