-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 201 • 98 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 36 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 -
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88
Collections
Collections including paper arxiv:2505.16410
-
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28 -
Towards General-Purpose Model-Free Reinforcement Learning
Paper • 2501.16142 • Published • 30 -
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 123 -
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Paper • 2412.12098 • Published • 4
-
MLLM-as-a-Judge for Image Safety without Human Labeling
Paper • 2501.00192 • Published • 31 -
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper • 2501.00958 • Published • 109 -
Xmodel-2 Technical Report
Paper • 2412.19638 • Published • 27 -
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper • 2412.18925 • Published • 106
-
Reinforcement Pre-Training
Paper • 2506.08007 • Published • 263 -
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
Paper • 2506.06395 • Published • 133 -
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models
Paper • 2506.05176 • Published • 77 -
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Paper • 2505.24726 • Published • 277
-
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning
Paper • 2505.16410 • Published • 58 -
dongguanting/Tool-Star-SFT-54K
Viewer • Updated • 54k • 126 • 10 -
dongguanting/Multi-Tool-RL-10K
Viewer • Updated • 10k • 83 • 5 -
dongguanting/Tool-Star-Qwen-7B
Text Generation • 8B • Updated • 7 • 2
-
Large Language Models Can Self-Improve in Long-context Reasoning
Paper • 2411.08147 • Published • 65 -
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
Paper • 2411.11504 • Published • 24 -
Auto-Evolve: Enhancing Large Language Model's Performance via Self-Reasoning Framework
Paper • 2410.06328 • Published • 2 -
Critical Tokens Matter: Token-Level Contrastive Estimation Enhence LLM's Reasoning Capability
Paper • 2411.19943 • Published • 62