Reinforcing Few-step Generators via Reward-Tilted Distribution Matching
Paper • 2605.26108 • Published • 2
The official organization of the Tencent Hunyuan team
PlanningBench: Generating Scalable and Verifiable Planning Data for Evaluating and Training Large Language Models