reinforcement learning
In deep reinforcement learning, a pruned network is a good network
Paper • 2402.12479 • Published • 19
Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
Paper • 2403.03950 • Published • 15
RLHF Workflow: From Reward Modeling to Online RLHF
Paper • 2405.07863 • Published • 71
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
Paper • 2405.11143 • Published • 40
Understanding and Diagnosing Deep Reinforcement Learning
Paper • 2406.16979 • Published • 10
Efficient World Models with Context-Aware Tokenization
Paper • 2406.19320 • Published • 8
It Takes Two: Your GRPO Is Secretly DPO
Paper • 2510.00977 • Published • 31
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning
Paper • 2510.25992 • Published • 45