view article Article Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment Feb 11, 2025 • 97
Game-TARS: Pretrained Foundation Models for Scalable Generalist Multimodal Game Agents Paper • 2510.23691 • Published Oct 27, 2025 • 53
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning Paper • 2509.02544 • Published Sep 2, 2025 • 124
TPLA: Tensor Parallel Latent Attention for Efficient Disaggregated Prefill \& Decode Inference Paper • 2508.15881 • Published Aug 21, 2025 • 9
ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting Paper • 2410.17856 • Published Oct 23, 2024 • 52