SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks
Paper • 2604.08865 • Published • 12
Multilingual and multimodal LLM, data synthesis, complex reasoning with LLMs
From Word to World: Can Large Language Models be Implicit Text-based World Models?