-
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training
Paper • 2602.10693 • Published • 220 -
Flash-KMeans: Fast and Memory-Efficient Exact K-Means
Paper • 2603.09229 • Published • 82 -
DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use
Paper • 2603.11076 • Published • 5 -
LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning
Paper • 2603.21065 • Published • 77
Collections
Discover the best community collections!
Collections including paper arxiv:2603.21065
-
microsoft/bitnet-b1.58-2B-4T
Text Generation • 0.8B • Updated • 15.4k • 1.44k -
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Paper • 2504.10449 • Published • 15 -
nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct
Text Generation • 8B • Updated • 119 • 17 -
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Paper • 2504.11536 • Published • 63
-
Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning
Paper • 2407.20798 • Published • 24 -
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper • 2412.16145 • Published • 38 -
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper • 2501.03262 • Published • 104 -
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
Paper • 2502.18449 • Published • 75
-
StockBench: Can LLM Agents Trade Stocks Profitably In Real-world Markets?
Paper • 2510.02209 • Published • 57 -
MM-DREX: Multimodal-Driven Dynamic Routing of LLM Experts for Financial Trading
Paper • 2509.05080 • Published -
TradingGroup: A Multi-Agent Trading System with Self-Reflection and Data-Synthesis
Paper • 2508.17565 • Published • 1 -
QTMRL: An Agent for Quantitative Trading Decision-Making Based on Multi-Indicator Guided Reinforcement Learning
Paper • 2508.20467 • Published
-
Writing in the Margins: Better Inference Pattern for Long Context Retrieval
Paper • 2408.14906 • Published • 144 -
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
Paper • 2410.10819 • Published • 7 -
LLMtimesMapReduce: Simplified Long-Sequence Processing using Large Language Models
Paper • 2410.09342 • Published • 39 -
PDFTriage: Question Answering over Long, Structured Documents
Paper • 2309.08872 • Published • 55
-
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training
Paper • 2602.10693 • Published • 220 -
Flash-KMeans: Fast and Memory-Efficient Exact K-Means
Paper • 2603.09229 • Published • 82 -
DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use
Paper • 2603.11076 • Published • 5 -
LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning
Paper • 2603.21065 • Published • 77
-
StockBench: Can LLM Agents Trade Stocks Profitably In Real-world Markets?
Paper • 2510.02209 • Published • 57 -
MM-DREX: Multimodal-Driven Dynamic Routing of LLM Experts for Financial Trading
Paper • 2509.05080 • Published -
TradingGroup: A Multi-Agent Trading System with Self-Reflection and Data-Synthesis
Paper • 2508.17565 • Published • 1 -
QTMRL: An Agent for Quantitative Trading Decision-Making Based on Multi-Indicator Guided Reinforcement Learning
Paper • 2508.20467 • Published
-
microsoft/bitnet-b1.58-2B-4T
Text Generation • 0.8B • Updated • 15.4k • 1.44k -
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Paper • 2504.10449 • Published • 15 -
nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct
Text Generation • 8B • Updated • 119 • 17 -
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Paper • 2504.11536 • Published • 63
-
Writing in the Margins: Better Inference Pattern for Long Context Retrieval
Paper • 2408.14906 • Published • 144 -
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
Paper • 2410.10819 • Published • 7 -
LLMtimesMapReduce: Simplified Long-Sequence Processing using Large Language Models
Paper • 2410.09342 • Published • 39 -
PDFTriage: Question Answering over Long, Structured Documents
Paper • 2309.08872 • Published • 55
-
Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning
Paper • 2407.20798 • Published • 24 -
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper • 2412.16145 • Published • 38 -
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper • 2501.03262 • Published • 104 -
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
Paper • 2502.18449 • Published • 75