- Cut Your Losses! Learning to Prune Paths Early for Efficient Parallel Reasoning
  Paper • 2604.16029 • Published • 23
- Qwen3.5-Omni Technical Report
  Paper • 2604.15804 • Published • 56
- REFRAG: Rethinking RAG based Decoding
  Paper • 2509.01092 • Published • 9
- OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation
  Paper • 2604.18486 • Published • 87
Collections
Collections including paper arxiv:2604.21921

- GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
  Paper • 2508.06471 • Published • 211
- Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training
  Paper • 2508.00414 • Published • 96
- Continuous Autoregressive Language Models
  Paper • 2510.27688 • Published • 74
- MiMo-Embodied: X-Embodied Foundation Model Technical Report
  Paper • 2511.16518 • Published • 26

- Qwen2.5-Omni Technical Report
  Paper • 2503.20215 • Published • 172
- Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO
  Paper • 2505.22453 • Published • 46
- UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning
  Paper • 2505.23380 • Published • 22
- More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models
  Paper • 2505.21523 • Published • 13

- ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling
  Paper • 2603.25746 • Published • 155
- TAPS: Task Aware Proposal Distributions for Speculative Sampling
  Paper • 2603.27027 • Published • 143
- Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models
  Paper • 2603.25716 • Published • 156
- LongCat-Next: Lexicalizing Modalities as Discrete Tokens
  Paper • 2603.27538 • Published • 145

- microsoft/bitnet-b1.58-2B-4T
  Text Generation • 0.8B • Updated • 15.3k • 1.44k
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
  Paper • 2504.10449 • Published • 15
- nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct
  Text Generation • 8B • Updated • 121 • 17
- ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
  Paper • 2504.11536 • Published • 63

- When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method
  Paper • 2402.17193 • Published • 26
- What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
  Paper • 2410.23743 • Published • 64
- Direct Preference Optimization Using Sparse Feature-Level Constraints
  Paper • 2411.07618 • Published • 17
- Transformer^2: Self-adaptive LLMs
  Paper • 2501.06252 • Published • 55