Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss Paper • 2512.23447 • Published 6 days ago • 87
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey Paper • 2412.18619 • Published Dec 16, 2024 • 60
Mindscape-Aware Retrieval Augmented Generation for Improved Long Context Understanding Paper • 2512.17220 • Published 16 days ago • 101
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published Jan 1, 2025 • 109
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs Paper • 2412.18925 • Published Dec 25, 2024 • 106
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI Paper • 2512.16676 • Published 16 days ago • 200
Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning Paper • 2512.20848 • Published 11 days ago • 28
TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times Paper • 2512.16093 • Published 17 days ago • 90
Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies Paper • 2512.19673 • Published 12 days ago • 60
Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models Paper • 2512.20557 • Published 11 days ago • 48
Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience Paper • 2512.17260 • Published 16 days ago • 48
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows Paper • 2512.16969 • Published 17 days ago • 108
MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources Paper • 2509.25531 • Published Sep 29, 2025 • 8
Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model Paper • 2512.13507 • Published 19 days ago • 38
Next-Embedding Prediction Makes Strong Vision Learners Paper • 2512.16922 • Published 16 days ago • 82
LLaDA2.0: Scaling Up Diffusion Language Models to 100B Paper • 2512.15745 • Published 25 days ago • 78