ShiqiangWoo's Collections 20250903
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper
• 2509.02547
• Published
• 230
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
Paper
• 2509.02479
• Published
• 84
POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion
Paper
• 2509.01215
• Published
• 51
LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model
Paper
• 2509.00676
• Published
• 85
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
Paper
• 2509.02544
• Published
• 125
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use
Paper
• 2509.01055
• Published
• 79
Baichuan-M2: Scaling Medical Capability with Large Verifier System
Paper
• 2509.02208
• Published
• 43
Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR
Paper
• 2509.02522
• Published
• 26
Kwai Keye-VL 1.5 Technical Report
Paper
• 2509.01563
• Published
• 38
Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic
Paper
• 2509.01363
• Published
• 59
Jointly Reinforcing Diversity and Quality in Language Model Generations
Paper
• 2509.02534
• Published
• 25
GenCompositor: Generative Video Compositing with Diffusion Transformer
Paper
• 2509.02460
• Published
• 26
OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning
Paper
• 2509.01644
• Published
• 34
Attributes as Textual Genes: Leveraging LLMs as Genetic Algorithm Simulators for Conditional Synthetic Data Generation
Paper
• 2509.02040
• Published
• 15
M3Ret: Unleashing Zero-shot Multimodal Medical Image Retrieval via Self-Supervision
Paper
• 2509.01360
• Published
• 12
FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games
Paper
• 2509.01052
• Published
• 22
Universal Deep Research: Bring Your Own Model and Strategy
Paper
• 2509.00244
• Published
• 14
Discrete Noise Inversion for Next-scale Autoregressive Text-based Image Editing
Paper
• 2509.01984
• Published
• 7
Fantastic Pretraining Optimizers and Where to Find Them
Paper
• 2509.02046
• Published
• 14
MedDINOv3: How to adapt vision foundation models for medical image segmentation?
Paper
• 2509.02379
• Published
• 2
Improving Large Vision and Language Models by Learning from a Panel of Peers
Paper
• 2509.01610
• Published
• 3
Towards More Diverse and Challenging Pre-training for Point Cloud Learning: Self-Supervised Cross Reconstruction with Decoupled Views
Paper
• 2509.01250
• Published
• 2
SQL-of-Thought: Multi-agentic Text-to-SQL with Guided Error Correction
Paper
• 2509.00581
• Published
• 11
C-DiffDet+: Fusing Global Scene Context with Generative Denoising for High-Fidelity Object Detection
Paper
• 2509.00578
• Published
• 2
Metis: Training Large Language Models with Advanced Low-Bit Quantization
Paper
• 2509.00404
• Published
• 7
FastFit: Accelerating Multi-Reference Virtual Try-On via Cacheable Diffusion Models
Paper
• 2508.20586
• Published
• 4