
Cheng Qian

chengq9

https://qiancheng0.github.io

qiancheng0

AI & ML interests

Agent, Tool Learning

Recent Activity

upvoted a paper 15 days ago

JustRL: Scaling a 1.5B LLM with a Simple RL Recipe

upvoted a paper 3 months ago

Multimodal Policy Internalization for Conversational Agents

upvoted a paper 3 months ago

Self-Improving LLM Agents at Test-Time


Organizations

upvoted a paper 15 days ago

JustRL: Scaling a 1.5B LLM with a Simple RL Recipe

Paper • 2512.16649 • Published 16 days ago • 23

upvoted 4 papers 3 months ago

upvoted a paper 4 months ago

Context Engineering for Trustworthiness: Rescorla Wagner Steering Under Mixed and Inappropriate Contexts

Paper • 2509.04500 • Published Sep 2, 2025 • 4

upvoted 3 papers 5 months ago

The Law of Knowledge Overshadowing: Towards Understanding, Predicting, and Preventing LLM Hallucination

Paper • 2502.16143 • Published Feb 22, 2025 • 6

UserBench: An Interactive Gym Environment for User-Centric Agents

Paper • 2507.22034 • Published Jul 29, 2025 • 29

A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence

Paper • 2507.21046 • Published Jul 28, 2025 • 82

upvoted a paper 6 months ago

MIRIX: Multi-Agent Memory System for LLM-Based Agents

Paper • 2507.07957 • Published Jul 10, 2025 • 79

upvoted 3 papers 7 months ago

MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning

Paper • 2505.24846 • Published May 30, 2025 • 15

ToMAP: Training Opponent-Aware LLM Persuaders with Theory of Mind

Paper • 2505.22961 • Published May 29, 2025 • 8

Time-R1: Towards Comprehensive Temporal Reasoning in LLMs

Paper • 2505.13508 • Published May 16, 2025 • 15

upvoted a collection 8 months ago

RM-R1

Collection

RM-R1: Reward Modeling as Reasoning • 16 items • Updated Jun 29, 2025 • 9

upvoted a paper 8 months ago

RM-R1: Reward Modeling as Reasoning

Paper • 2505.02387 • Published May 5, 2025 • 79

upvoted a collection 8 months ago

Qwen3

Collection

84 items • Updated 3 days ago • 1.53k

upvoted a paper 9 months ago

Can a Single Model Master Both Multi-turn Conversations and Tool Use? CALM: A Unified Conversational Agentic Language Model

Paper • 2502.08820 • Published Feb 12, 2025 • 5

upvoted a collection 9 months ago

ToolRL

Collection

The ToolRL model trained for tool use through GRPO • 3 items • Updated Apr 22, 2025 • 2

upvoted 2 papers 9 months ago

OTC: Optimal Tool Calls via Reinforcement Learning

Paper • 2504.14870 • Published Apr 21, 2025 • 35

ToolRL: Reward is All Tool Learning Needs

Paper • 2504.13958 • Published Apr 16, 2025 • 48

