Can large language models explore in-context?
Paper
• 2403.15371
• Published
• 33
Advancing LLM Reasoning Generalists with Preference Trees
Paper
• 2404.02078
• Published
• 46
Long-context LLMs Struggle with Long In-context Learning
Paper
• 2404.02060
• Published
• 37
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Paper
• 2404.03715
• Published
• 62
CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues
Paper
• 2404.03820
• Published
• 25
Social Skill Training with Large Language Models
Paper
• 2404.04204
• Published
• 15
Stream of Search (SoS): Learning to Search in Language
Paper
• 2404.03683
• Published
• 30
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
Paper
• 2404.05961
• Published
• 66
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
Paper
• 2404.12253
• Published
• 55
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Paper
• 2408.10914
• Published
• 45
Predicting Rewards Alongside Tokens: Non-disruptive Parameter Insertion for Efficient Inference Intervention in Large Language Model
Paper
• 2408.10764
• Published
• 9
OLMoE: Open Mixture-of-Experts Language Models
Paper
• 2409.02060
• Published
• 80
Attention Heads of Large Language Models: A Survey
Paper
• 2409.03752
• Published
• 92
Building Math Agents with Multi-Turn Iterative Preference Learning
Paper
• 2409.02392
• Published
• 16
DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?
Paper
• 2409.07703
• Published
• 66
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale
Paper
• 2409.08264
• Published
• 48
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
Paper
• 2409.04109
• Published
• 48
Training Language Models to Self-Correct via Reinforcement Learning
Paper
• 2409.12917
• Published
• 140
Agent-as-a-Judge: Evaluate Agents with Agents
Paper
• 2410.10934
• Published
• 23
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model
Paper
• 2503.05132
• Published
• 57
Reinforcement Pre-Training
Paper
• 2506.08007
• Published
• 263