OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
Paper
• 2411.04905
• Published
• 127
Granite Code Models: A Family of Open Foundation Models for Code Intelligence
Paper
• 2405.04324
• Published
• 25
Seed-Coder: Let the Code Model Curate Data for Itself
Paper
• 2506.03524
• Published
• 6
Qwen2.5-Coder Technical Report
Paper
• 2409.12186
• Published
• 153
X-Coder: Advancing Competitive Programming with Fully Synthetic Tasks, Solutions, and Tests
Paper
• 2601.06953
• Published
• 45
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code
Paper
• 2410.08196
• Published
• 48
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale
Paper
• 2409.17115
• Published
• 64
Competitive Programming with Large Reasoning Models
Paper
• 2502.06807
• Published
• 69
How Programming Concepts and Neurons Are Shared in Code Language Models
Paper
• 2506.01074
• Published
• 4
Multi-Programming Language Sandbox for LLMs
Paper
• 2410.23074
• Published
codefuse-ai/CodeExercise-Python-27k
Updated
• 1.22k
• 67
agentica-org/DeepCoder-Preview-Dataset
Viewer
• Updated
• 25k • 3.57k
• 97
nvidia/Nemotron-Competitive-Programming-v1
Preview
• Updated
• 1.22k
• 19
inclusionAI/Ling-Coder-SFT
Viewer
• Updated
• 4.48M • 1.65k
• 37
allenai/Dolci-RL-Zero-Code-7B
Viewer
• Updated
• 13.3k • 149
• 9
ByteDance-Seed/Code-Contests-Plus
Viewer
• Updated
• 49.2k • 25.4k
• 60
theblackcat102/evol-code-zh
Viewer
• Updated
• 10.3k • 120
• 11
microsoft/NextCoderDataset
Viewer
• Updated
• 381k • 1.26k
• 53
RLVR-SvS/Variational-DAPO
Viewer
• Updated
• 314k • 25
• 2
Fate-Zero/ArcherCodeR-Dataset
Updated
• 169
• 2
nvidia/OpenCodeGeneticInstruct
Viewer
• Updated
• 15.1M • 904
• 20
microsoft/EpiCoder-func-380k
Viewer
• Updated
• 380k • 35
• 29
EpiCoder: Encompassing Diversity and Complexity in Code Generation
Paper
• 2501.04694
• Published
• 16
IterPref: Focal Preference Learning for Code Generation via Iterative Debugging
Paper
• 2503.02783
• Published
• 7
SynthCoder: A Synthetical Strategy to Tune LLMs for Code Completion
Paper
• 2508.15495
• Published
• 1
Increasing LLM Coding Capabilities through Diverse Synthetic Coding Tasks
Paper
• 2510.23208
• Published
• 1
AutoML-org/SyntheticCode-800K
Viewer
• Updated
• 792k • 10
• 3
Can Programming Languages Boost Each Other via Instruction Tuning?
Paper
• 2308.16824
• Published
• 12
Idea First, Code Later: Disentangling Problem Solving from Code Generation in Evaluating LLMs for Competitive Programming
Paper
• 2601.11332
• Published
• 1
Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs
Paper
• 2506.19290
• Published
• 53
SoTaNa: The Open-Source Software Development Assistant
Paper
• 2308.13416
• Published
• 14
OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs
Paper
• 2504.04030
• Published
• 2