code - a dapumptu Collection

X-Coder: Advancing Competitive Programming with Fully Synthetic Tasks, Solutions, and Tests

Paper • 2601.06953 • Published Jan 11 • 45

MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code

Paper • 2410.08196 • Published Oct 10, 2024 • 48

Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale

Paper • 2409.17115 • Published Sep 25, 2024 • 64

Competitive Programming with Large Reasoning Models

Paper • 2502.06807 • Published Feb 3, 2025 • 69

How Programming Concepts and Neurons Are Shared in Code Language Models

Paper • 2506.01074 • Published Jun 1, 2025 • 4

Multi-Programming Language Sandbox for LLMs

Paper • 2410.23074 • Published Oct 30, 2024

codefuse-ai/CodeExercise-Python-27k

Updated Mar 15, 2025 • 1.22k • 67

IIGroup/X-Coder-SFT-376k

Viewer • Updated 19 days ago • 887k • 4.35k • 11

agentica-org/DeepCoder-Preview-Dataset

Viewer • Updated Apr 9, 2025 • 25k • 3.57k • 97

nvidia/Nemotron-Competitive-Programming-v1

Preview • Updated Dec 15, 2025 • 1.22k • 19

inclusionAI/Ling-Coder-SFT

Viewer • Updated Mar 27, 2025 • 4.48M • 1.65k • 37

allenai/Dolci-RL-Zero-Code-7B

Viewer • Updated Jan 5 • 13.3k • 149 • 9

flytech/python-codes-25k

Viewer • Updated May 15, 2024 • 49.6k • 5.71k • 160

ByteDance-Seed/Code-Contests-Plus

Viewer • Updated Nov 6, 2025 • 49.2k • 25.4k • 60

theblackcat102/evol-code-zh

Viewer • Updated Aug 25, 2023 • 10.3k • 120 • 11

microsoft/NextCoderDataset

Viewer • Updated Jul 8, 2025 • 381k • 1.26k • 53

RLVR-SvS/Variational-DAPO

Viewer • Updated Aug 23, 2025 • 314k • 25 • 2

LHL3341/Caco-1.3M

Viewer • Updated Oct 23, 2025 • 1.35M • 85 • 4

Fate-Zero/ArcherCodeR-Dataset

Updated Jun 23, 2025 • 169 • 2

nvidia/OpenCodeGeneticInstruct

Viewer • Updated May 23, 2025 • 15.1M • 904 • 20

nvidia/OpenCodeInstruct

Viewer • Updated Apr 28, 2025 • 4.97M • 3.42k • 60

microsoft/EpiCoder-func-380k

Viewer • Updated Mar 5, 2025 • 380k • 35 • 29

EpiCoder: Encompassing Diversity and Complexity in Code Generation

Paper • 2501.04694 • Published Jan 8, 2025 • 16

IterPref: Focal Preference Learning for Code Generation via Iterative Debugging

Paper • 2503.02783 • Published Mar 4, 2025 • 7

SynthCoder: A Synthetical Strategy to Tune LLMs for Code Completion

Paper • 2508.15495 • Published Aug 21, 2025 • 1

Increasing LLM Coding Capabilities through Diverse Synthetic Coding Tasks

Paper • 2510.23208 • Published Oct 27, 2025 • 1

AutoML-org/SyntheticCode-800K

Viewer • Updated Nov 12, 2025 • 792k • 10 • 3

banksy235/XCoder-80K

Viewer • Updated Oct 16, 2024 • 80k • 25 • 14

Can Programming Languages Boost Each Other via Instruction Tuning?

Paper • 2308.16824 • Published Aug 31, 2023 • 12

Idea First, Code Later: Disentangling Problem Solving from Code Generation in Evaluating LLMs for Competitive Programming

Paper • 2601.11332 • Published Jan 16 • 1

Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs

Paper • 2506.19290 • Published Jun 24, 2025 • 53

SoTaNa: The Open-Source Software Development Assistant

Paper • 2308.13416 • Published Aug 25, 2023 • 14

OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs

Paper • 2504.04030 • Published Apr 5, 2025 • 2