wo-datacraft's Collections Toolkit - AI Papers
Neural Machine Translation by Jointly Learning to Align and Translate
Paper
• 1409.0473
• Published
• 7
Attention Is All You Need
Paper
• 1706.03762
• Published
• 115
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper
• 1810.04805
• Published
• 26
Hierarchical Reasoning Model
Paper
• 2506.21734
• Published
• 48
Scaling Laws for Neural Language Models
Paper
• 2001.08361
• Published
• 9
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Paper
• 1910.01108
• Published
• 21
Language Models are Few-Shot Learners
Paper
• 2005.14165
• Published
• 19
LoRA: Low-Rank Adaptation of Large Language Models
Paper
• 2106.09685
• Published
• 58
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Paper
• 2005.11401
• Published
• 14
Training language models to follow instructions with human feedback
Paper
• 2203.02155
• Published
• 24
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
Paper
• 2101.03961
• Published
• 13
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
Paper
• 2208.07339
• Published
• 5
PaLM: Scaling Language Modeling with Pathways
Paper
• 2204.02311
• Published
• 3
A Survey on Large Language Model based Autonomous Agents
Paper
• 2308.11432
• Published
• 3
GPT-4 Technical Report
Paper
• 2303.08774
• Published
• 7
Large Language Models are Zero-Shot Reasoners
Paper
• 2205.11916
• Published
• 3
Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4
Paper
• 2312.16171
• Published
• 37
Toolformer: Language Models Can Teach Themselves to Use Tools
Paper
• 2302.04761
• Published
• 12
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Paper
• 2405.04434
• Published
• 25
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper
• 2501.12948
• Published
• 440
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Paper
• 2505.03335
• Published
• 189
Qwen3 Technical Report
Paper
• 2505.09388
• Published
• 336
Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions
Paper
• 2505.00675
• Published
• 3
Small Language Models are the Future of Agentic AI
Paper
• 2506.02153
• Published
• 24
gpt-oss-120b & gpt-oss-20b Model Card
Paper
• 2508.10925
• Published
• 15
Large Language Diffusion Models
Paper
• 2502.09992
• Published
• 126
Less is More: Recursive Reasoning with Tiny Networks
Paper
• 2510.04871
• Published
• 509
Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples
Paper
• 2510.07192
• Published
• 5
A Survey of Vibe Coding with Large Language Models
Paper
• 2510.12399
• Published
• 50
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence
Paper
• 2511.18538
• Published
• 299
Denoising Diffusion Probabilistic Models
Paper
• 2006.11239
• Published
• 9
Denoising Diffusion Implicit Models
Paper
• 2010.02502
• Published
• 4
Score-Based Generative Modeling through Stochastic Differential Equations
Paper
• 2011.13456
• Published
• 2
Learning Transferable Visual Models From Natural Language Supervision
Paper
• 2103.00020
• Published
• 19
Hierarchical Text-Conditional Image Generation with CLIP Latents
Paper
• 2204.06125
• Published
• 3
Classifier-Free Diffusion Guidance
Paper
• 2207.12598
• Published
• 4
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Paper
• 1910.10683
• Published
• 16
LLaMA: Open and Efficient Foundation Language Models
Paper
• 2302.13971
• Published
• 20
Mistral 7B
Paper
• 2310.06825
• Published
• 58
Gemma 2: Improving Open Language Models at a Practical Size
Paper
• 2408.00118
• Published
• 78
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
Paper
• 2502.02737
• Published
• 255
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Paper
• 2010.11929
• Published
• 15
Recursive Language Models
Paper
• 2512.24601
• Published
• 90