Collections including paper arxiv:2309.05463

- Attention Is All You Need
  Paper • 1706.03762 • Published • 108
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 18
- LLaMA: Open and Efficient Foundation Language Models
  Paper • 2302.13971 • Published • 20
- Llama 2: Open Foundation and Fine-Tuned Chat Models
  Paper • 2307.09288 • Published • 248

- Textbooks Are All You Need
  Paper • 2306.11644 • Published • 152
- Textbooks Are All You Need II: phi-1.5 technical report
  Paper • 2309.05463 • Published • 88
- TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
  Paper • 2305.07759 • Published • 38
- Scaling Synthetic Data Creation with 1,000,000,000 Personas
  Paper • 2406.20094 • Published • 104

- microsoft/phi-1
  Text Generation • 1B • Updated • 2.98k • 218
- microsoft/phi-1_5
  Text Generation • 1B • Updated • 41.9k • 1.35k
- Textbooks Are All You Need
  Paper • 2306.11644 • Published • 152
- Textbooks Are All You Need II: phi-1.5 technical report
  Paper • 2309.05463 • Published • 88
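
The two model cards in the collection above (microsoft/phi-1 and microsoft/phi-1_5) are text-generation checkpoints on the Hugging Face Hub. As a minimal usage sketch, assuming the standard transformers AutoTokenizer/AutoModelForCausalLM API (the prompt and generation settings below are illustrative, not taken from the collection):

```python
# Minimal sketch: load microsoft/phi-1_5 from the Hugging Face Hub and generate text.
# Assumes the standard transformers auto classes; the prompt and max_new_tokens
# value are illustrative choices, not settings from the collection itself.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-1_5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Write a short story about a robot learning to read."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```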

- Attention Is All You Need
  Paper • 1706.03762 • Published • 108
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 18
- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
  Paper • 2305.13245 • Published • 6
- Llama 2: Open Foundation and Fine-Tuned Chat Models
  Paper • 2307.09288 • Published • 248

- Textbooks Are All You Need II: phi-1.5 technical report
  Paper • 2309.05463 • Published • 88
- Neurons in Large Language Models: Dead, N-gram, Positional
  Paper • 2309.04827 • Published • 17
- Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
  Paper • 2403.09629 • Published • 78

- Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models
  Paper • 2402.13064 • Published • 50
- Textbooks Are All You Need II: phi-1.5 technical report
  Paper • 2309.05463 • Published • 88
- DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows
  Paper • 2402.10379 • Published • 31
- Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
  Paper • 2312.06585 • Published • 29