OpenAI is now open again! Check out OpenAI's brand new gpt-oss-20b model hosted on ZeroGPU 🤗 merterbak/gpt-oss-20b-demo
Qwen 3 technical report released 🚀 Report: https://github.com/QwenLM/Qwen3/blob/main/Qwen3_Technical_Report.pdf
Papers
Attention Is All You Need Paper • 1706.03762 • Published Jun 12, 2017 • 108
LoRA Learns Less and Forgets Less Paper • 2405.09673 • Published May 15, 2024 • 90
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism Paper • 2401.02954 • Published Jan 5, 2024 • 51
RAFT: Adapting Language Model to Domain Specific RAG Paper • 2403.10131 • Published Mar 15, 2024 • 72
Qwen 3
Alibaba's Qwen 3 models
Qwen/Qwen3-0.6B Text Generation • 0.8B • Updated Jul 26, 2025 • 8.32M • 939
Qwen/Qwen3-1.7B Text Generation • 2B • Updated Jul 26, 2025 • 3.33M • 374
Qwen/Qwen3-4B Text Generation • 4B • Updated Jul 26, 2025 • 3.85M • 512
Qwen/Qwen3-8B Text Generation • 8B • Updated Jul 26, 2025 • 4.16M • 838
Running on Zero 6 Seed Coder 8B Instruct 🚀 ByteDance Seed's coding-focused Seed-Coder-8B-Instruct model
merterbak/Mistral-Small-3.1-24B-Instruct-2503-GGUF Text Generation • 24B • Updated Apr 27, 2025 • 126 • 1