Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge Feb 7, 2025 • 274
michaelbenayoun/qwen3-tiny-4kv-heads-8layers-random Text Generation • 6.61M • Updated Oct 30, 2025 • 3
michaelbenayoun/qwen3-tiny-4kv-heads-4layers-random Text Generation • 5.47M • Updated Oct 30, 2025 • 16.3k
michaelbenayoun/deepseekv3-tiny-4kv-heads-4-layers-random Text Generation • 5.27M • Updated Jul 24, 2025 • 2
michaelbenayoun/granite-tiny-4kv-heads-4layers-random Text Generation • 4.2M • Updated Jun 18, 2025 • 509
michaelbenayoun/llama-2-tiny-4kv-heads-4layers-random Text Generation • 8.54M • Updated Jun 2, 2025 • 66.5k