@alfredo-ottomate I seriously thought I was missing a huge breakthrough when reading that, lol. Even the 4 active experts alone won't fit in the claimed 1.5GB of RAM, and even if we go further and assume disk offloading with a high-end Gen3 NVMe SSD in the Pi, I would still expect sub-1 tok/s.
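For intuition, here is a rough back-of-envelope estimate of the streaming-from-disk ceiling. All figures are assumptions for illustration (roughly 3.6B active parameters per token at ~4-bit, and best-case Gen3 NVMe sequential read bandwidth), not measurements:

```python
# Back-of-envelope: tokens/s if the active expert weights must be streamed
# from the SSD on every token. All constants below are illustrative assumptions.

active_params = 3.6e9    # assumed active parameters per token for the MoE
bytes_per_param = 0.5    # ~4-bit quantization => 0.5 bytes per parameter
ssd_bandwidth = 3.5e9    # Gen3 NVMe sequential read, bytes/s (best case)

bytes_per_token = active_params * bytes_per_param        # ~1.8 GB per token
ideal_tok_per_s = ssd_bandwidth / bytes_per_token        # pure-bandwidth ceiling

print(f"upper bound: {ideal_tok_per_s:.2f} tok/s")
```

Even this ~2 tok/s figure is a pure-bandwidth upper bound; real expert access is scattered (closer to random reads) and compute adds on top, so sub-1 tok/s is the optimistic real-world outcome.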
@SeaWolf-AI You have a nice, structured approach to benchmarking, covering different kinds of variables and metrics, but honestly a lot of the info in this is flawed. Also, the Qwen3.5 models underperforming the Qwen3 ones is not expected at all. Are you sure you used the recommended generation parameters for each model? Even slight variations can lead to totally different outputs, especially on the metrics you are looking at.
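To illustrate the point about per-model generation parameters, here is a minimal sketch of keeping a sampling preset per model family so each run uses that model's recommended settings. The keys and values below are placeholder assumptions; always take the actual numbers from each model's card:

```python
# Illustrative per-model sampling presets. The values are assumptions for the
# sketch -- check each model card for its actual recommended parameters.
RECOMMENDED = {
    "qwen3":   {"temperature": 0.6, "top_p": 0.95, "top_k": 20},
    "qwen3.5": {"temperature": 0.7, "top_p": 0.8,  "top_k": 20},
}

def sampling_params(model_name: str) -> dict:
    """Pick the preset whose key appears in the model name (longest match wins,
    so 'Qwen3.5-...' matches the 'qwen3.5' preset, not 'qwen3')."""
    matches = [key for key in RECOMMENDED if key in model_name.lower()]
    if not matches:
        raise KeyError(f"no sampling preset for {model_name}")
    return RECOMMENDED[max(matches, key=len)]

print(sampling_params("Qwen3.5-7B-Instruct"))
```

Routing every benchmark run through one lookup like this makes it harder to accidentally reuse one model's sampling settings for another, which is exactly the kind of slip that skews cross-model comparisons.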
yehya PRO
ykarout
AI & ML interests
None yet
Recent Activity
new activity 3 days ago
nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4: CUDA Version -- Min requirement?
new activity 3 days ago
lmstudio-community/NVIDIA-Nemotron-3-Super-120B-A12B-GGUF: Inference Settings
commented on an article 4 days ago
🏟️ Smol AI WorldCup: A 5-Axis Benchmark That Reveals What Small Language Models Can Really Do
CUDA Version -- Min requirement?
#6 opened 5 days ago by raymondlo84-nvidia
Inference Settings
#1 opened 4 days ago by MedicSaver
commented on 🏟️ Smol AI WorldCup: A 5-Axis Benchmark That Reveals What Small Language Models Can Really Do 4 days ago
FP8 Version for running on vLLM with hardware optimizations from Ada+ generation GPUs
#14 opened 5 days ago by AQLabs
commented on 🏟️ Smol AI WorldCup: A 5-Axis Benchmark That Reveals What Small Language Models Can Really Do 5 days ago
gpt-oss-20b on 1.5GB RAM? Which inference framework are you using for that? llama.cpp?
upvoted a paper 9 days ago
Only producing garbage in H200, cu130 with CUDA 13.0
#1 opened 11 days ago by Dsturb
what's this?
#1 opened 17 days ago by Simon716
MXFP4 slower than Q4_K_M
#21 opened 17 days ago by ykarout
reacted to danielhanchen's post with 🚀 23 days ago
We collabed with HF on showing how you can use HF Jobs and Unsloth! https://huggingface.co/blog/unsloth-jobs
upvoted a paper about 1 month ago
Check in here for tok/s and benchmarks for local gguf models
#1 opened about 1 month ago by ykarout
Check in here for tok/s and benchmarks for local gguf models 🚀
#2 opened about 1 month ago by ykarout
Check in here for tok/s and benchmarks for local gguf models
#11 opened about 1 month ago by ykarout