@alfredo-ottomate I seriously thought I was missing a huge breakthrough when reading that, lol. Even the 4 active experts alone won't fit in the claimed 1.5GB of RAM, and even if we go further and assume disk offloading with a high-end Gen3 NVMe SSD in the Pi, I would still expect sub-1 tok/s.
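For intuition, here is a rough back-of-envelope estimate of the streaming-from-disk ceiling. All figures are assumptions for illustration (roughly 3.6B active parameters per token at ~4-bit, and best-case Gen3 NVMe sequential read bandwidth), not measurements:

```python
# Back-of-envelope: tokens/s if the active expert weights must be streamed
# from the SSD on every token. All constants below are illustrative assumptions.

active_params = 3.6e9    # assumed active parameters per token for the MoE
bytes_per_param = 0.5    # ~4-bit quantization => 0.5 bytes per parameter
ssd_bandwidth = 3.5e9    # Gen3 NVMe sequential read, bytes/s (best case)

bytes_per_token = active_params * bytes_per_param        # ~1.8 GB per token
ideal_tok_per_s = ssd_bandwidth / bytes_per_token        # pure-bandwidth ceiling

print(f"upper bound: {ideal_tok_per_s:.2f} tok/s")
```

Even this ~2 tok/s figure is a pure-bandwidth upper bound; real expert access is scattered (closer to random reads) and compute adds on top, so sub-1 tok/s is the optimistic real-world outcome.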
@SeaWolf-AI You have a nice, structured approach to benchmarking, covering different kinds of variables and metrics, but honestly a lot of the info in this is flawed. Also, the Qwen3.5 models underperforming the Qwen3 ones is not expected at all. Are you sure you used the recommended generation parameters for each model? Even slight variations can lead to totally different outputs, especially on the metrics you are looking at.
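To illustrate the point about per-model generation parameters, here is a minimal sketch of keeping a sampling preset per model family so each run uses that model's recommended settings. The keys and values below are placeholder assumptions; always take the actual numbers from each model's card:

```python
# Illustrative per-model sampling presets. The values are assumptions for the
# sketch -- check each model card for its actual recommended parameters.
RECOMMENDED = {
    "qwen3":   {"temperature": 0.6, "top_p": 0.95, "top_k": 20},
    "qwen3.5": {"temperature": 0.7, "top_p": 0.8,  "top_k": 20},
}

def sampling_params(model_name: str) -> dict:
    """Pick the preset whose key appears in the model name (longest match wins,
    so 'Qwen3.5-...' matches the 'qwen3.5' preset, not 'qwen3')."""
    matches = [key for key in RECOMMENDED if key in model_name.lower()]
    if not matches:
        raise KeyError(f"no sampling preset for {model_name}")
    return RECOMMENDED[max(matches, key=len)]

print(sampling_params("Qwen3.5-7B-Instruct"))
```

Routing every benchmark run through one lookup like this makes it harder to accidentally reuse one model's sampling settings for another, which is exactly the kind of slip that skews cross-model comparisons.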
yehya PRO
ykarout
AI & ML interests
None yet
Recent Activity
new activity 3 days ago
nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4: CUDA Version -- Min requirement?
new activity 3 days ago
lmstudio-community/NVIDIA-Nemotron-3-Super-120B-A12B-GGUF: Inference Settings
commented on an article 4 days ago
🏟️ Smol AI WorldCup: A 5-Axis Benchmark That Reveals What Small Language Models Can Really Do
CUDA Version -- Min requirement?
#6 opened 5 days ago by raymondlo84-nvidia
Inference Settings
#1 opened 4 days ago by MedicSaver
commented on 🏟️ Smol AI WorldCup: A 5-Axis Benchmark That Reveals What Small Language Models Can Really Do 4 days ago
FP8 Version for running on vLLM with hardware optimizations from Ada+ generation GPUs
#14 opened 5 days ago by AQLabs
commented on 🏟️ Smol AI WorldCup: A 5-Axis Benchmark That Reveals What Small Language Models Can Really Do 5 days ago
gpt-oss-20b on 1.5GB RAM? Which inference framework are you using for that? llama.cpp?
upvoted a paper 9 days ago
Only producing garbage in H200, cu130 with CUDA 13.0
#1 opened 11 days ago by Dsturb
what's this?
#1 opened 17 days ago by Simon716
MXFP4 slower than Q4_K_M
#21 opened 17 days ago by ykarout
reacted to danielhanchen's post with 🚀 23 days ago
We collabed with HF on showing how you can use HF Jobs and Unsloth! https://huggingface.co/blog/unsloth-jobs
upvoted a paper about 1 month ago
Check in here for tok/s and benchmarks for local gguf models
#1 opened about 1 month ago by ykarout
Check in here for tok/s and benchmarks for local gguf models 🚀
#2 opened about 1 month ago by ykarout
Check in here for tok/s and benchmarks for local gguf models
#11 opened about 1 month ago by ykarout