Any plan to upload GGUF models?

#11

by hdnh2006 - opened Nov 24, 2025

hdnh2006

Nov 24, 2025

Hey! Thanks for this release.

At the moment llama.cpp cannot convert the models to GGUF; it looks like your architecture is not supported yet. Is there any plan to release GGUF versions of the models?

python3 convert_hf_to_gguf.py models/nvidia/llama-embed-nemotron-8b/ --outfile llama-embed-f16.gguf
INFO:hf-to-gguf:Loading model: llama-embed-nemotron-8b
INFO:hf-to-gguf:Model architecture: LlamaBidirectionalModel
ERROR:hf-to-gguf:Model LlamaBidirectionalModel is not supported
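
For anyone hitting the same error, here is a minimal sketch for checking which architecture name a checkpoint declares before running the converter. It assumes the name in the log comes from the `architectures` field of the model's `config.json`, which is the usual Hugging Face layout; `read_architectures` is a hypothetical helper, not part of llama.cpp.

```python
import json
from pathlib import Path


def read_architectures(model_dir):
    """Return the `architectures` list declared in a model's config.json.

    Hypothetical helper: assumes the standard Hugging Face layout where
    config.json sits at the top of the model directory.
    """
    config_path = Path(model_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    # Fall back to an empty list if the field is missing.
    return config.get("architectures", [])


# Example usage (path taken from the command above):
# read_architectures("models/nvidia/llama-embed-nemotron-8b")
```

If the returned name matches the one in the error message, the conversion will keep failing until the converter adds support for that architecture.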

Thanks in advance

ybabakhin

NVIDIA org Nov 26, 2025

Hi @hdnh2006!

No, we don't have plans to provide quantized models for this release.

ybabakhin changed discussion status to closed Nov 26, 2025
