Instructions to use nvidia/llama-embed-nemotron-8b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use nvidia/llama-embed-nemotron-8b with sentence-transformers:
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("nvidia/llama-embed-nemotron-8b", trust_remote_code=True)

    sentences = [
        "The weather is lovely today.",
        "It's so sunny outside!",
        "He drove to the stadium.",
    ]
    embeddings = model.encode(sentences)

    similarities = model.similarity(embeddings, embeddings)
    print(similarities.shape)
    # [3, 3]
- Transformers
How to use nvidia/llama-embed-nemotron-8b with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("feature-extraction", model="nvidia/llama-embed-nemotron-8b", trust_remote_code=True)

    # Load model directly
    from transformers import AutoModel

    model = AutoModel.from_pretrained("nvidia/llama-embed-nemotron-8b", trust_remote_code=True, dtype="auto")
- Notebooks
- Google Colab
- Kaggle
Any plan to upload GGUF models?
#11
by hdnh2006 - opened
Hey! Thanks for this release.
At the moment llama.cpp cannot convert the models to GGUF; it looks like the architecture is not supported yet. Is there any plan to release the models in this format?
python3 convert_hf_to_gguf.py models/nvidia/llama-embed-nemotron-8b/ --outfile llama-embed-f16.gguf
INFO:hf-to-gguf:Loading model: llama-embed-nemotron-8b
INFO:hf-to-gguf:Model architecture: LlamaBidirectionalModel
ERROR:hf-to-gguf:Model LlamaBidirectionalModel is not supported
Thanks in advance.
ybabakhin changed discussion status to closed
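The error in the log above is keyed on the architecture name the converter reports, `LlamaBidirectionalModel`. As a minimal sketch, assuming the converter takes that name from the `architectures` field of the checkpoint's `config.json` (the usual location in Hugging Face checkpoints), the same check can be made before attempting a conversion. The function names and the `supported` set below are hypothetical and are not part of llama.cpp.

```python
import json
from pathlib import Path


def read_architectures(model_dir):
    """Return the `architectures` list from a checkpoint's config.json.

    Returns an empty list if the field is missing.
    """
    config_path = Path(model_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    return config.get("architectures", [])


def unsupported_architectures(model_dir, supported):
    """List the checkpoint's architectures that are not in `supported`.

    `supported` is a caller-provided set of names; it is a placeholder
    here, not the real list maintained by any converter.
    """
    return [name for name in read_architectures(model_dir) if name not in supported]
```

For example, calling `unsupported_architectures("models/nvidia/llama-embed-nemotron-8b/", {"LlamaForCausalLM"})` would return any architecture names outside the illustrative set passed in. An empty result only means the names match that set, not that a conversion will succeed.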