Instructions for using pcuenq/nvidia-nano-clone with libraries, inference providers, notebooks, and local apps.

Libraries

How to use pcuenq/nvidia-nano-clone with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="pcuenq/nvidia-nano-clone", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("pcuenq/nvidia-nano-clone", trust_remote_code=True, dtype="auto")


vLLM

How to use pcuenq/nvidia-nano-clone with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server (the model ships custom code, as trust_remote_code=True above indicates):
vllm serve "pcuenq/nvidia-nano-clone" --trust-remote-code
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "pcuenq/nvidia-nano-clone",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
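The same request can also be sent from Python. The sketch below uses only the standard library and assumes the server from the previous step is listening on localhost:8000. `build_chat_request` and `post_chat` are hypothetical helper names, and the optional `max_tokens` field is a standard OpenAI chat-completions parameter rather than something this model card configures.

```python
import json
import urllib.request
from typing import Optional


def build_chat_request(
    model: str, prompt: str, image_url: str, max_tokens: Optional[int] = None
) -> dict:
    """Build an OpenAI-style chat-completions body with one text part and one image part."""
    body = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
    if max_tokens is not None:
        # Standard OpenAI chat-completions parameter that caps the reply length.
        body["max_tokens"] = max_tokens
    return body


def post_chat(base_url: str, body: dict, timeout: float = 120.0) -> str:
    """POST the body to {base_url}/v1/chat/completions and return the assistant's text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.load(response)
    # OpenAI-compatible servers return the generated text in choices[0].message.content.
    return reply["choices"][0]["message"]["content"]
```

For example, `post_chat("http://localhost:8000", build_chat_request("pcuenq/nvidia-nano-clone", "Describe this image in one sentence.", image_url))` mirrors the curl call; pointing `base_url` at port 30000 targets an SGLang server instead.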


SGLang

How to use pcuenq/nvidia-nano-clone with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "pcuenq/nvidia-nano-clone" \
    --trust-remote-code \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "pcuenq/nvidia-nano-clone",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
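The Transformers pipeline earlier on this page describes images as `{"type": "image", "url": ...}` parts, while the OpenAI-compatible vLLM and SGLang endpoints expect `{"type": "image_url", "image_url": {"url": ...}}`. A minimal converter lets one message list serve both. It assumes every image part carries a `url` key; Transformers also accepts other image sources, which this hypothetical `to_openai_messages` helper does not handle.

```python
from typing import Any


def to_openai_messages(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Rewrite Transformers-style image parts into OpenAI-style image_url parts."""
    converted = []
    for message in messages:
        content = message["content"]
        if isinstance(content, str):
            # Plain-string content is already valid in both formats.
            converted.append({"role": message["role"], "content": content})
            continue
        parts = []
        for part in content:
            if part["type"] == "image":
                parts.append({"type": "image_url", "image_url": {"url": part["url"]}})
            elif part["type"] == "text":
                parts.append({"type": "text", "text": part["text"]})
            else:
                raise ValueError(f"unsupported content part type: {part['type']!r}")
        # Build new dicts so the caller's original message list is left untouched.
        converted.append({"role": message["role"], "content": parts})
    return converted
```

The result can be placed in the `messages` field of the curl body above.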

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "pcuenq/nvidia-nano-clone" \
        --trust-remote-code \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "pcuenq/nvidia-nano-clone",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
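Inside the container, the server only accepts requests once the model has finished loading, which may take a while. A simple readiness check can be sketched as follows. It assumes that polling the OpenAI-compatible `GET /v1/models` route is an adequate readiness signal; the `wait_for_server` name and its default timings are illustrative.

```python
import http.client
import time
import urllib.request


def wait_for_server(base_url: str, timeout: float = 600.0, interval: float = 5.0) -> bool:
    """Poll GET {base_url}/v1/models until it returns HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/v1/models", timeout=interval) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Refused connections, timeouts and HTTP errors (URLError and HTTPError
            # are OSError subclasses) all mean "not ready yet", so keep polling.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Call `wait_for_server("http://localhost:30000")` before sending the curl request. It returns `False` if the server is still unreachable after `timeout` seconds.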

Docker Model Runner
How to use pcuenq/nvidia-nano-clone with Docker Model Runner:
```
docker model run hf.co/pcuenq/nvidia-nano-clone
```

nvidia-nano-clone / bias.md (uploaded by pcuenq using huggingface_hub; commit 493df70, 7 months ago)


Field	Response
Participation considerations from adversely impacted groups (protected classes) in model design and testing:	None
Bias Metric (If Measured):	BBQ Accuracy Scores in Ambiguous Contexts
Which characteristic (feature) show(s) the greatest difference in performance?	The model shows high variance across many characteristics when used at a high temperature, with the greatest measurable differences in categories such as Gender Identity and Race x Gender.
Which feature(s) have the worst performance overall?	Age (ambiguous) has both the lowest category accuracy (0.75) and a strongly negative bias score (-0.56), making it the worst-performing feature overall in this evaluation.
Measures taken to mitigate against unwanted bias:	None
If using internal data, description of methods implemented in data acquisition or processing, if any, to address the prevalence of identifiable biases in the training, testing, and validation data:	The training datasets contain a large amount of synthetic data generated by LLMs. We manually curated prompts.
Tools used to assess statistical imbalances and highlight patterns that may introduce bias into AI models:	Bias Benchmark for Question Answering (BBQ)
Tools used to assess statistical imbalances and highlight patterns that may introduce bias into AI models:	The datasets, which include video datasets (e.g., YouCook2, VCG Human Dataset) and image captioning datasets, do not collectively or exhaustively represent all demographic groups (and proportionally therein).
For instance, over 80% of the samples in these datasets contain no explicit mention of demographic classes such as age, gender, or ethnicity. In the subset where analysis was performed, certain datasets show skews in the representation of participants; for example, the share of participants perceived as "female" may be significantly higher than that of "male" participants in certain datasets. Separately, individuals aged "40 to 49 years" and "20 to 29 years" are the most frequent among ethnic identifiers. Toxicity analysis was also performed on several datasets to identify potential not-safe-for-work samples and related risks.
To mitigate these imbalances, we recommend considering evaluation techniques such as bias audits, fine-tuning with demographically balanced datasets, and mitigation strategies such as counterfactual data augmentation to align the model with the desired behavior. This evaluation was conducted on subsets of 200 to 3,000 samples per dataset, so the reliability of the embeddings may be limited. A baseline of 200 samples was used for every dataset, with larger subsets of up to 3,000 samples used for certain in-depth analyses.
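The BBQ ambiguous-context metrics can be sketched as follows, assuming the definitions from the original BBQ paper (Parrish et al., 2022). In ambiguous contexts the correct answer is always "unknown", so accuracy is the share of questions answered "unknown". The bias score is s_AMB = (1 − accuracy) × (2 × n_biased / n_non-unknown − 1). The function name and arguments are hypothetical, and the card does not state which variant of the score it reports.

```python
def bbq_ambiguous_scores(n_total: int, n_unknown: int, n_biased: int) -> tuple[float, float]:
    """Return (accuracy, s_AMB) for one BBQ category in ambiguous contexts.

    n_total:   ambiguous-context questions evaluated
    n_unknown: answers choosing the "unknown" option (always correct here)
    n_biased:  non-unknown answers that pick the stereotype-aligned target
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    n_non_unknown = n_total - n_unknown
    if not 0 <= n_non_unknown <= n_total:
        raise ValueError("n_unknown must be between 0 and n_total")
    if not 0 <= n_biased <= n_non_unknown:
        raise ValueError("n_biased must be between 0 and n_total - n_unknown")
    accuracy = n_unknown / n_total
    # Raw bias term: +1 if every non-unknown answer is biased, -1 if none are.
    raw = 2 * n_biased / n_non_unknown - 1 if n_non_unknown else 0.0
    # The ambiguous-context score scales the raw term by the error rate.
    return accuracy, (1 - accuracy) * raw
```

Under this definition |s_AMB| can never exceed 1 − accuracy (0.25 at an accuracy of 0.75). A score of -0.56 alongside 0.75 accuracy would therefore come from a different variant of the metric.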