Instructions to use google/gemma-2-9b-it with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use google/gemma-2-9b-it with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="google/gemma-2-9b-it")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2-9b-it")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use google/gemma-2-9b-it with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "google/gemma-2-9b-it"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "google/gemma-2-9b-it",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker
docker model run hf.co/google/gemma-2-9b-it
- SGLang
How to use google/gemma-2-9b-it with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "google/gemma-2-9b-it" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "google/gemma-2-9b-it",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "google/gemma-2-9b-it" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "google/gemma-2-9b-it",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
- Docker Model Runner
How to use google/gemma-2-9b-it with Docker Model Runner:
docker model run hf.co/google/gemma-2-9b-it
Request: Access for '/gemma-2-9b-it' model
Sir, I requested access to this model 2-3 days ago, but the request is still pending. Please grant me access to this model.
Hi @rajeevhuggingface87 ,
For now, to get instant approval, could you please visit the Kaggle page for Gemma and request access to the model weights through Hugging Face? Kindly find this Kaggle page reference link.
Thank you.
Sir, I want to do my practical work through Google Colab, not Kaggle. So please grant access so that I can use these models in Google Colab.
I had a similar problem with access to Google Gemma models in Colab, despite having been granted access. I solved it by creating a new HF token with 'write' permissions, and using it to replace the existing Colab secret I had stored.
After granting the notebook permission to access the HF token secret, all variants of the model started downloading to Colab as normal. Strangely, it didn't work with a newly created HF token with 'read only' permissions 🤔 Or is that intentional?
I have generated another token in Hugging Face with 'Write' access and provided it in both Google Colab and a Kaggle Notebook. But an error message appears in both:
"Your request to access model google/gemma-2-9b-it is awaiting a review from the repo authors".
"Lavanya KV" madam provided me a link: https://www.kaggle.com/models/google/gemma?postConsentAction=download. That page clearly states "You've consented to the license for Gemma" in the Kaggle environment, but I am still unable to access this model in the Kaggle environment either. I think Google's approval on Hugging Face is mandatory. Please note that I have access to a GPU accelerator in both Google Colab and Kaggle, so there are no resource issues.
Hi @rajeevhuggingface87 ,
As an alternative you can also Request Access to Gemma-2 models on Kaggle. Kindly try and let us know if you have any concerns.
Thank you.
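One way to tell a token problem apart from a pending approval is to request a small file from the repo with the token and look at the HTTP status. The following is only a minimal sketch, not an official tool: the helper names (`build_request`, `describe_status`, `check_access`) are made up for illustration, and the mapping from status codes to causes is an assumption about how the Hub answers for gated repos, so treat the messages as hints rather than a diagnosis.

```python
import urllib.error
import urllib.request
from typing import Optional

MODEL_ID = "google/gemma-2-9b-it"


def build_request(model_id: str, token: Optional[str]) -> urllib.request.Request:
    """Build a GET request for the repo's small config.json file."""
    url = f"https://huggingface.co/{model_id}/resolve/main/config.json"
    # The token is sent as a standard Bearer header; no header if no token.
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    return urllib.request.Request(url, headers=headers)


def describe_status(status: int) -> str:
    """Map an HTTP status to a likely cause (assumed mapping, not official)."""
    if status == 200:
        return "access granted"
    if status == 401:
        return "token missing or invalid"
    if status == 403:
        return "token accepted, but access to the gated repo is not approved"
    return f"unexpected status {status}"


def check_access(model_id: str, token: Optional[str]) -> str:
    """Try to fetch config.json and describe the outcome."""
    try:
        with urllib.request.urlopen(build_request(model_id, token), timeout=30) as resp:
            return describe_status(resp.status)
    except urllib.error.HTTPError as err:
        # HTTP-level failures (401, 403, ...) carry the status code.
        return describe_status(err.code)
    except OSError as err:
        # Covers URLError, timeouts, and other network-level failures.
        return f"network error: {err}"
```

Calling `check_access(MODEL_ID, token)` with the same token stored in the Colab or Kaggle secret should show whether the notebook is failing on the token itself or on the access request.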
after first logging into docker at the command line...
from https://huggingface.co/docs/hugs/how-to/docker
I then try to run:
export HUGS_CACHE=~/.cache/hugs
mkdir -p "$HUGS_CACHE"
docker run -it --rm \
--gpus all \
--shm-size=16GB \
-v "$HUGS_CACHE:/tmp" \
-p 8080:80 \
'hfhugs/nvidia-google-gemma-2-9b-it:0.2.0'
I did the docker login with no problem...
I already consented to the agreement with Kaggle...
but I still get:
"repository does not exist or may require 'docker login'"