Instructions for using ctu-aic/Llama-3.1-8B_cp-cs with libraries and local apps.

Libraries

How to use ctu-aic/Llama-3.1-8B_cp-cs with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ctu-aic/Llama-3.1-8B_cp-cs")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ctu-aic/Llama-3.1-8B_cp-cs")
model = AutoModelForCausalLM.from_pretrained("ctu-aic/Llama-3.1-8B_cp-cs")
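
Once a pipeline is loaded, generating text comes down to a single call. The sketch below wraps that call in a hypothetical helper, `complete`, which is not part of Transformers. It assumes `pipe` is the text-generation pipeline created above and that the checkpoint is used as a plain text-completion model, so the prompt is raw Czech text rather than a chat template. The Czech prompt in the comment is only an illustration.

```python
def complete(pipe, prompt, max_new_tokens=64):
    """Return only the continuation that `pipe` generates for `prompt`.

    `pipe` is expected to behave like a Transformers text-generation
    pipeline: callable with a prompt, returning a list of dicts that
    carry a "generated_text" key.
    """
    outputs = pipe(
        prompt,
        max_new_tokens=max_new_tokens,  # cap on newly generated tokens
        do_sample=False,                # greedy decoding for repeatable output
        return_full_text=False,         # drop the prompt from the result
    )
    return outputs[0]["generated_text"]


# Intended use with the pipeline from the snippet above (downloads the model):
#   pipe = pipeline("text-generation", model="ctu-aic/Llama-3.1-8B_cp-cs")
#   print(complete(pipe, "Hlavní město České republiky je"))
```

Passing the pipeline in as an argument keeps the helper independent of how the model was loaded, so the same function works with a quantized or locally cached copy.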

Local Apps

vLLM

How to use ctu-aic/Llama-3.1-8B_cp-cs with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ctu-aic/Llama-3.1-8B_cp-cs"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ctu-aic/Llama-3.1-8B_cp-cs",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
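
The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `fetch_completion` are hypothetical, and the defaults simply mirror the curl call above. It assumes the server is already running locally and returns the usual OpenAI-style completions response with a `choices` list.

```python
import json
import urllib.request

MODEL_ID = "ctu-aic/Llama-3.1-8B_cp-cs"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build a POST request equivalent to the curl call above."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_completion(request, timeout=60):
    """Send the request and return the text of the first completion."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


# Intended use once the server is up:
#   request = build_completion_request("Once upon a time,")
#   print(fetch_completion(request))
```

For a server listening on another port, such as the SGLang examples on port 30000, pass `base_url="http://localhost:30000"`.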

SGLang

How to use ctu-aic/Llama-3.1-8B_cp-cs with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ctu-aic/Llama-3.1-8B_cp-cs" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ctu-aic/Llama-3.1-8B_cp-cs",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ctu-aic/Llama-3.1-8B_cp-cs" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ctu-aic/Llama-3.1-8B_cp-cs",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use ctu-aic/Llama-3.1-8B_cp-cs with Docker Model Runner:
```
docker model run hf.co/ctu-aic/Llama-3.1-8B_cp-cs
```

Model Card for Llama 3.1 8B -> CP_(cs)

Llama 3.1 8B continually pretrained on the Czech subset of FineWeb2. More information in the thesis: TBA. (The notation in the thesis is B->CP_(cs).)

🛑 Ethical Considerations and Limitations

This model is a Czech-adapted version of Meta's Llama 3.1 8B, developed as part of a master's thesis. It is intended solely for academic and research purposes.

⚠️ Not Intended for Production Use: This model has not undergone extensive safety testing, fine-tuning for alignment, or robust filtering of harmful outputs. Do not deploy this model in any application or setting that impacts users or the public.
❗ Potential for Harm: The model may generate biased, offensive, false, or otherwise harmful content. It does not include safeguards such as moderation layers or toxicity detection.
🧪 Experimental Nature: This model is an academic experiment accompanying a thesis project and may contain unintended behaviors or limitations due to limited training data, resources, or evaluation.
👤 Responsibility: Any use of this model is at the user’s own risk. The author does not assume responsibility for any consequences arising from the use of the model.
🔒 Respect for Original License: This adaptation is subject to the original terms and conditions set by Meta for Llama models.

Researchers and practitioners using this model must ensure appropriate ethical oversight and conduct rigorous evaluations before any further deployment or fine-tuning.

Citation

@mastersthesis{mlynar2025llmadapt,
  author  = {Tomáš Mlynář},
  title   = {Compute-constrained LLM adaptation to Czech language},
  school  = {Czech Technical University in Prague},
  year    = {2025},
  type    = {Master's thesis},
  month   = {6},
  note    = {Supervisor: Ing. Herbert Ullrich},
  url     = {http://hdl.handle.net/10467/123587}
}