Speculation head checkpoints
Important
This repo's checkpoints are v11 (default branch / main). Older v10 checkpoints remain on the v10 branch. Use the matching code release: v11 checkpoints require speculative_pipeline_decoding v11; v10 checkpoints load via old_version_v10/.
Pre-trained pipeline speculation head weights. Each .pt file is a single checkpoint produced by training; pair it with the same base model architecture it was trained on (see config["base_model_path"] inside the file).
For inference, evaluation, and training examples, see the official repo:
https://github.com/yuyijiong/speculative_pipeline_decoding
Filename format
Files are named:
{model}_s{num_stages}_l{num_spec_layers}.pt
| Part | Meaning |
|---|---|
| {model} | Base model tag from training config (e.g. Qwen3.5-4B, Qwen3.5-9B) |
| s{...} | num_stages: pipeline depth (number of target-model stages) |
| l{...} | num_spec_layers: number of Transformer layers in the speculation module |
Example: Qwen3.5-9B_s16_l2.pt → Qwen3.5-9B base, 16 stages, 2 spec layers.
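The naming scheme above can be parsed mechanically. The following is a minimal sketch, not code from the official repo: the names `CheckpointName`, `parse_checkpoint_name`, and `format_checkpoint_name` are hypothetical helpers introduced here for illustration, and the regular expression assumes the `_s<int>_l<int>.pt` suffix always appears at the end of the filename.

```python
import os
import re
from typing import NamedTuple


class CheckpointName(NamedTuple):
    """The three documented parts of a checkpoint filename (hypothetical helper type)."""
    model: str
    num_stages: int
    num_spec_layers: int


# Pattern for {model}_s{num_stages}_l{num_spec_layers}.pt
# The model tag is matched greedily, so a tag that itself contains "_" still
# parses; only the trailing _s<int>_l<int>.pt suffix is treated as structure.
_NAME_RE = re.compile(r"^(?P<model>.+)_s(?P<stages>\d+)_l(?P<layers>\d+)\.pt$")


def parse_checkpoint_name(path: str) -> CheckpointName:
    """Split a checkpoint filename (or a path ending in one) into its parts."""
    filename = os.path.basename(path)
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"not a recognised checkpoint filename: {filename!r}")
    return CheckpointName(
        model=match["model"],
        num_stages=int(match["stages"]),
        num_spec_layers=int(match["layers"]),
    )


def format_checkpoint_name(name: CheckpointName) -> str:
    """Inverse of parse_checkpoint_name: rebuild the filename from its parts."""
    return f"{name.model}_s{name.num_stages}_l{name.num_spec_layers}.pt"
```

For instance, `parse_checkpoint_name("Qwen3.5-9B_s16_l2.pt")` yields the model tag `Qwen3.5-9B` with 16 stages and 2 spec layers, matching the example above.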
Checkpoint contents
Each file is a PyTorch archive with two top-level keys:
{
"state_dict": ..., # weights of the speculation module
"config": { ... }, # hyperparameters and metadata
}
config fields (always present)
| Field | Description |
|---|---|
| base_model_path | Base model path recorded at training time (often a machine-local path; override at load time, see below) |
| hidden_size | Hidden size (matches base model) |
| vocab_size | Base model vocabulary size |
| draft_vocab_size | Draft head output size (full vocab or draft subset) |
| num_stages | Pipeline depth (same as s in filename) |
| num_spec_layers | Speculation module depth (same as l in filename) |
| version | Checkpoint format version (11) |
| num_aggr_types | Number of aggregation types m in the speculation module, determining the number of FC modules |
| aggr_feature_bound | HF hidden-state layer indices for aggregation anchors g_0..g_{m-1} (replaces v10's shallow_hidden_layer_indices) |
| trained_with_use_deepest | Whether training used deepest-layer features |
config fields (optional)
| Field | Description |
|---|---|
| model_type | Base model type recorded at training time (e.g. qwen3_5) |
| spec_init_from_base_layers | Base layers used to initialize the spec module (if any) |
| draft_token_ids | Draft vocabulary token ids (only when trained with a draft vocab subset) |
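To inspect a checkpoint outside the official scripts, something like the sketch below may help. It is not taken from the repo: `check_v11_config` and `load_spec_head` are hypothetical helper names, the required-field list is copied from the "always present" table above, and the version check assumes the value is stored as the integer 11 or the string "11". `torch.load` is imported lazily so the config check itself runs without PyTorch installed.

```python
# Field names copied from the "always present" table in this README.
REQUIRED_CONFIG_FIELDS = (
    "base_model_path",
    "hidden_size",
    "vocab_size",
    "draft_vocab_size",
    "num_stages",
    "num_spec_layers",
    "version",
    "num_aggr_types",
    "aggr_feature_bound",
    "trained_with_use_deepest",
)


def check_v11_config(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_CONFIG_FIELDS
        if name not in config
    ]
    version = config.get("version")
    # Assumption: the version is stored as 11 or "11"; other encodings are flagged.
    if version is not None and str(version) != "11":
        problems.append(f"expected version 11, found {version!r}")
    return problems


def load_spec_head(path: str):
    """Load a checkpoint onto the CPU and return (state_dict, config)."""
    import torch  # lazy import: only needed when a file is actually loaded

    # For files from an untrusted source, consider passing weights_only=True.
    ckpt = torch.load(path, map_location="cpu")
    problems = check_v11_config(ckpt["config"])
    if problems:
        raise ValueError("; ".join(problems))
    return ckpt["state_dict"], ckpt["config"]
```

A v10 checkpoint would be reported as a version mismatch here (and would likely lack fields such as aggr_feature_bound), which is a hint to switch to the v10 branch and old_version_v10/ instead.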
Available v11 checkpoints
| Base model | s (stages) | l (spec layers) | Filename |
|---|---|---|---|
| Qwen3.5-4B | 4 | 4 | Qwen3.5-4B_s4_l4.pt |
| Qwen3.5-4B | 8 | 4 | Qwen3.5-4B_s8_l4.pt |
| Qwen3.5-4B | 16 | 2 | Qwen3.5-4B_s16_l2.pt |
| Qwen3.5-9B | 4 | 4 | Qwen3.5-9B_s4_l4.pt |
| Qwen3.5-9B | 8 | 4 | Qwen3.5-9B_s8_l4.pt |
| Qwen3.5-9B | 16 | 2 | Qwen3.5-9B_s16_l2.pt |
Loading checkpoints
config["base_model_path"] is often a local path from the training machine (e.g. /share/models/Qwen3.5-4B). On your machine, pass the correct Hugging Face id or local directory via --base_model_path; it overrides the path stored in the checkpoint:
python pipeline_inference.py \
--spec_head_ckpt /path/to/Qwen3.5-4B_s4_l4.pt \
--base_model_path Qwen/Qwen3.5-4B
python eval.py \
--spec_head_ckpt /path/to/Qwen3.5-4B_s4_l4.pt \
--base_model_path /your/local/Qwen3.5-4B \
--data_dir eval_data \
--output_dir ./eval_output
If --base_model_path is omitted, the value from config["base_model_path"] is used as-is.
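The override rule described above amounts to a simple precedence check. The sketch below illustrates that rule only; it is not the repo's implementation, `resolve_base_model_path` is a hypothetical name, and the warning for a missing absolute path is an addition made here for illustration rather than documented behaviour of the scripts.

```python
import os
import warnings
from typing import Optional


def resolve_base_model_path(cli_value: Optional[str], config: dict) -> str:
    """Prefer an explicit --base_model_path; otherwise use the checkpoint's stored value."""
    if cli_value:
        return cli_value

    stored = config["base_model_path"]
    # Illustrative extra check (not from the repo): an absolute path that does
    # not exist locally was probably recorded on the training machine.
    if os.path.isabs(stored) and not os.path.exists(stored):
        warnings.warn(
            f"base_model_path {stored!r} from the checkpoint does not exist here; "
            "pass --base_model_path with a Hugging Face id or local directory"
        )
    return stored
```

With a stored value such as /share/models/Qwen3.5-4B and no override, the function returns the stored path unchanged, as the scripts are described as doing, but warns if that directory is absent.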
For more usage details, see the speculative_pipeline_decoding repository linked above.
Citation
If you use this repo, please cite our paper:
@misc{yu2026speculativepipelinedecodinghigheraccruacy,
title={Speculative Pipeline Decoding: Higher-Accuracy and Zero-Bubble Speculation via Pipeline Parallelism},
author={Yijiong Yu and Huazheng Wang and Shuai Yuan and Ruilong Ren and Ji Pei},
year={2026},
eprint={2605.30852},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2605.30852},
}