Speculation head checkpoints

Important

The checkpoints on this repo's default branch (`main`) are v11; older v10 checkpoints remain on the `v10` branch. Use the matching code release: v11 checkpoints require speculative_pipeline_decoding v11, while v10 checkpoints load via `old_version_v10/`.

Pre-trained pipeline speculation head weights. Each .pt file is a single checkpoint produced by training; pair it with the same base model architecture it was trained on (see config["base_model_path"] inside the file).

For inference, evaluation, and training examples, see the official repo:
https://github.com/yuyijiong/speculative_pipeline_decoding

Filename format

Files are named:

```text
{model}_s{num_stages}_l{num_spec_layers}.pt
```

| Part | Meaning |
|---|---|
| `{model}` | Base model tag from the training config (e.g. `Qwen3.5-4B`, `Qwen3.5-9B`) |
| `s{...}` | `num_stages`: pipeline depth (number of target-model stages) |
| `l{...}` | `num_spec_layers`: number of Transformer layers in the speculation module |

Example: Qwen3.5-9B_s16_l2.pt → Qwen3.5-9B base, 16 stages, 2 spec layers.
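The naming scheme can be parsed mechanically. Below is a minimal Python sketch; `parse_ckpt_filename` and the `CkptName` tuple are hypothetical helpers (not part of the speculative_pipeline_decoding code), and it assumes every file follows the pattern above exactly:

```python
import os
import re
from typing import NamedTuple

# "{model}_s{num_stages}_l{num_spec_layers}.pt": the greedy model group backtracks
# so that only the trailing "_s<digits>_l<digits>.pt" suffix is split off.
_CKPT_NAME = re.compile(r"^(?P<model>.+)_s(?P<stages>\d+)_l(?P<layers>\d+)\.pt$")


class CkptName(NamedTuple):
    model: str
    num_stages: int
    num_spec_layers: int


def parse_ckpt_filename(path: str) -> CkptName:
    """Split a checkpoint filename (or path) into its three documented parts."""
    name = os.path.basename(path)
    m = _CKPT_NAME.match(name)
    if m is None:
        raise ValueError(f"not a speculation-head checkpoint name: {name!r}")
    return CkptName(m["model"], int(m["stages"]), int(m["layers"]))


print(parse_ckpt_filename("Qwen3.5-9B_s16_l2.pt"))
# CkptName(model='Qwen3.5-9B', num_stages=16, num_spec_layers=2)
```

Because a model tag can contain dots, hyphens or even underscores, the pattern anchors on the trailing suffix instead of splitting on `_`. The parsed values should agree with `config["num_stages"]` and `config["num_spec_layers"]` inside the file.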

Checkpoint contents

Each file is a PyTorch-serialized dictionary (load it with `torch.load`) with two top-level keys:

```python
{
    "state_dict": ...,  # weights of the speculation module
    "config": { ... },  # hyperparameters and metadata
}
```
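A minimal sketch for opening a file and checking this layout, assuming PyTorch 1.13 or newer (for the `weights_only` argument) and a `config` made only of plain Python values. `load_spec_checkpoint`, `check_layout` and `summarize` are hypothetical helpers, not functions from the official repo:

```python
from typing import Any, Dict


def load_spec_checkpoint(path: str) -> Dict[str, Any]:
    """Load a speculation-head .pt file onto the CPU (requires PyTorch)."""
    import torch  # imported lazily so the pure-Python helpers work without PyTorch

    # weights_only=True restricts unpickling to tensors and plain containers.
    # Assumption: the config holds only primitive values; use
    # weights_only=False only for files you trust.
    ckpt = torch.load(path, map_location="cpu", weights_only=True)
    check_layout(ckpt)
    return ckpt


def check_layout(ckpt: Any) -> None:
    """Verify that the two documented top-level keys are present."""
    if not isinstance(ckpt, dict):
        raise TypeError(f"expected a dict, got {type(ckpt).__name__}")
    missing = {"state_dict", "config"} - set(ckpt)
    if missing:
        raise KeyError(f"checkpoint is missing keys: {sorted(missing)}")


def summarize(ckpt: Dict[str, Any]) -> str:
    """One-line description built from the config and the state_dict size."""
    cfg = ckpt["config"]
    return (f"v{cfg.get('version')} head: {len(ckpt['state_dict'])} state_dict entries, "
            f"{cfg.get('num_stages')} stages, {cfg.get('num_spec_layers')} spec layers")
```

For example, `print(summarize(load_spec_checkpoint("Qwen3.5-4B_s4_l4.pt")))`. The official scripts take the file through `--spec_head_ckpt` instead; this sketch is only for inspecting a checkpoint by hand.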

`config` fields (always present)

| Field | Description |
|---|---|
| `base_model_path` | Base model path recorded at training time (often a machine-local path; override it at load time, see "Loading checkpoints" below) |
| `hidden_size` | Hidden size (matches the base model) |
| `vocab_size` | Base model vocabulary size |
| `draft_vocab_size` | Output size of the draft head (the full vocabulary or a draft subset) |
| `num_stages` | Pipeline depth (same as `s` in the filename) |
| `num_spec_layers` | Speculation module depth (same as `l` in the filename) |
| `version` | Checkpoint format version (`11`) |
| `num_aggr_types` | Number of aggregation types `m` in the speculation module; determines the number of FC modules |
| `aggr_feature_bound` | HF hidden-state layer indices for the aggregation anchors `g_0..g_{m-1}` (replaces v10's `shallow_hidden_layer_indices`) |
| `trained_with_use_deepest` | Whether training used deepest-layer features |

`config` fields (optional)

| Field | Description |
|---|---|
| `model_type` | Base model type recorded at training time (e.g. `qwen3_5`) |
| `spec_init_from_base_layers` | Base layers used to initialize the spec module (if any) |
| `draft_token_ids` | Draft vocabulary token ids (only when trained with a draft vocab subset) |
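The field lists above translate into a few sanity checks worth running before pairing a checkpoint with a base model. This is a hypothetical sketch, not the repo's validation logic. Two of its checks rest on assumptions noted in the comments (that `aggr_feature_bound` holds one index per anchor, and that `draft_token_ids` has `draft_vocab_size` entries). `base_hidden_size` and `base_vocab_size`, if given, are meant to be read from the base model's own config:

```python
from typing import Any, Dict, List, Optional

REQUIRED_FIELDS = (
    "base_model_path", "hidden_size", "vocab_size", "draft_vocab_size",
    "num_stages", "num_spec_layers", "version", "num_aggr_types",
    "aggr_feature_bound", "trained_with_use_deepest",
)


def validate_config(
    cfg: Dict[str, Any],
    base_hidden_size: Optional[int] = None,
    base_vocab_size: Optional[int] = None,
) -> List[str]:
    """Return a list of problems found in a checkpoint config (empty if none)."""
    problems = [f"missing required field {k!r}" for k in REQUIRED_FIELDS if k not in cfg]
    if problems:
        return problems  # the checks below need these fields

    if cfg["version"] != 11:
        problems.append(f"version {cfg['version']} != 11: use the matching code release")
    # Assumption: one layer index per aggregation anchor g_0..g_{m-1}.
    if len(cfg["aggr_feature_bound"]) != cfg["num_aggr_types"]:
        problems.append("len(aggr_feature_bound) != num_aggr_types")
    # The draft head outputs either the full vocabulary or a subset of it.
    if cfg["draft_vocab_size"] > cfg["vocab_size"]:
        problems.append("draft_vocab_size > vocab_size")
    # Assumption: draft_token_ids, when present, lists draft_vocab_size token ids.
    ids = cfg.get("draft_token_ids")
    if ids is not None and len(ids) != cfg["draft_vocab_size"]:
        problems.append("len(draft_token_ids) != draft_vocab_size")

    # Optional cross-check against the base model you plan to load.
    if base_hidden_size is not None and base_hidden_size != cfg["hidden_size"]:
        problems.append("hidden_size does not match the base model")
    if base_vocab_size is not None and base_vocab_size != cfg["vocab_size"]:
        problems.append("vocab_size does not match the base model")
    return problems
```

On a well-formed v11 file, `validate_config(ckpt["config"])` should return `[]`; a non-empty list points at a mismatched code release or base model.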

Available v11 checkpoints

| Base model | `s` (stages) | `l` (spec layers) | Filename |
|---|---|---|---|
| Qwen3.5-4B | 4 | 4 | `Qwen3.5-4B_s4_l4.pt` |
| Qwen3.5-4B | 8 | 4 | `Qwen3.5-4B_s8_l4.pt` |
| Qwen3.5-4B | 16 | 2 | `Qwen3.5-4B_s16_l2.pt` |
| Qwen3.5-9B | 4 | 4 | `Qwen3.5-9B_s4_l4.pt` |
| Qwen3.5-9B | 8 | 4 | `Qwen3.5-9B_s8_l4.pt` |
| Qwen3.5-9B | 16 | 2 | `Qwen3.5-9B_s16_l2.pt` |
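To fetch one of these files programmatically, a sketch using `huggingface_hub.hf_hub_download` could look like the following. The lookup table mirrors the rows above and the helper names are hypothetical. It assumes the files sit at the root of the `main` branch of `yuyijiong/speculative_pipeline_decoding`, which holds the v11 checkpoints:

```python
from typing import Dict, Tuple

# (base model, num_stages) -> num_spec_layers, mirroring the table of v11 checkpoints.
V11_CHECKPOINTS: Dict[Tuple[str, int], int] = {
    ("Qwen3.5-4B", 4): 4, ("Qwen3.5-4B", 8): 4, ("Qwen3.5-4B", 16): 2,
    ("Qwen3.5-9B", 4): 4, ("Qwen3.5-9B", 8): 4, ("Qwen3.5-9B", 16): 2,
}


def checkpoint_filename(model: str, num_stages: int) -> str:
    """Return the v11 checkpoint filename for a base model and pipeline depth."""
    try:
        layers = V11_CHECKPOINTS[(model, num_stages)]
    except KeyError:
        raise ValueError(f"no v11 checkpoint for {model} with {num_stages} stages") from None
    return f"{model}_s{num_stages}_l{layers}.pt"


def download_checkpoint(model: str, num_stages: int) -> str:
    """Fetch the file from the Hub and return its local path (needs huggingface_hub)."""
    from huggingface_hub import hf_hub_download  # imported lazily

    # Assumption: files live at the repo root on the main (v11) branch.
    return hf_hub_download(
        repo_id="yuyijiong/speculative_pipeline_decoding",
        filename=checkpoint_filename(model, num_stages),
        revision="main",
    )
```

`download_checkpoint("Qwen3.5-4B", 4)` returns a local cache path that can be passed to `--spec_head_ckpt`.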

Loading checkpoints

`config["base_model_path"]` is often a local path from the training machine (e.g. `/share/models/Qwen3.5-4B`). On your machine, pass the correct Hugging Face model ID or local directory via `--base_model_path`; it overrides the path stored in the checkpoint:

```bash
python pipeline_inference.py \
  --spec_head_ckpt /path/to/Qwen3.5-4B_s4_l4.pt \
  --base_model_path Qwen/Qwen3.5-4B
```

```bash
python eval.py \
  --spec_head_ckpt /path/to/Qwen3.5-4B_s4_l4.pt \
  --base_model_path /your/local/Qwen3.5-4B \
  --data_dir eval_data \
  --output_dir ./eval_output
```

If `--base_model_path` is omitted, the value of `config["base_model_path"]` is used as-is.
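The precedence rule fits in a few lines. This is a hypothetical sketch of the documented behavior, not the code of `pipeline_inference.py` or `eval.py`; the parser below declares only the two flags shown above:

```python
import argparse
from typing import Any, Dict, Optional


def resolve_base_model_path(cli_value: Optional[str], config: Dict[str, Any]) -> str:
    # An explicit --base_model_path overrides the training-time path in the checkpoint.
    if cli_value is not None:
        return cli_value
    # Omitted: fall back to config["base_model_path"] unchanged (it may not exist locally).
    return config["base_model_path"]


# Hypothetical stand-in for the scripts' argument parsing (not their actual code).
parser = argparse.ArgumentParser()
parser.add_argument("--spec_head_ckpt", required=True)
parser.add_argument("--base_model_path", default=None)

config = {"base_model_path": "/share/models/Qwen3.5-4B"}  # as recorded at training time

args = parser.parse_args(["--spec_head_ckpt", "Qwen3.5-4B_s4_l4.pt",
                          "--base_model_path", "Qwen/Qwen3.5-4B"])
print(resolve_base_model_path(args.base_model_path, config))  # Qwen/Qwen3.5-4B

args = parser.parse_args(["--spec_head_ckpt", "Qwen3.5-4B_s4_l4.pt"])
print(resolve_base_model_path(args.base_model_path, config))  # /share/models/Qwen3.5-4B
```

The second call shows why the override matters: without it, a path that only existed on the training machine is used unchanged.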

More usage details: see the speculative_pipeline_decoding repository (https://github.com/yuyijiong/speculative_pipeline_decoding).

Citation

If you use this repo, please cite our paper:

```bibtex
@misc{yu2026speculativepipelinedecodinghigheraccruacy,
      title={Speculative Pipeline Decoding: Higher-Accuracy and Zero-Bubble Speculation via Pipeline Parallelism},
      author={Yijiong Yu and Huazheng Wang and Shuai Yuan and Ruilong Ren and Ji Pei},
      year={2026},
      eprint={2605.30852},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2605.30852},
}
```


Model tree for yuyijiong/speculative_pipeline_decoding: Qwen/Qwen3.5-4B-Base (base model) → Qwen/Qwen3.5-4B (fine-tuned) → this model.
