HiCI: Hierarchical Construction-Integration for Long-Context Attention
This is a HiCI adapter checkpoint for Llama-2-7B, extending its context window to 16K tokens. It contains three components: LoRA adapters (q/k/v/o_proj), HiCI module weights (LocalConstructor + GlobalIntegrator), and fine-tuned embedding + LayerNorm weights.
Paper: HiCI (arXiv 2603.20843)
Three-stage hierarchy per transformer layer:
Input (16K tokens) → 4 segments × 4K
Stage 1: 8 local slots per segment → L_i
Stage 2: multi-view stats → K=4 global slots G
Stage 3: Q=[chunk], KV=[G, L_i, chunk] → Flash Attention
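The sketch below illustrates the intent of this three-stage flow for a single layer in plain PyTorch. It is a minimal illustration, not the repository's implementation: it omits the multi-head split, the bottleneck projections (bottleneck_dim=512), and the multi-view statistics of Stage 2, and the function and argument names are invented for clarity.

import torch
import torch.nn.functional as F

def hici_forward_sketch(hidden, local_q, global_q, segment_len=4096):
    # hidden: (seq_len, dim) activations for one layer, e.g. 16K tokens
    # local_q: (8, dim) learned local-slot queries; global_q: (4, dim) global-slot queries
    segments = hidden.split(segment_len, dim=0)  # 16K tokens -> 4 segments of 4K
    attend = lambda q, kv: F.scaled_dot_product_attention(
        q.unsqueeze(0), kv.unsqueeze(0), kv.unsqueeze(0)).squeeze(0)
    # Stage 1: compress each segment into 8 local slots L_i
    local = [attend(local_q, seg) for seg in segments]
    # Stage 2: integrate all local slots into K=4 global slots G
    G = attend(global_q, torch.cat(local, dim=0))
    # Stage 3: each chunk attends to [G, its own L_i, its own tokens]
    out = [attend(seg, torch.cat([G, L_i, seg], dim=0)) for seg, L_i in zip(segments, local)]
    return torch.cat(out, dim=0)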
adapter_model.bin (27 MB)
└── LoRA Adapters (r=8, alpha=16): q_proj, k_proj, v_proj, o_proj
trainable_params.bin (~2 GB)
├── local_constructor.* → Local Construction modules (32 layers)
├── global_integrator.* → Global Integration modules (32 layers)
├── input_layernorm / post_attention_layernorm → LayerNorm weights (32 layers)
├── model.embed_tokens.weight → Token embeddings
└── model.norm.weight → Final LayerNorm
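To verify what the checkpoint contains, a quick inspection sketch (assuming trainable_params.bin is a plain PyTorch state_dict downloaded locally; the path and printed summary are illustrative):

import torch
# Assumption: trainable_params.bin is a flat state_dict saved from the HiCI-patched model.
state = torch.load("trainable_params.bin", map_location="cpu")
total = sum(t.numel() for t in state.values())
print(f"{len(state)} tensors, {total / 1e9:.2f}B parameters")
for key in list(state)[:5]:  # peek at a few key names
    print(key, tuple(state[key].shape))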
Requires llama_attn_hici.py from this repo.
import torch
import transformers
from peft import PeftModel
import llama_attn_hici as hici_attn
# 1. Replace attention with HiCI BEFORE loading model
hici_attn.MIXED_GROUP_TRAINING = False
hici_attn.replace_llama_attn(use_flash_attn=True, use_full=False, use_hierarchical_forward=True)
# 2. Load base model
base_model = transformers.AutoModelForCausalLM.from_pretrained(
"meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16, device_map="auto",
)
# 3. Register HiCI modules (must match training config)
hici_attn.register_hici_to_model(base_model, num_memory_slots=8, global_slots=4, num_heads=8, bottleneck_dim=512)
# 4. Load LoRA adapter + trainable_params
model = PeftModel.from_pretrained(base_model, "ZengXiangyu/Llama-2-7b-HiCI-16k")
# 5. Tokenizer
tokenizer = transformers.AutoTokenizer.from_pretrained("ZengXiangyu/Llama-2-7b-HiCI-16k")
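After loading, the model can be used like any causal LM. A minimal generation sketch follows; the input file and decoding settings are placeholders, not part of the original recipe:

# 6. Generate from a long prompt (illustrative; any input up to ~16K tokens)
model.eval()
long_text = open("long_document.txt").read()  # placeholder source of a long prompt
inputs = tokenizer(long_text, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))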
@article{zeng2026hici,
title={HiCI: Hierarchical Construction-Integration for Long-Context Attention},
author={Zeng, Xiangyu and Xu, Qi and Wang, Yunke and Xu, Chang},
journal={arXiv preprint arXiv:2603.20843},
year={2026}
}
This model follows the Llama 2 Community License.
Base model: meta-llama/Llama-2-7b-hf