Commit 57fc5bc
Parent(s): initial commit
Files changed:
- .gitattributes +38 -0
- README.md +216 -0
- images/perplexity.png +3 -0
- logs/imatrix-Qwen3-Coder-Next-BF16.log +0 -0
- logs/perplexity-Qwen3-Coder-Next-IQ4_KSS.log +174 -0
- logs/perplexity-Qwen3-Coder-Next-Q8_0.log +187 -0
- logs/perplexity-Qwen3-Coder-Next-smol-IQ2_KS.log +174 -0
- logs/quantize-Qwen3-Coder-Next-IQ4_KSS.log +0 -0
- logs/quantize-Qwen3-Coder-Next-Q8_0.log +0 -0
- logs/quantize-Qwen3-Coder-Next-smol-IQ2_KS.log +0 -0
.gitattributes
ADDED
@@ -0,0 +1,38 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
imatrix-*.dat filter=lfs diff=lfs merge=lfs -text
*.gguf filter=lfs diff=lfs merge=lfs -text
*.png filter=lfs diff=lfs merge=lfs -text
README.md
ADDED
@@ -0,0 +1,216 @@
---
quantized_by: ubergarm
pipeline_tag: text-generation
base_model: Qwen/Qwen3-Coder-Next
base_model_relation: quantized
license: apache-2.0
tags:
- imatrix
- conversational
- qwen3_next
- ik_llama.cpp
---

## `ik_llama.cpp` imatrix Quantizations of Qwen/Qwen3-Coder-Next

*NOTE*: `ik_llama.cpp` can also run your existing GGUFs from bartowski, unsloth, mradermacher, etc., if you want to try it out before downloading my quants.

Some of ik's new quants are supported by the [Nexesenex/croco.cpp](https://github.com/Nexesenex/croco.cpp) fork of KoboldCPP, which provides Windows builds. Also check out the [ik_llama.cpp Windows builds by Thireus](https://github.com/Thireus/ik_llama.cpp/releases).

These quants aim to provide best-in-class perplexity for a given memory footprint.

## Big Thanks
Shout out to Wendell and the **Level1Techs** crew, the community [Forums](https://forum.level1techs.com/t/deepseek-deep-dive-r1-at-home/225826), and [YouTube Channel](https://www.youtube.com/@Level1Techs)! **BIG thanks** for providing **BIG hardware** expertise and access to run these experiments and make these great quants available to the community!!!

Also thanks to all the folks in the quanting and inferencing community on the [BeaverAI Club Discord](https://huggingface.co/BeaverAI) and on [r/LocalLLaMA](https://www.reddit.com/r/LocalLLaMA/) for tips and tricks helping each other run, test, and benchmark all the fun new models! Thanks to Hugging Face for hosting all these big quants!

Finally, I *really* appreciate the support from [aifoundry.org](https://aifoundry.org), so check out their open source RISC-V based solutions!

## Quant Collection
Perplexity computed against *wiki.test.raw* (lower is "better").

![perplexity](images/perplexity.png "perplexity")

These two are just test quants for baseline perplexity comparison and are not available for download here:
* `BF16` 148.502 GiB (16.010 BPW)
  - TODO
* `Q8_0` 78.982 GiB (8.515 BPW)
  - PPL over 584 chunks for n_ctx=512 = 8.2239 +/- 0.06389

*NOTE*: The first split file is much smaller on purpose as it only contains metadata; it's fine!

## IQ4_KSS 39.377 GiB (4.245 BPW)
PPL over 584 chunks for n_ctx=512 = 8.3069 +/- 0.06459
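
Compared with the `Q8_0` baseline listed above (8.2239), that works out to roughly a 1% perplexity increase for about half the file size; a quick check of that figure:

```shell
# Percent PPL increase of IQ4_KSS (8.3069) over the Q8_0 baseline (8.2239),
# using the final estimates from the logs in this repo
delta=$(awk 'BEGIN { printf "%.2f", (8.3069 - 8.2239) / 8.2239 * 100 }')
echo "IQ4_KSS is ${delta}% higher PPL than Q8_0"
```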

<details>

<summary>👈 Secret Recipe</summary>

```bash
#!/usr/bin/env bash

custom="
# 48 Repeating Layers [0-47]

## Gated Attention/Delta Net [Blended 0-47]
blk\..*\.attn_gate\.weight=q8_0
blk\..*\.attn_qkv\.weight=q8_0
blk\..*\.attn_output\.weight=q8_0
blk\..*\.attn_q\.weight=q8_0
blk\..*\.attn_k\.weight=q8_0
blk\..*\.attn_v\.weight=q8_0
blk\..*\.ssm_ba\.weight=q8_0
blk\..*\.ssm_out\.weight=q8_0

# Shared Expert Layers [0-47]
blk\..*\.ffn_down_shexp\.weight=q8_0
blk\..*\.ffn_(gate|up)_shexp\.weight=q8_0

# Routed Experts Layers [0-47]
blk\..*\.ffn_down_exps\.weight=iq4_ks
blk\..*\.ffn_(gate|up)_exps\.weight=iq4_kss

# Non-Repeating Layers
token_embd\.weight=iq6_k
output\.weight=iq6_k
"

custom=$(
  echo "$custom" | grep -v '^#' | \
  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
)

# (--dry-run previews the tensor type assignments without writing; remove it to produce the file)
numactl -N "${SOCKET}" -m "${SOCKET}" \
./build/bin/llama-quantize \
    --dry-run \
    --custom-q "$custom" \
    --imatrix /mnt/data/models/ubergarm/Qwen3-Coder-Next-GGUF/imatrix-Qwen3-Coder-Next-BF16.dat \
    /mnt/data/models/ubergarm/Qwen3-Coder-Next-GGUF/Qwen3-Coder-Next-512x2.5B-BF16-00001-of-00004.gguf \
    /mnt/data/models/ubergarm/Qwen3-Coder-Next-GGUF/Qwen3-Coder-Next-IQ4_KSS.gguf \
    IQ4_KSS \
    128
```

</details>
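
The `custom=$(...)` step in the recipe above strips the comment lines and joins the remaining per-tensor rules into the single comma-separated string that `--custom-q` expects. A standalone sketch of that transform with two placeholder rules (`sed -Ez` assumes GNU sed; `printf` stands in for `echo` so the backslashes stay literal in any shell):

```shell
custom="
# comment lines like this one are dropped
blk\..*\.attn_output\.weight=q8_0
blk\..*\.ffn_down_exps\.weight=iq4_ks
"

# Drop comments, then collapse newlines into commas and trim leading/trailing commas
custom=$(printf '%s' "$custom" | grep -v '^#' | sed -Ez 's:\n+:,:g;s:,$::;s:^,::')
printf '%s\n' "$custom"
```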

## smol-IQ2_KS 22.097 GiB (2.382 BPW)
PPL over 584 chunks for n_ctx=512 = 9.4488 +/- 0.07565

<details>

<summary>👈 Secret Recipe</summary>

```bash
#!/usr/bin/env bash

custom="
# 48 Repeating Layers [0-47]

## Gated Attention/Delta Net [Blended 0-47]
blk\..*\.attn_gate\.weight=q8_0
blk\..*\.attn_qkv\.weight=q8_0
blk\..*\.attn_output\.weight=q8_0
blk\..*\.attn_q\.weight=q8_0
blk\..*\.attn_k\.weight=q8_0
blk\..*\.attn_v\.weight=q8_0
blk\..*\.ssm_ba\.weight=q8_0
blk\..*\.ssm_out\.weight=q8_0

# Shared Expert Layers [0-47]
blk\..*\.ffn_down_shexp\.weight=q8_0
blk\..*\.ffn_(gate|up)_shexp\.weight=q8_0

# Routed Experts Layers [0-47]
blk\..*\.ffn_down_exps\.weight=iq2_ks
blk\..*\.ffn_(gate|up)_exps\.weight=iq2_ks

# Non-Repeating Layers
token_embd\.weight=iq4_k
output\.weight=iq6_k
"

custom=$(
  echo "$custom" | grep -v '^#' | \
  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
)

# (--dry-run previews the tensor type assignments without writing; remove it to produce the file)
numactl -N "${SOCKET}" -m "${SOCKET}" \
./build/bin/llama-quantize \
    --dry-run \
    --custom-q "$custom" \
    --imatrix /mnt/data/models/ubergarm/Qwen3-Coder-Next-GGUF/imatrix-Qwen3-Coder-Next-BF16.dat \
    /mnt/data/models/ubergarm/Qwen3-Coder-Next-GGUF/Qwen3-Coder-Next-512x2.5B-BF16-00001-of-00004.gguf \
    /mnt/data/models/ubergarm/Qwen3-Coder-Next-GGUF/Qwen3-Coder-Next-smol-IQ2_KS.gguf \
    IQ2_KS \
    128
```

</details>
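
As a sanity check on the size figures, BPW is just the file size in bits divided by the parameter count; plugging in the smol-IQ2_KS size (22.097 GiB) and the 79.674 B parameter count from the logs reproduces the 2.382 BPW in the heading:

```shell
# BPW = (GiB * 1024^3 bytes * 8 bits) / parameter count
bpw=$(awk 'BEGIN { printf "%.3f", 22.097 * 1024^3 * 8 / 79.674e9 }')
echo "smol-IQ2_KS = $bpw BPW"
```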

## Quick Start
Check some of my recent model cards for more examples of running models.

```bash
# Clone and checkout
$ git clone https://github.com/ikawrakow/ik_llama.cpp
$ cd ik_llama.cpp

# Build for hybrid CPU+CUDA
$ cmake -B build -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON
$ cmake --build build --config Release -j $(nproc)

# Download Desired Quants
$ pip install huggingface_hub
$ hf download --local-dir ./ --include "smol-IQ2_KS/*.gguf" ubergarm/Qwen3-Coder-Next-GGUF

# Full GPU offload
# For 2 or more GPUs keep an eye on `-sm graph` support:
# https://github.com/ikawrakow/ik_llama.cpp/pull/1292
CUDA_VISIBLE_DEVICES="0,1" \
./build/bin/llama-server \
    --model "$model" \
    --alias Qwen3-Coder-Next \
    -c 262144 \
    -fa on \
    -ger \
    --merge-qkv \
    -sm graph \
    -ngl 99 \
    -ub 2048 -b 2048 \
    --threads 1 \
    --host 127.0.0.1 \
    --port 8080 \
    --jinja \
    --no-mmap

# Hybrid CPU+GPU
# basically use --n-cpu-moe etc...
echo TODO

# CPU-Only
# Gated delta net CPU-only performance seems slower than other architectures; ideally have at least 1x GPU for attn/kv-cache
numactl -N "$SOCKET" -m "$SOCKET" \
./build/bin/llama-server \
    --model "$model" \
    --alias Qwen3-Coder-Next \
    --ctx-size 131072 \
    -ger \
    --merge-qkv \
    -ctk q8_0 -ctv q8_0 \
    -ub 4096 -b 4096 \
    --parallel 1 \
    --threads 96 \
    --threads-batch 128 \
    --numa numactl \
    --host 127.0.0.1 \
    --port 8080 \
    --no-mmap \
    --jinja
```
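
Once `llama-server` is up, it speaks an OpenAI-compatible HTTP API, as in mainline llama.cpp's server. A minimal chat request could look like the sketch below; the snippet only echoes the payload so it runs without a live server, the prompt is just an example, and the commented `curl` line assumes the host/port flags shown above:

```shell
# Minimal OpenAI-style chat payload for llama-server
payload='{"model":"Qwen3-Coder-Next","messages":[{"role":"user","content":"Write hello world in C"}]}'
# curl -s http://127.0.0.1:8080/v1/chat/completions -H 'Content-Type: application/json' -d "$payload"
echo "$payload"
```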

## References
* [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp)
* [ubergarm on quantizing LLMs and tuning GPUs with aifoundry.org](https://blog.aifoundry.org/p/adventures-in-model-quantization)
* [ubergarm-imatrix-calibration-corpus-v02.txt](https://gist.github.com/ubergarm/edfeb3ff9c6ec8b49e88cdf627b0711a?permalink_comment_id=5682584#gistcomment-5682584)
* [Getting Started Guide (out of date)](https://github.com/ikawrakow/ik_llama.cpp/discussions/258)
* [Quant Cookers Guide (out of date)](https://github.com/ikawrakow/ik_llama.cpp/discussions/434)
* [ik_llama.cpp Qwen3Next Issue](https://github.com/ikawrakow/ik_llama.cpp/issues/1229)
images/perplexity.png
ADDED
Git LFS Details
logs/imatrix-Qwen3-Coder-Next-BF16.log
ADDED
The diff for this file is too large to render.
logs/perplexity-Qwen3-Coder-Next-IQ4_KSS.log
ADDED
@@ -0,0 +1,174 @@
SOCKET is set to: 0
main: build = 4211 (b2cb4512)
main: built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
main: seed = 1337
CPU: using device CPU - 0 MiB free
llama_model_loader: loaded meta data with 47 key-value pairs and 843 tensors from /mnt/data/models/ubergarm/Qwen3-Coder-Next-GGUF/Qwen3-Coder-Next-IQ4_KSS.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen3next
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.sampling.top_k i32 = 40
llama_model_loader: - kv 3: general.sampling.top_p f32 = 0.950000
llama_model_loader: - kv 4: general.sampling.temp f32 = 1.000000
llama_model_loader: - kv 5: general.name str = Qwen3 Coder Next
llama_model_loader: - kv 6: general.size_label str = 512x2.5B
llama_model_loader: - kv 7: general.license str = apache-2.0
llama_model_loader: - kv 8: general.license.link str = https://huggingface.co/Qwen/Qwen3-Cod...
llama_model_loader: - kv 9: general.tags arr[str,1] = ["text-generation"]
llama_model_loader: - kv 10: qwen3next.block_count u32 = 48
llama_model_loader: - kv 11: qwen3next.context_length u32 = 262144
llama_model_loader: - kv 12: qwen3next.embedding_length u32 = 2048
llama_model_loader: - kv 13: qwen3next.feed_forward_length u32 = 5120
llama_model_loader: - kv 14: qwen3next.attention.head_count u32 = 16
llama_model_loader: - kv 15: qwen3next.attention.head_count_kv u32 = 2
llama_model_loader: - kv 16: qwen3next.rope.freq_base f32 = 5000000.000000
llama_model_loader: - kv 17: qwen3next.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 18: qwen3next.expert_count u32 = 512
llama_model_loader: - kv 19: qwen3next.expert_used_count u32 = 10
llama_model_loader: - kv 20: qwen3next.attention.key_length u32 = 256
llama_model_loader: - kv 21: qwen3next.attention.value_length u32 = 256
llama_model_loader: - kv 22: general.file_type u32 = 148
llama_model_loader: - kv 23: qwen3next.expert_feed_forward_length u32 = 512
llama_model_loader: - kv 24: qwen3next.expert_shared_feed_forward_length u32 = 512
llama_model_loader: - kv 25: qwen3next.ssm.conv_kernel u32 = 4
llama_model_loader: - kv 26: qwen3next.ssm.state_size u32 = 128
llama_model_loader: - kv 27: qwen3next.ssm.group_count u32 = 16
llama_model_loader: - kv 28: qwen3next.ssm.time_step_rank u32 = 32
llama_model_loader: - kv 29: qwen3next.ssm.inner_size u32 = 4096
llama_model_loader: - kv 30: qwen3next.full_attention_interval u32 = 4
llama_model_loader: - kv 31: qwen3next.rope.dimension_count u32 = 64
llama_model_loader: - kv 32: general.quantization_version u32 = 2
llama_model_loader: - kv 33: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 34: tokenizer.ggml.pre str = qwen2
llama_model_loader: - kv 35: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 36: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 37: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 38: tokenizer.ggml.eos_token_id u32 = 151645
llama_model_loader: - kv 39: tokenizer.ggml.padding_token_id u32 = 151643
llama_model_loader: - kv 40: tokenizer.ggml.bos_token_id u32 = 151643
llama_model_loader: - kv 41: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 42: tokenizer.chat_template str = {% macro render_extra_keys(json_dict,...
llama_model_loader: - kv 43: quantize.imatrix.file str = /mnt/data/models/ubergarm/Qwen3-Coder...
llama_model_loader: - kv 44: quantize.imatrix.dataset str = ubergarm-imatrix-calibration-corpus-v...
llama_model_loader: - kv 45: quantize.imatrix.entries_count i32 = 577
llama_model_loader: - kv 46: quantize.imatrix.chunks_count i32 = 840
llama_model_loader: - type f32: 361 tensors
llama_model_loader: - type q8_0: 336 tensors
llama_model_loader: - type iq6_k: 2 tensors
llama_model_loader: - type iq4_ks: 48 tensors
llama_model_loader: - type iq4_kss: 96 tensors
load: printing all EOG tokens:
load: - 151643 ('<|endoftext|>')
load: - 151645 ('<|im_end|>')
load: - 151662 ('<|fim_pad|>')
load: - 151663 ('<|repo_name|>')
load: - 151664 ('<|file_sep|>')
load: special tokens cache size = 26
load: token to piece cache size = 0.9311 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = qwen3next
llm_load_print_meta: n_ctx_train = 262144
llm_load_print_meta: n_embd = 2048
llm_load_print_meta: n_layer = 48
llm_load_print_meta: n_head = 16
llm_load_print_meta: n_head_kv = 2
llm_load_print_meta: n_rot = 64
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_swa_pattern = 1
llm_load_print_meta: n_embd_head_k = 256
llm_load_print_meta: n_embd_head_v = 256
llm_load_print_meta: n_gqa = 8
llm_load_print_meta: n_embd_k_gqa = 512
llm_load_print_meta: n_embd_v_gqa = 512
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-06
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 5120
llm_load_print_meta: n_expert = 512
llm_load_print_meta: n_expert_used = 10
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 2
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 5000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 262144
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 4
llm_load_print_meta: ssm_d_inner = 4096
llm_load_print_meta: ssm_d_state = 128
llm_load_print_meta: ssm_dt_rank = 32
llm_load_print_meta: model type = 80B.A3B
llm_load_print_meta: model ftype = IQ4_KSS - 4.0 bpw
llm_load_print_meta: model params = 79.674 B
llm_load_print_meta: model size = 39.377 GiB (4.245 BPW)
llm_load_print_meta: repeating layers = 38.897 GiB (4.227 BPW, 79.052 B parameters)
llm_load_print_meta: general.name = Qwen3 Coder Next
print_info: vocab type = BPE
print_info: n_vocab = 151936
print_info: n_merges = 151387
print_info: BOS token = 151643 '<|endoftext|>'
print_info: EOS token = 151645 '<|im_end|>'
print_info: EOT token = 151645 '<|im_end|>'
print_info: PAD token = 151643 '<|endoftext|>'
print_info: LF token = 198 'Ċ'
print_info: FIM PRE token = 151659 '<|fim_prefix|>'
print_info: FIM SUF token = 151661 '<|fim_suffix|>'
print_info: FIM MID token = 151660 '<|fim_middle|>'
print_info: FIM PAD token = 151662 '<|fim_pad|>'
print_info: FIM REP token = 151663 '<|repo_name|>'
print_info: FIM SEP token = 151664 '<|file_sep|>'
print_info: EOG token = 151643 '<|endoftext|>'
print_info: EOG token = 151645 '<|im_end|>'
print_info: EOG token = 151662 '<|fim_pad|>'
print_info: EOG token = 151663 '<|repo_name|>'
print_info: EOG token = 151664 '<|file_sep|>'
print_info: max token length = 256
llm_load_tensors: ggml ctx size = 0.35 MiB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/49 layers to GPU
llm_load_tensors: CPU buffer size = 40322.46 MiB
....................................................................................................
llama_init_from_model: n_ctx = 2048
llama_init_from_model: n_batch = 2048
llama_init_from_model: n_ubatch = 512
llama_init_from_model: flash_attn = 1
llama_init_from_model: attn_max_b = 0
llama_init_from_model: fused_moe = 1
llama_init_from_model: grouped er = 0
llama_init_from_model: fused_up_gate = 1
llama_init_from_model: fused_mmad = 1
llama_init_from_model: rope_cache = 0
llama_init_from_model: graph_reuse = 1
llama_init_from_model: k_cache_hadam = 0
llama_init_from_model: split_mode_graph_scheduling = 0
llama_init_from_model: reduce_type = f16
llama_init_from_model: sched_async = 0
llama_init_from_model: ser = -1, 0
llama_init_from_model: freq_base = 5000000.0
llama_init_from_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 349.50 MiB
llama_init_from_model: KV self size = 48.00 MiB, K (f16): 24.00 MiB, V (f16): 24.00 MiB
llama_init_from_model: CPU output buffer size = 2.32 MiB
llama_init_from_model: CPU compute buffer size = 300.75 MiB
llama_init_from_model: graph nodes = 12382
llama_init_from_model: graph splits = 1
llama_init_from_model: enabling only_active_experts scheduling

system_info: n_threads = 96 (n_threads_batch = 128) / 512 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
perplexity: tokenizing the input ..
perplexity: tokenization took 390.895 ms
perplexity: calculating perplexity over 584 chunks, n_ctx=512, batch_size=2048, n_seq=4
perplexity: 3.57 seconds per pass - ETA 8.67 minutes
===================================== llama_init_from_model: f16
======================================= HAVE_FANCY_SIMD is defined
[1]4.7708,[2]6.6455,[3]5.6847,[4]4.9300,[5]4.8482,[6]4.9616,[7]5.0382,[8]5.1594,[9]5.1162,[10]5.1752,[11]5.1005,[12]5.3326,[13]5.7391,[14]5.6986,[15]5.8049,[16]6.1672,[17]5.9825,[18]6.1662,[19]6.2517,[20]6.2809,[21]6.1879,[22]6.3014,[23]6.1039,[24]5.8551,[25]5.7720,[26]5.6545,[27]5.5737,[28]5.5205,[29]5.5937,[30]5.5868,[31]5.5772,[32]5.6498,[33]5.5949,[34]5.6422,[35]5.7332,[36]5.8198,[37]5.9514,[38]6.0595,[39]6.0941,[40]6.1921,[41]6.2363,[42]6.2663,[43]6.3373,[44]6.3414,[45]6.3754,[46]6.4259,[47]6.5950,[48]6.7029,[49]6.7117,[50]6.7649,[51]6.8072,[52]6.8791,[53]6.9402,[54]6.9868,[55]7.0063,[56]7.0823,[57]7.0897,[58]7.1329,[59]7.1791,[60]7.2232,[61]7.2709,[62]7.3073,[63]7.3673,[64]7.4293,[65]7.4972,[66]7.5685,[67]7.6299,[68]7.6186,[69]7.6415,[70]7.6493,[71]7.6847,[72]7.7552,[73]7.8107,[74]7.8395,[75]7.8152,[76]7.8270,[77]7.8976,[78]7.9346,[79]7.8476,[80]7.8241,[81]7.8209,[82]7.8591,[83]7.8315,[84]7.8261,[85]7.8472,[86]7.9303,[87]7.9743,[88]7.9955,[89]8.0109,[90]8.0011,[91]8.0545,[92]8.0347,[93]8.0797,[94]8.0940,[95]8.0805,[96]8.0713,[97]8.0645,[98]8.1005,[99]8.0796,[100]8.1641,[101]8.2137,[102]8.2089,[103]8.2180,[104]8.2042,[105]8.2038,[106]8.2012,[107]8.2370,[108]8.2704,[109]8.3100,[110]8.3700,[111]8.4787,[112]8.4877,[113]8.4508,[114]8.5037,[115]8.5319,[116]8.4806,[117]8.4850,[118]8.4776,[119]8.4453,[120]8.4664,[121]8.4540,[122]8.4422,[123]8.4035,[124]8.3625,[125]8.3447,[126]8.3292,[127]8.2807,[128]8.2685,[129]8.2353,[130]8.1914,[131]8.1579,[132]8.1311,[133]8.1250,[134]8.1375,[135]8.1307,[136]8.1300,[137]8.1008,[138]8.0728,[139]8.0903,[140]8.0761,[141]8.0732,[142]8.0946,[143]8.1000,[144]8.1349,[145]8.1126,[146]8.0758,[147]8.0396,[148]8.0002,[149]7.9802,[150]7.9379,[151]7.9272,[152]7.9191,[153]7.9154,[154]7.8789,[155]7.8823,[156]7.8432,[157]7.8223,[158]7.7976,[159]7.7800,[160]7.7416,[161]7.7246,[162]7.7171,[163]7.6999,[164]7.7098,[165]7.7018,[166]7.6948,[167]7.6935,[168]7.7147,[169]7.7185,[170]7.7491,[171]7.7541,[172]7.7784,[173]7.8294,[174]7.8438,[175]7.8979,[176]7.9
243,[177]7.9767,[178]8.0162,[179]8.0179,[180]7.9923,[181]7.9633,[182]7.9752,[183]7.9436,[184]7.9282,[185]7.9042,[186]7.8789,[187]7.8603,[188]7.8537,[189]7.8686,[190]7.8953,[191]7.9074,[192]7.9170,[193]7.9192,[194]7.9375,[195]7.9540,[196]7.9615,[197]7.9671,[198]7.9489,[199]7.9394,[200]7.9261,[201]7.9266,[202]7.9436,[203]7.9686,[204]7.9891,[205]8.0084,[206]8.0125,[207]8.0387,[208]8.0273,[209]8.0280,[210]8.0243,[211]8.0272,[212]8.0306,[213]8.0277,[214]8.0158,[215]8.0004,[216]7.9936,[217]8.0013,[218]7.9968,[219]7.9762,[220]7.9453,[221]7.9323,[222]7.9186,[223]7.9170,[224]7.9242,[225]7.9042,[226]7.8961,[227]7.8846,[228]7.8597,[229]7.8333,[230]7.8165,[231]7.7974,[232]7.7846,[233]7.7802,[234]7.7780,[235]7.7770,[236]7.7625,[237]7.7533,[238]7.7389,[239]7.7335,[240]7.7430,[241]7.7529,[242]7.7647,[243]7.7618,[244]7.7762,[245]7.7791,[246]7.8023,[247]7.8114,[248]7.8167,[249]7.8264,[250]7.8289,[251]7.8477,[252]7.8657,[253]7.9007,[254]7.9252,[255]7.9294,[256]7.9466,[257]7.9614,[258]7.9483,[259]7.9333,[260]7.9183,[261]7.8962,[262]7.8837,[263]7.8777,[264]7.8748,[265]7.8830,[266]7.8881,[267]7.8878,[268]7.8785,[269]7.8835,[270]7.8793,[271]7.8743,[272]7.8703,[273]7.8680,[274]7.8638,[275]7.8589,[276]7.8453,[277]7.8457,[278]7.8447,[279]7.8363,[280]7.8316,[281]7.8266,[282]7.8243,[283]7.8001,[284]7.7717,[285]7.7814,[286]7.7652,[287]7.7490,[288]7.7467,[289]7.7427,[290]7.7647,[291]7.7694,[292]7.7681,[293]7.7700,[294]7.7871,[295]7.7983,[296]7.8088,[297]7.8317,[298]7.8294,[299]7.8206,[300]7.8214,[301]7.8153,[302]7.8173,[303]7.8125,[304]7.8376,[305]7.8428,[306]7.8415,[307]7.8455,[308]7.8452,[309]7.8442,[310]7.8499,[311]7.8534,[312]7.8437,[313]7.8385,[314]7.8450,[315]7.8327,[316]7.8349,[317]7.8499,[318]7.8570,[319]7.8505,[320]7.8528,[321]7.8423,[322]7.8527,[323]7.8618,[324]7.8682,[325]7.8885,[326]7.8866,[327]7.8754,[328]7.8789,[329]7.8652,[330]7.8568,[331]7.8508,[332]7.8505,[333]7.8524,[334]7.8493,[335]7.8398,[336]7.8421,[337]7.8490,[338]7.8612,[339]7.8580,[340]7.8533,[341]7.8457,[342]7.8455,[343
]7.8444,[344]7.8513,[345]7.8598,[346]7.8563,[347]7.8430,[348]7.8450,[349]7.8427,[350]7.8326,[351]7.8323,[352]7.8360,[353]7.8358,[354]7.8259,[355]7.8389,[356]7.8488,[357]7.8537,[358]7.8448,[359]7.8493,[360]7.8489,[361]7.8587,[362]7.8503,[363]7.8448,[364]7.8520,[365]7.8703,[366]7.8962,[367]7.9124,[368]7.9423,[369]7.9580,[370]7.9734,[371]7.9971,[372]8.0169,[373]8.0274,[374]8.0364,[375]8.0557,[376]8.0694,[377]8.0815,[378]8.0946,[379]8.1071,[380]8.1237,[381]8.1408,[382]8.1519,[383]8.1605,[384]8.1724,[385]8.1982,[386]8.2193,[387]8.2191,[388]8.2206,[389]8.2297,[390]8.2537,[391]8.2719,[392]8.2657,[393]8.2648,[394]8.2577,[395]8.2585,[396]8.2668,[397]8.2752,[398]8.2817,[399]8.2895,[400]8.3013,[401]8.3027,[402]8.3024,[403]8.2938,[404]8.2712,[405]8.2583,[406]8.2576,[407]8.2657,[408]8.2750,[409]8.2769,[410]8.2870,[411]8.3046,[412]8.3100,[413]8.3086,[414]8.3064,[415]8.3013,[416]8.2942,[417]8.2989,[418]8.3078,[419]8.3119,[420]8.3128,[421]8.3198,[422]8.3088,[423]8.3079,[424]8.3108,[425]8.3141,[426]8.3153,[427]8.3221,[428]8.3369,[429]8.3446,[430]8.3405,[431]8.3363,[432]8.3409,[433]8.3446,[434]8.3456,[435]8.3543,[436]8.3482,[437]8.3534,[438]8.3553,[439]8.3501,[440]8.3548,[441]8.3546,[442]8.3523,[443]8.3448,[444]8.3471,[445]8.3381,[446]8.3400,[447]8.3342,[448]8.3285,[449]8.3229,[450]8.3289,[451]8.3287,[452]8.3163,[453]8.3074,[454]8.3043,[455]8.3097,[456]8.3081,[457]8.3134,[458]8.3283,[459]8.3248,[460]8.3247,[461]8.3227,[462]8.3213,[463]8.3326,[464]8.3318,[465]8.3329,[466]8.3351,[467]8.3406,[468]8.3455,[469]8.3505,[470]8.3561,[471]8.3456,[472]8.3546,[473]8.3439,[474]8.3426,[475]8.3478,[476]8.3458,[477]8.3364,[478]8.3214,[479]8.3242,[480]8.3318,[481]8.3356,[482]8.3254,[483]8.3338,[484]8.3415,[485]8.3458,[486]8.3452,[487]8.3507,[488]8.3452,[489]8.3343,[490]8.3328,[491]8.3271,[492]8.3271,[493]8.3183,[494]8.3167,[495]8.3110,[496]8.3078,[497]8.3199,[498]8.3263,[499]8.3184,[500]8.3181,[501]8.3182,[502]8.3161,[503]8.3296,[504]8.3328,[505]8.3363,[506]8.3338,[507]8.3307,[508]8.3350,[509]8.3319,
[510]8.3307,[511]8.3334,[512]8.3293,[513]8.3317,[514]8.3353,[515]8.3350,[516]8.3378,[517]8.3407,[518]8.3343,[519]8.3345,[520]8.3369,[521]8.3388,[522]8.3295,[523]8.3290,[524]8.3263,[525]8.3301,[526]8.3357,[527]8.3384,[528]8.3378,[529]8.3318,[530]8.3280,[531]8.3313,[532]8.3288,[533]8.3279,[534]8.3279,[535]8.3294,[536]8.3227,[537]8.3285,[538]8.3369,[539]8.3337,[540]8.3459,[541]8.3483,[542]8.3428,[543]8.3452,[544]8.3522,[545]8.3481,[546]8.3405,[547]8.3328,[548]8.3173,[549]8.3178,[550]8.3012,[551]8.2903,[552]8.2809,[553]8.2540,[554]8.2528,[555]8.2562,[556]8.2567,[557]8.2595,[558]8.2589,[559]8.2656,[560]8.2721,[561]8.2812,[562]8.2938,[563]8.3020,[564]8.3000,[565]8.3091,[566]8.3090,[567]8.2960,[568]8.2876,[569]8.2842,[570]8.2838,[571]8.2835,[572]8.2867,[573]8.2874,[574]8.2889,[575]8.2885,[576]8.2944,[577]8.2890,[578]8.2944,[579]8.3000,[580]8.3142,[581]8.3155,[582]8.3276,[583]8.3126,[584]8.3069,
llama_print_timings: load time = 8105.40 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: prompt eval time = 447048.08 ms / 299008 tokens ( 1.50 ms per token, 668.85 tokens per second)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: total time = 458500.62 ms / 299009 tokens

Final estimate: PPL over 584 chunks for n_ctx=512 = 8.3069 +/- 0.06459
logs/perplexity-Qwen3-Coder-Next-Q8_0.log
ADDED
@@ -0,0 +1,187 @@
#!/usr/bin/env bash

#model=/mnt/data/models/ubergarm/Qwen3-Coder-Next-GGUF/Qwen3-Coder-Next-512x2.5B-BF16-00001-of-00004.gguf
model=/mnt/data/models/ubergarm/Qwen3-Coder-Next-GGUF/Qwen3-Coder-Next-Q8_0.gguf
#model=/mnt/data/models/ubergarm/Qwen3-Coder-Next-GGUF/Qwen3-Coder-Next-IQ4_KSS.gguf
#model=/mnt/data/models/ubergarm/Qwen3-Coder-Next-GGUF/Qwen3-Coder-Next-smol-IQ2_KS.gguf

numactl -N "$SOCKET" -m "$SOCKET" \
./build/bin/llama-perplexity \
-m "$model" \
-f wiki.test.raw \
--seed 1337 \
--ctx-size 512 \
-ub 512 -b 2048 \
--validate-quants \
--no-mmap \
--numa numactl \
--threads 96 \
--threads-batch 128

SOCKET is set to: 1
main: build = 4211 (b2cb4512)
main: built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
main: seed = 1337
CPU: using device CPU - 0 MiB free
llama_model_loader: loaded meta data with 43 key-value pairs and 843 tensors from /mnt/data/models/ubergarm/Qwen3-Coder-Next-GGUF/Qwen3-Coder-Next-Q8_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen3next
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.sampling.top_k i32 = 40
llama_model_loader: - kv 3: general.sampling.top_p f32 = 0.950000
llama_model_loader: - kv 4: general.sampling.temp f32 = 1.000000
llama_model_loader: - kv 5: general.name str = Qwen3 Coder Next
llama_model_loader: - kv 6: general.size_label str = 512x2.5B
llama_model_loader: - kv 7: general.license str = apache-2.0
llama_model_loader: - kv 8: general.license.link str = https://huggingface.co/Qwen/Qwen3-Cod...
llama_model_loader: - kv 9: general.tags arr[str,1] = ["text-generation"]
llama_model_loader: - kv 10: qwen3next.block_count u32 = 48
llama_model_loader: - kv 11: qwen3next.context_length u32 = 262144
llama_model_loader: - kv 12: qwen3next.embedding_length u32 = 2048
llama_model_loader: - kv 13: qwen3next.feed_forward_length u32 = 5120
llama_model_loader: - kv 14: qwen3next.attention.head_count u32 = 16
llama_model_loader: - kv 15: qwen3next.attention.head_count_kv u32 = 2
llama_model_loader: - kv 16: qwen3next.rope.freq_base f32 = 5000000.000000
llama_model_loader: - kv 17: qwen3next.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 18: qwen3next.expert_count u32 = 512
llama_model_loader: - kv 19: qwen3next.expert_used_count u32 = 10
llama_model_loader: - kv 20: qwen3next.attention.key_length u32 = 256
llama_model_loader: - kv 21: qwen3next.attention.value_length u32 = 256
llama_model_loader: - kv 22: general.file_type u32 = 7
llama_model_loader: - kv 23: qwen3next.expert_feed_forward_length u32 = 512
llama_model_loader: - kv 24: qwen3next.expert_shared_feed_forward_length u32 = 512
llama_model_loader: - kv 25: qwen3next.ssm.conv_kernel u32 = 4
llama_model_loader: - kv 26: qwen3next.ssm.state_size u32 = 128
llama_model_loader: - kv 27: qwen3next.ssm.group_count u32 = 16
llama_model_loader: - kv 28: qwen3next.ssm.time_step_rank u32 = 32
llama_model_loader: - kv 29: qwen3next.ssm.inner_size u32 = 4096
llama_model_loader: - kv 30: qwen3next.full_attention_interval u32 = 4
llama_model_loader: - kv 31: qwen3next.rope.dimension_count u32 = 64
llama_model_loader: - kv 32: general.quantization_version u32 = 2
llama_model_loader: - kv 33: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 34: tokenizer.ggml.pre str = qwen2
llama_model_loader: - kv 35: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 36: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 37: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 38: tokenizer.ggml.eos_token_id u32 = 151645
llama_model_loader: - kv 39: tokenizer.ggml.padding_token_id u32 = 151643
llama_model_loader: - kv 40: tokenizer.ggml.bos_token_id u32 = 151643
llama_model_loader: - kv 41: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 42: tokenizer.chat_template str = {% macro render_extra_keys(json_dict,...
llama_model_loader: - type f32: 361 tensors
llama_model_loader: - type q8_0: 482 tensors
load: printing all EOG tokens:
load: - 151643 ('<|endoftext|>')
load: - 151645 ('<|im_end|>')
load: - 151662 ('<|fim_pad|>')
load: - 151663 ('<|repo_name|>')
load: - 151664 ('<|file_sep|>')
load: special tokens cache size = 26
load: token to piece cache size = 0.9311 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = qwen3next
llm_load_print_meta: n_ctx_train = 262144
llm_load_print_meta: n_embd = 2048
llm_load_print_meta: n_layer = 48
llm_load_print_meta: n_head = 16
llm_load_print_meta: n_head_kv = 2
llm_load_print_meta: n_rot = 64
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_swa_pattern = 1
llm_load_print_meta: n_embd_head_k = 256
llm_load_print_meta: n_embd_head_v = 256
llm_load_print_meta: n_gqa = 8
llm_load_print_meta: n_embd_k_gqa = 512
llm_load_print_meta: n_embd_v_gqa = 512
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-06
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 5120
llm_load_print_meta: n_expert = 512
llm_load_print_meta: n_expert_used = 10
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 2
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 5000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 262144
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 4
llm_load_print_meta: ssm_d_inner = 4096
llm_load_print_meta: ssm_d_state = 128
llm_load_print_meta: ssm_dt_rank = 32
llm_load_print_meta: model type = 80B.A3B
llm_load_print_meta: model ftype = Q8_0
llm_load_print_meta: model params = 79.674 B
llm_load_print_meta: model size = 78.982 GiB (8.515 BPW)
llm_load_print_meta: repeating layers = 78.366 GiB (8.515 BPW, 79.052 B parameters)
llm_load_print_meta: general.name = Qwen3 Coder Next
print_info: vocab type = BPE
print_info: n_vocab = 151936
print_info: n_merges = 151387
print_info: BOS token = 151643 '<|endoftext|>'
print_info: EOS token = 151645 '<|im_end|>'
print_info: EOT token = 151645 '<|im_end|>'
print_info: PAD token = 151643 '<|endoftext|>'
print_info: LF token = 198 'Ċ'
print_info: FIM PRE token = 151659 '<|fim_prefix|>'
print_info: FIM SUF token = 151661 '<|fim_suffix|>'
print_info: FIM MID token = 151660 '<|fim_middle|>'
print_info: FIM PAD token = 151662 '<|fim_pad|>'
print_info: FIM REP token = 151663 '<|repo_name|>'
print_info: FIM SEP token = 151664 '<|file_sep|>'
print_info: EOG token = 151643 '<|endoftext|>'
print_info: EOG token = 151645 '<|im_end|>'
print_info: EOG token = 151662 '<|fim_pad|>'
print_info: EOG token = 151663 '<|repo_name|>'
print_info: EOG token = 151664 '<|file_sep|>'
print_info: max token length = 256
llm_load_tensors: ggml ctx size = 0.35 MiB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/49 layers to GPU
llm_load_tensors: CPU buffer size = 80877.56 MiB
....................................................................................................
llama_init_from_model: n_ctx = 2048
llama_init_from_model: n_batch = 2048
llama_init_from_model: n_ubatch = 512
llama_init_from_model: flash_attn = 1
llama_init_from_model: attn_max_b = 0
llama_init_from_model: fused_moe = 1
llama_init_from_model: grouped er = 0
llama_init_from_model: fused_up_gate = 1
llama_init_from_model: fused_mmad = 1
llama_init_from_model: rope_cache = 0
llama_init_from_model: graph_reuse = 1
llama_init_from_model: k_cache_hadam = 0
llama_init_from_model: split_mode_graph_scheduling = 0
llama_init_from_model: reduce_type = f16
llama_init_from_model: sched_async = 0
llama_init_from_model: ser = -1, 0
llama_init_from_model: freq_base = 5000000.0
llama_init_from_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 349.50 MiB
llama_init_from_model: KV self size = 48.00 MiB, K (f16): 24.00 MiB, V (f16): 24.00 MiB
llama_init_from_model: CPU output buffer size = 2.32 MiB
llama_init_from_model: CPU compute buffer size = 300.75 MiB
llama_init_from_model: graph nodes = 12382
llama_init_from_model: graph splits = 1
llama_init_from_model: enabling only_active_experts scheduling

system_info: n_threads = 96 (n_threads_batch = 128) / 512 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
perplexity: tokenizing the input ..
perplexity: tokenization took 393.016 ms
perplexity: calculating perplexity over 584 chunks, n_ctx=512, batch_size=2048, n_seq=4
perplexity: 3.89 seconds per pass - ETA 9.45 minutes
===================================== llama_init_from_model: f16
======================================= HAVE_FANCY_SIMD is defined
[1]4.5875,[2]6.5391,[3]5.5418,[4]4.7813,[5]4.7239,[6]4.8641,[7]4.9464,[8]4.9799,[9]4.8794,[10]4.9186,[11]4.8311,[12]5.0638,[13]5.4758,[14]5.4421,[15]5.5581,[16]5.9216,[17]5.7716,[18]5.9584,[19]6.0471,[20]6.0826,[21]6.0039,[22]6.1138,[23]5.9257,[24]5.6949,[25]5.6193,[26]5.5165,[27]5.4492,[28]5.4022,[29]5.4725,[30]5.4682,[31]5.4576,[32]5.5297,[33]5.4780,[34]5.5340,[35]5.6289,[36]5.7241,[37]5.8537,[38]5.9517,[39]5.9865,[40]6.0831,[41]6.1320,[42]6.1626,[43]6.2308,[44]6.2354,[45]6.2713,[46]6.3169,[47]6.4885,[48]6.6022,[49]6.6025,[50]6.6506,[51]6.6955,[52]6.7621,[53]6.8298,[54]6.8792,[55]6.8985,[56]6.9732,[57]6.9892,[58]7.0319,[59]7.0810,[60]7.1243,[61]7.1677,[62]7.2040,[63]7.2667,[64]7.3282,[65]7.3910,[66]7.4599,[67]7.5227,[68]7.5168,[69]7.5369,[70]7.5450,[71]7.5845,[72]7.6558,[73]7.7078,[74]7.7400,[75]7.7186,[76]7.7322,[77]7.7932,[78]7.8266,[79]7.7395,[80]7.7131,[81]7.7061,[82]7.7441,[83]7.7189,[84]7.7117,[85]7.7374,[86]7.8199,[87]7.8592,[88]7.8776,[89]7.8941,[90]7.8811,[91]7.9399,[92]7.9209,[93]7.9644,[94]7.9827,[95]7.9716,[96]7.9631,[97]7.9575,[98]7.9951,[99]7.9748,[100]8.0542,[101]8.1007,[102]8.0984,[103]8.1046,[104]8.0933,[105]8.0955,[106]8.0986,[107]8.1340,[108]8.1685,[109]8.2070,[110]8.2639,[111]8.3726,[112]8.3847,[113]8.3486,[114]8.4009,[115]8.4281,[116]8.3756,[117]8.3775,[118]8.3687,[119]8.3335,[120]8.3531,[121]8.3422,[122]8.3302,[123]8.2914,[124]8.2508,[125]8.2295,[126]8.2170,[127]8.1696,[128]8.1543,[129]8.1240,[130]8.0819,[131]8.0492,[132]8.0243,[133]8.0171,[134]8.0301,[135]8.0236,[136]8.0272,[137]7.9992,[138]7.9701,[139]7.9854,[140]7.9697,[141]7.9669,[142]7.9874,[143]7.9921,[144]8.0284,[145]8.0062,[146]7.9704,[147]7.9344,[148]7.8958,[149]7.8752,[150]7.8334,[151]7.8219,[152]7.8126,[153]7.8092,[154]7.7721,[155]7.7746,[156]7.7377,[157]7.7190,[158]7.6944,[159]7.6786,[160]7.6406,[161]7.6243,[162]7.6177,[163]7.5988,[164]7.6113,[165]7.6010,[166]7.5931,[167]7.5936,[168]7.6145,[169]7.6185,[170]7.6464,[171]7.6500,[172]7.6718,[173]7.7234,[174]7.7359,[175]7.7903,[176]7.8
151,[177]7.8677,[178]7.9072,[179]7.9127,[180]7.8864,[181]7.8569,[182]7.8669,[183]7.8351,[184]7.8213,[185]7.7934,[186]7.7633,[187]7.7423,[188]7.7376,[189]7.7526,[190]7.7795,[191]7.7907,[192]7.8004,[193]7.8020,[194]7.8192,[195]7.8349,[196]7.8423,[197]7.8494,[198]7.8328,[199]7.8242,[200]7.8113,[201]7.8119,[202]7.8288,[203]7.8531,[204]7.8743,[205]7.8922,[206]7.8944,[207]7.9204,[208]7.9081,[209]7.9079,[210]7.9063,[211]7.9100,[212]7.9139,[213]7.9127,[214]7.9003,[215]7.8869,[216]7.8808,[217]7.8881,[218]7.8830,[219]7.8656,[220]7.8363,[221]7.8221,[222]7.8089,[223]7.8070,[224]7.8140,[225]7.7961,[226]7.7891,[227]7.7782,[228]7.7538,[229]7.7259,[230]7.7102,[231]7.6916,[232]7.6795,[233]7.6742,[234]7.6720,[235]7.6704,[236]7.6567,[237]7.6480,[238]7.6346,[239]7.6289,[240]7.6378,[241]7.6485,[242]7.6605,[243]7.6586,[244]7.6725,[245]7.6758,[246]7.6980,[247]7.7085,[248]7.7137,[249]7.7237,[250]7.7270,[251]7.7456,[252]7.7635,[253]7.7988,[254]7.8217,[255]7.8256,[256]7.8426,[257]7.8581,[258]7.8453,[259]7.8316,[260]7.8173,[261]7.7956,[262]7.7840,[263]7.7788,[264]7.7764,[265]7.7850,[266]7.7912,[267]7.7904,[268]7.7816,[269]7.7872,[270]7.7839,[271]7.7786,[272]7.7765,[273]7.7742,[274]7.7705,[275]7.7655,[276]7.7512,[277]7.7516,[278]7.7500,[279]7.7421,[280]7.7375,[281]7.7329,[282]7.7307,[283]7.7072,[284]7.6786,[285]7.6887,[286]7.6725,[287]7.6578,[288]7.6568,[289]7.6535,[290]7.6756,[291]7.6809,[292]7.6808,[293]7.6829,[294]7.6995,[295]7.7105,[296]7.7202,[297]7.7424,[298]7.7402,[299]7.7307,[300]7.7315,[301]7.7251,[302]7.7273,[303]7.7223,[304]7.7467,[305]7.7518,[306]7.7497,[307]7.7536,[308]7.7547,[309]7.7544,[310]7.7602,[311]7.7630,[312]7.7532,[313]7.7490,[314]7.7557,[315]7.7434,[316]7.7452,[317]7.7606,[318]7.7680,[319]7.7611,[320]7.7641,[321]7.7534,[322]7.7638,[323]7.7729,[324]7.7806,[325]7.8009,[326]7.7988,[327]7.7870,[328]7.7917,[329]7.7791,[330]7.7707,[331]7.7643,[332]7.7649,[333]7.7685,[334]7.7650,[335]7.7566,[336]7.7590,[337]7.7650,[338]7.7781,[339]7.7748,[340]7.7709,[341]7.7633,[342]7.7633,[343
]7.7624,[344]7.7673,[345]7.7760,[346]7.7717,[347]7.7597,[348]7.7629,[349]7.7586,[350]7.7497,[351]7.7482,[352]7.7532,[353]7.7523,[354]7.7426,[355]7.7545,[356]7.7631,[357]7.7686,[358]7.7614,[359]7.7656,[360]7.7660,[361]7.7759,[362]7.7672,[363]7.7609,[364]7.7680,[365]7.7866,[366]7.8134,[367]7.8287,[368]7.8593,[369]7.8742,[370]7.8896,[371]7.9130,[372]7.9338,[373]7.9449,[374]7.9543,[375]7.9728,[376]7.9863,[377]7.9976,[378]8.0101,[379]8.0213,[380]8.0378,[381]8.0547,[382]8.0647,[383]8.0727,[384]8.0838,[385]8.1097,[386]8.1306,[387]8.1295,[388]8.1297,[389]8.1393,[390]8.1631,[391]8.1808,[392]8.1749,[393]8.1734,[394]8.1663,[395]8.1673,[396]8.1756,[397]8.1846,[398]8.1902,[399]8.1980,[400]8.2104,[401]8.2112,[402]8.2106,[403]8.2021,[404]8.1795,[405]8.1670,[406]8.1673,[407]8.1752,[408]8.1851,[409]8.1873,[410]8.1961,[411]8.2133,[412]8.2197,[413]8.2181,[414]8.2159,[415]8.2107,[416]8.2037,[417]8.2099,[418]8.2194,[419]8.2237,[420]8.2255,[421]8.2326,[422]8.2217,[423]8.2213,[424]8.2240,[425]8.2268,[426]8.2283,[427]8.2356,[428]8.2494,[429]8.2573,[430]8.2530,[431]8.2494,[432]8.2536,[433]8.2568,[434]8.2587,[435]8.2682,[436]8.2620,[437]8.2673,[438]8.2690,[439]8.2641,[440]8.2684,[441]8.2672,[442]8.2645,[443]8.2568,[444]8.2591,[445]8.2505,[446]8.2530,[447]8.2469,[448]8.2416,[449]8.2355,[450]8.2415,[451]8.2405,[452]8.2285,[453]8.2199,[454]8.2177,[455]8.2240,[456]8.2229,[457]8.2282,[458]8.2418,[459]8.2390,[460]8.2393,[461]8.2374,[462]8.2352,[463]8.2458,[464]8.2449,[465]8.2458,[466]8.2483,[467]8.2540,[468]8.2595,[469]8.2643,[470]8.2695,[471]8.2593,[472]8.2674,[473]8.2575,[474]8.2567,[475]8.2620,[476]8.2605,[477]8.2506,[478]8.2361,[479]8.2396,[480]8.2475,[481]8.2514,[482]8.2405,[483]8.2480,[484]8.2555,[485]8.2596,[486]8.2591,[487]8.2645,[488]8.2599,[489]8.2489,[490]8.2482,[491]8.2419,[492]8.2427,[493]8.2344,[494]8.2327,[495]8.2281,[496]8.2245,[497]8.2362,[498]8.2428,[499]8.2357,[500]8.2358,[501]8.2366,[502]8.2351,[503]8.2484,[504]8.2517,[505]8.2556,[506]8.2535,[507]8.2504,[508]8.2543,[509]8.2519,
[510]8.2510,[511]8.2543,[512]8.2496,[513]8.2516,[514]8.2549,[515]8.2549,[516]8.2573,[517]8.2604,[518]8.2537,[519]8.2540,[520]8.2569,[521]8.2595,[522]8.2499,[523]8.2494,[524]8.2466,[525]8.2499,[526]8.2552,[527]8.2570,[528]8.2560,[529]8.2508,[530]8.2471,[531]8.2510,[532]8.2483,[533]8.2472,[534]8.2475,[535]8.2486,[536]8.2413,[537]8.2474,[538]8.2560,[539]8.2523,[540]8.2648,[541]8.2669,[542]8.2615,[543]8.2644,[544]8.2708,[545]8.2668,[546]8.2583,[547]8.2508,[548]8.2356,[549]8.2363,[550]8.2200,[551]8.2092,[552]8.1997,[553]8.1730,[554]8.1722,[555]8.1753,[556]8.1762,[557]8.1790,[558]8.1785,[559]8.1850,[560]8.1915,[561]8.2008,[562]8.2139,[563]8.2215,[564]8.2194,[565]8.2281,[566]8.2281,[567]8.2141,[568]8.2063,[569]8.2034,[570]8.2029,[571]8.2027,[572]8.2051,[573]8.2058,[574]8.2074,[575]8.2069,[576]8.2131,[577]8.2080,[578]8.2128,[579]8.2179,[580]8.2319,[581]8.2334,[582]8.2452,[583]8.2296,[584]8.2239,
llama_print_timings: load time = 17688.23 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: prompt eval time = 476918.01 ms / 299008 tokens ( 1.60 ms per token, 626.96 tokens per second)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: total time = 488325.61 ms / 299009 tokens

Final estimate: PPL over 584 chunks for n_ctx=512 = 8.2239 +/- 0.06389
logs/perplexity-Qwen3-Coder-Next-smol-IQ2_KS.log
ADDED
@@ -0,0 +1,174 @@
| 1 |
+
SOCKET is set to: 0
|
| 2 |
+
main: build = 4211 (b2cb4512)
|
| 3 |
+
main: built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
|
| 4 |
+
main: seed = 1337
|
| 5 |
+
CPU: using device CPU - 0 MiB free
|
| 6 |
+
llama_model_loader: loaded meta data with 47 key-value pairs and 843 tensors from /mnt/data/models/ubergarm/Qwen3-Coder-Next-GGUF/Qwen3-Coder-Next-smol-IQ2_KS.gguf (version GGUF V3 (latest))
|
| 7 |
+
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
|
| 8 |
+
llama_model_loader: - kv 0: general.architecture str = qwen3next
|
| 9 |
+
llama_model_loader: - kv 1: general.type str = model
|
| 10 |
+
llama_model_loader: - kv 2: general.sampling.top_k i32 = 40
|
| 11 |
+
llama_model_loader: - kv 3: general.sampling.top_p f32 = 0.950000
|
| 12 |
+
llama_model_loader: - kv 4: general.sampling.temp f32 = 1.000000
|
| 13 |
+
llama_model_loader: - kv 5: general.name str = Qwen3 Coder Next
|
| 14 |
+
llama_model_loader: - kv 6: general.size_label str = 512x2.5B
|
| 15 |
+
llama_model_loader: - kv 7: general.license str = apache-2.0
|
| 16 |
+
llama_model_loader: - kv 8: general.license.link str = https://huggingface.co/Qwen/Qwen3-Cod...
|
| 17 |
+
llama_model_loader: - kv 9: general.tags arr[str,1] = ["text-generation"]
|
| 18 |
+
llama_model_loader: - kv 10: qwen3next.block_count u32 = 48
|
| 19 |
+
llama_model_loader: - kv 11: qwen3next.context_length u32 = 262144
|
| 20 |
+
llama_model_loader: - kv 12: qwen3next.embedding_length u32 = 2048
|
| 21 |
+
llama_model_loader: - kv 13: qwen3next.feed_forward_length u32 = 5120
|
| 22 |
+
llama_model_loader: - kv 14: qwen3next.attention.head_count u32 = 16
|
| 23 |
+
llama_model_loader: - kv 15: qwen3next.attention.head_count_kv u32 = 2
|
| 24 |
+
llama_model_loader: - kv 16: qwen3next.rope.freq_base f32 = 5000000.000000
|
| 25 |
+
llama_model_loader: - kv 17: qwen3next.attention.layer_norm_rms_epsilon f32 = 0.000001
|
| 26 |
+
llama_model_loader: - kv 18: qwen3next.expert_count u32 = 512
|
| 27 |
+
llama_model_loader: - kv 19: qwen3next.expert_used_count u32 = 10
|
| 28 |
+
llama_model_loader: - kv 20: qwen3next.attention.key_length u32 = 256
|
| 29 |
+
llama_model_loader: - kv 21: qwen3next.attention.value_length u32 = 256
|
| 30 |
+
llama_model_loader: - kv 22: general.file_type u32 = 147
|
| 31 |
+
llama_model_loader: - kv 23: qwen3next.expert_feed_forward_length u32 = 512
|
| 32 |
+
llama_model_loader: - kv 24: qwen3next.expert_shared_feed_forward_length u32 = 512
|
| 33 |
+
llama_model_loader: - kv 25: qwen3next.ssm.conv_kernel u32 = 4
|
| 34 |
+
llama_model_loader: - kv 26: qwen3next.ssm.state_size u32 = 128
|
| 35 |
+
llama_model_loader: - kv 27: qwen3next.ssm.group_count u32 = 16
|
| 36 |
+
llama_model_loader: - kv 28: qwen3next.ssm.time_step_rank u32 = 32
|
| 37 |
+
llama_model_loader: - kv 29: qwen3next.ssm.inner_size u32 = 4096
|
| 38 |
+
llama_model_loader: - kv 30: qwen3next.full_attention_interval u32 = 4
|
| 39 |
+
llama_model_loader: - kv 31: qwen3next.rope.dimension_count u32 = 64
|
| 40 |
+
llama_model_loader: - kv 32: general.quantization_version u32 = 2
|
| 41 |
+
llama_model_loader: - kv 33: tokenizer.ggml.model str = gpt2
|
| 42 |
+
llama_model_loader: - kv 34: tokenizer.ggml.pre str = qwen2
|
| 43 |
+
llama_model_loader: - kv 35: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ...
|
| 44 |
+
llama_model_loader: - kv 36: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
|
| 45 |
+
llama_model_loader: - kv 37: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
|
| 46 |
+
llama_model_loader: - kv 38: tokenizer.ggml.eos_token_id u32 = 151645
|
| 47 |
+
llama_model_loader: - kv 39: tokenizer.ggml.padding_token_id u32 = 151643
|
| 48 |
+
llama_model_loader: - kv 40: tokenizer.ggml.bos_token_id u32 = 151643
|
| 49 |
+
llama_model_loader: - kv 41: tokenizer.ggml.add_bos_token bool = false
|
| 50 |
+
llama_model_loader: - kv 42: tokenizer.chat_template str = {% macro render_extra_keys(json_dict,...
|
| 51 |
+
llama_model_loader: - kv 43: quantize.imatrix.file str = /mnt/data/models/ubergarm/Qwen3-Coder...
|
| 52 |
+
llama_model_loader: - kv 44: quantize.imatrix.dataset str = ubergarm-imatrix-calibration-corpus-v...
|
| 53 |
+
llama_model_loader: - kv 45: quantize.imatrix.entries_count i32 = 577
|
| 54 |
+
llama_model_loader: - kv 46: quantize.imatrix.chunks_count i32 = 840
|
| 55 |
+
llama_model_loader: - type f32: 361 tensors
|
| 56 |
+
llama_model_loader: - type q8_0: 336 tensors
|
| 57 |
+
llama_model_loader: - type iq4_k: 1 tensors
|
| 58 |
+
llama_model_loader: - type iq6_k: 1 tensors
|
| 59 |
+
llama_model_loader: - type iq2_ks: 144 tensors
|
| 60 |
+
load: printing all EOG tokens:
|
| 61 |
+
load: - 151643 ('<|endoftext|>')
|
| 62 |
+
load: - 151645 ('<|im_end|>')
|
| 63 |
+
load: - 151662 ('<|fim_pad|>')
|
| 64 |
+
load: - 151663 ('<|repo_name|>')
|
| 65 |
+
load: - 151664 ('<|file_sep|>')
|
| 66 |
+
load: special tokens cache size = 26
|
| 67 |
+
load: token to piece cache size = 0.9311 MB
|
| 68 |
+
llm_load_print_meta: format = GGUF V3 (latest)
|
| 69 |
+
llm_load_print_meta: arch = qwen3next
|
| 70 |
+
llm_load_print_meta: n_ctx_train = 262144
|
| 71 |
+
llm_load_print_meta: n_embd = 2048
|
| 72 |
+
llm_load_print_meta: n_layer = 48
|
| 73 |
+
llm_load_print_meta: n_head = 16
|
| 74 |
+
llm_load_print_meta: n_head_kv = 2
|
| 75 |
+
llm_load_print_meta: n_rot = 64
|
| 76 |
+
llm_load_print_meta: n_swa = 0
|
| 77 |
+
llm_load_print_meta: n_swa_pattern = 1
|
| 78 |
+
llm_load_print_meta: n_embd_head_k = 256
|
| 79 |
+
llm_load_print_meta: n_embd_head_v = 256
|
| 80 |
+
llm_load_print_meta: n_gqa = 8
|
| 81 |
+
llm_load_print_meta: n_embd_k_gqa = 512
|
| 82 |
+
llm_load_print_meta: n_embd_v_gqa = 512
|
| 83 |
+
llm_load_print_meta: f_norm_eps = 0.0e+00
|
| 84 |
+
llm_load_print_meta: f_norm_rms_eps = 1.0e-06
|
| 85 |
+
llm_load_print_meta: f_clamp_kqv = 0.0e+00
|
| 86 |
+
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
|
| 87 |
+
llm_load_print_meta: f_logit_scale = 0.0e+00
|
| 88 |
+
llm_load_print_meta: n_ff = 5120
|
| 89 |
+
llm_load_print_meta: n_expert = 512
|
| 90 |
+
llm_load_print_meta: n_expert_used = 10
|
| 91 |
+
llm_load_print_meta: causal attn = 1
|
| 92 |
+
llm_load_print_meta: pooling type = 0
|
| 93 |
+
llm_load_print_meta: rope type = 2
|
| 94 |
+
llm_load_print_meta: rope scaling = linear
|
| 95 |
+
llm_load_print_meta: freq_base_train = 5000000.0
|
| 96 |
+
llm_load_print_meta: freq_scale_train = 1
|
| 97 |
+
llm_load_print_meta: n_ctx_orig_yarn = 262144
|
| 98 |
+
llm_load_print_meta: rope_finetuned = unknown
|
| 99 |
+
llm_load_print_meta: ssm_d_conv = 4
|
| 100 |
+
llm_load_print_meta: ssm_d_inner = 4096
|
| 101 |
+
llm_load_print_meta: ssm_d_state = 128
|
| 102 |
+
llm_load_print_meta: ssm_dt_rank = 32
|
| 103 |
+
llm_load_print_meta: model type = 80B.A3B
|
| 104 |
+
llm_load_print_meta: model ftype = IQ2_KS - 2.1875 bpw
|
| 105 |
+
llm_load_print_meta: model params = 79.674 B
|
| 106 |
+
llm_load_print_meta: model size = 22.097 GiB (2.382 BPW)
|
| 107 |
+
llm_load_print_meta: repeating layers = 21.694 GiB (2.357 BPW, 79.052 B parameters)
|
| 108 |
+
llm_load_print_meta: general.name = Qwen3 Coder Next
|
| 109 |
+
print_info: vocab type = BPE
|
| 110 |
+
print_info: n_vocab = 151936
|
| 111 |
+
print_info: n_merges = 151387
|
| 112 |
+
print_info: BOS token = 151643 '<|endoftext|>'
|
| 113 |
+
print_info: EOS token = 151645 '<|im_end|>'
|
| 114 |
+
print_info: EOT token = 151645 '<|im_end|>'
|
| 115 |
+
print_info: PAD token = 151643 '<|endoftext|>'
|
| 116 |
+
print_info: LF token = 198 'Ċ'
|
| 117 |
+
print_info: FIM PRE token = 151659 '<|fim_prefix|>'
|
| 118 |
+
print_info: FIM SUF token = 151661 '<|fim_suffix|>'
|
| 119 |
+
print_info: FIM MID token = 151660 '<|fim_middle|>'
|
| 120 |
+
print_info: FIM PAD token = 151662 '<|fim_pad|>'
|
| 121 |
+
print_info: FIM REP token = 151663 '<|repo_name|>'
|
| 122 |
+
print_info: FIM SEP token = 151664 '<|file_sep|>'
|
| 123 |
+
print_info: EOG token = 151643 '<|endoftext|>'
|
| 124 |
+
print_info: EOG token = 151645 '<|im_end|>'
|
| 125 |
+
print_info: EOG token = 151662 '<|fim_pad|>'
|
| 126 |
+
print_info: EOG token = 151663 '<|repo_name|>'
|
| 127 |
+
print_info: EOG token = 151664 '<|file_sep|>'
|
| 128 |
+
print_info: max token length = 256
|
| 129 |
+
llm_load_tensors: ggml ctx size = 0.35 MiB
|
| 130 |
+
llm_load_tensors: offloading 0 repeating layers to GPU
|
| 131 |
+
llm_load_tensors: offloaded 0/49 layers to GPU
|
| 132 |
+
llm_load_tensors: CPU buffer size = 22627.63 MiB
|
| 133 |
+
....................................................................................................
|
| 134 |
+
llama_init_from_model: n_ctx = 2048
|
| 135 |
+
llama_init_from_model: n_batch = 2048
|
| 136 |
+
llama_init_from_model: n_ubatch = 512
|
| 137 |
+
llama_init_from_model: flash_attn = 1
|
| 138 |
+
llama_init_from_model: attn_max_b = 0
|
| 139 |
+
llama_init_from_model: fused_moe = 1
|
| 140 |
+
llama_init_from_model: grouped er = 0
|
| 141 |
+
llama_init_from_model: fused_up_gate = 1
|
| 142 |
+
llama_init_from_model: fused_mmad = 1
|
| 143 |
+
llama_init_from_model: rope_cache = 0
|
| 144 |
+
llama_init_from_model: graph_reuse = 1
|
| 145 |
+
llama_init_from_model: k_cache_hadam = 0
|
| 146 |
+
llama_init_from_model: split_mode_graph_scheduling = 0
|
| 147 |
+
llama_init_from_model: reduce_type = f16
|
| 148 |
+
llama_init_from_model: sched_async = 0
|
| 149 |
+
llama_init_from_model: ser = -1, 0
|
| 150 |
+
llama_init_from_model: freq_base = 5000000.0
|
| 151 |
+
llama_init_from_model: freq_scale = 1
|
| 152 |
+
llama_kv_cache_init: CPU KV buffer size = 349.50 MiB
|
| 153 |
+
llama_init_from_model: KV self size = 48.00 MiB, K (f16): 24.00 MiB, V (f16): 24.00 MiB
|
| 154 |
+
llama_init_from_model: CPU output buffer size = 2.32 MiB
|
| 155 |
+
llama_init_from_model: CPU compute buffer size = 300.75 MiB
|
| 156 |
+
llama_init_from_model: graph nodes = 12382
|
| 157 |
+
llama_init_from_model: graph splits = 1
|
| 158 |
+
llama_init_from_model: enabling only_active_experts scheduling
|
| 159 |
+
|
| 160 |
+
system_info: n_threads = 96 (n_threads_batch = 128) / 512 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
perplexity: tokenizing the input ..
perplexity: tokenization took 392.737 ms
perplexity: calculating perplexity over 584 chunks, n_ctx=512, batch_size=2048, n_seq=4
perplexity: 3.33 seconds per pass - ETA 8.10 minutes
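The chunk count reported above is just the tokenized corpus length divided by the context size; a quick sanity check (the token total is the prompt-eval count from the `llama_print_timings` lines later in this log):

```python
# 584 chunks of n_ctx=512 tokens account for exactly the
# 299,008 prompt tokens reported at the end of this log.
n_tokens = 299_008   # from llama_print_timings: prompt eval
n_ctx = 512          # from the perplexity header line
n_chunks = n_tokens // n_ctx
print(n_chunks)  # 584
```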
===================================== llama_init_from_model: f16
======================================= HAVE_FANCY_SIMD is defined
[1]4.8429,[2]7.8294,[3]7.0716,[4]6.5755,[5]6.9072,[6]7.1962,[7]7.3575,[8]7.5293,[9]7.6946,[10]8.0277,[11]8.0696,[12]8.2023,[13]8.7628,[14]8.6121,[15]8.5901,[16]9.0088,[17]8.5033,[18]8.6134,[19]8.6028,[20]8.6108,[21]8.4277,[22]8.4716,[23]8.1122,[24]7.7254,[25]7.5349,[26]7.3212,[27]7.1690,[28]7.0490,[29]7.1071,[30]7.0900,[31]7.0594,[32]7.0950,[33]7.0187,[34]7.0633,[35]7.1666,[36]7.2491,[37]7.3830,[38]7.4542,[39]7.4793,[40]7.5868,[41]7.6107,[42]7.6280,[43]7.6983,[44]7.7012,[45]7.7412,[46]7.7686,[47]7.9614,[48]8.0843,[49]8.0835,[50]8.1432,[51]8.1917,[52]8.2669,[53]8.3461,[54]8.4150,[55]8.4272,[56]8.5091,[57]8.5269,[58]8.5722,[59]8.6444,[60]8.6816,[61]8.7244,[62]8.7569,[63]8.8195,[64]8.8811,[65]8.9548,[66]9.0362,[67]9.1083,[68]9.0784,[69]9.1017,[70]9.1076,[71]9.1492,[72]9.2382,[73]9.2889,[74]9.3185,[75]9.2809,[76]9.2815,[77]9.3489,[78]9.3843,[79]9.3121,[80]9.2703,[81]9.2557,[82]9.3199,[83]9.2880,[84]9.2721,[85]9.3036,[86]9.3928,[87]9.4341,[88]9.4424,[89]9.4425,[90]9.4179,[91]9.4699,[92]9.4452,[93]9.4936,[94]9.5031,[95]9.4816,[96]9.4613,[97]9.4439,[98]9.4771,[99]9.4538,[100]9.5435,[101]9.5925,[102]9.5856,[103]9.5986,[104]9.5757,[105]9.5770,[106]9.5734,[107]9.6082,[108]9.6454,[109]9.6948,[110]9.7585,[111]9.8749,[112]9.8825,[113]9.8339,[114]9.8917,[115]9.9160,[116]9.8810,[117]9.8785,[118]9.8490,[119]9.8067,[120]9.8207,[121]9.8138,[122]9.8066,[123]9.7576,[124]9.7008,[125]9.6762,[126]9.6569,[127]9.6139,[128]9.5997,[129]9.5630,[130]9.5106,[131]9.4652,[132]9.4329,[133]9.4258,[134]9.4413,[135]9.4357,[136]9.4305,[137]9.3887,[138]9.3495,[139]9.3600,[140]9.3350,[141]9.3276,[142]9.3555,[143]9.3724,[144]9.4073,[145]9.3838,[146]9.3417,[147]9.2916,[148]9.2446,[149]9.2204,[150]9.1729,[151]9.1506,[152]9.1439,[153]9.1381,[154]9.0907,[155]9.1003,[156]9.0551,[157]9.0207,[158]8.9844,[159]8.9584,[160]8.9144,[161]8.8936,[162]8.8854,[163]8.8668,[164]8.8804,[165]8.8623,[166]8.8591,[167]8.8520,[168]8.8742,[169]8.8855,[170]8.9246,[171]8.9493,[172]8.9828,[173]9.0260,[174]9.0397,[175]9.0997,[176]9.1308,[177]9.1837,[178]9.2297,[179]9.2357,[180]9.2267,[181]9.2222,[182]9.2436,[183]9.2162,[184]9.2135,[185]9.2016,[186]9.1843,[187]9.1681,[188]9.1640,[189]9.1750,[190]9.2058,[191]9.2156,[192]9.2265,[193]9.2226,[194]9.2407,[195]9.2596,[196]9.2704,[197]9.2788,[198]9.2535,[199]9.2347,[200]9.2160,[201]9.2232,[202]9.2413,[203]9.2683,[204]9.2869,[205]9.3039,[206]9.3015,[207]9.3260,[208]9.3149,[209]9.3181,[210]9.3171,[211]9.3191,[212]9.3221,[213]9.3198,[214]9.3041,[215]9.2824,[216]9.2732,[217]9.2754,[218]9.2671,[219]9.2412,[220]9.2017,[221]9.1841,[222]9.1637,[223]9.1600,[224]9.1689,[225]9.1414,[226]9.1301,[227]9.1159,[228]9.0854,[229]9.0513,[230]9.0313,[231]9.0167,[232]9.0023,[233]9.0003,[234]8.9999,[235]8.9939,[236]8.9729,[237]8.9578,[238]8.9379,[239]8.9358,[240]8.9414,[241]8.9438,[242]8.9530,[243]8.9562,[244]8.9719,[245]8.9784,[246]9.0025,[247]9.0127,[248]9.0172,[249]9.0202,[250]9.0239,[251]9.0454,[252]9.0630,[253]9.0954,[254]9.1210,[255]9.1267,[256]9.1444,[257]9.1605,[258]9.1417,[259]9.1280,[260]9.1092,[261]9.0840,[262]9.0690,[263]9.0631,[264]9.0623,[265]9.0710,[266]9.0798,[267]9.0735,[268]9.0635,[269]9.0666,[270]9.0629,[271]9.0544,[272]9.0509,[273]9.0485,[274]9.0404,[275]9.0373,[276]9.0187,[277]9.0170,[278]9.0184,[279]9.0115,[280]9.0061,[281]8.9988,[282]8.9964,[283]8.9631,[284]8.9301,[285]8.9365,[286]8.9206,[287]8.8995,[288]8.8962,[289]8.8897,[290]8.9140,[291]8.9150,[292]8.9127,[293]8.9134,[294]8.9315,[295]8.9453,[296]8.9544,[297]8.9788,[298]8.9740,[299]8.9609,[300]8.9602,[301]8.9519,[302]8.9513,[303]8.9436,[304]8.9686,[305]8.9754,[306]8.9706,[307]8.9705,[308]8.9670,[309]8.9670,[310]8.9750,[311]8.9739,[312]8.9634,[313]8.9605,[314]8.9673,[315]8.9490,[316]8.9543,[317]8.9738,[318]8.9778,[319]8.9710,[320]8.9783,[321]8.9657,[322]8.9767,[323]8.9911,[324]9.0044,[325]9.0241,[326]9.0233,[327]9.0116,[328]9.0126,[329]8.9956,[330]8.9865,[331]8.9784,[332]8.9760,[333]8.9773,[334]8.9668,[335]8.9534,[336]8.9551,[337]8.9634,[338]8.9728,[339]8.9703,[340]8.9615,[341]8.9512,[342]8.9518,[343]8.9484,[344]8.9574,[345]8.9663,[346]8.9629,[347]8.9510,[348]8.9517,[349]8.9491,[350]8.9381,[351]8.9439,[352]8.9486,[353]8.9470,[354]8.9363,[355]8.9537,[356]8.9611,[357]8.9631,[358]8.9560,[359]8.9612,[360]8.9600,[361]8.9684,[362]8.9601,[363]8.9548,[364]8.9651,[365]8.9822,[366]9.0100,[367]9.0305,[368]9.0611,[369]9.0787,[370]9.0981,[371]9.1247,[372]9.1465,[373]9.1568,[374]9.1663,[375]9.1880,[376]9.2017,[377]9.2114,[378]9.2245,[379]9.2360,[380]9.2549,[381]9.2723,[382]9.2864,[383]9.2972,[384]9.3097,[385]9.3384,[386]9.3628,[387]9.3637,[388]9.3657,[389]9.3745,[390]9.3991,[391]9.4208,[392]9.4141,[393]9.4165,[394]9.4058,[395]9.4061,[396]9.4136,[397]9.4203,[398]9.4247,[399]9.4316,[400]9.4446,[401]9.4478,[402]9.4474,[403]9.4377,[404]9.4197,[405]9.4092,[406]9.4086,[407]9.4157,[408]9.4268,[409]9.4266,[410]9.4338,[411]9.4525,[412]9.4577,[413]9.4600,[414]9.4581,[415]9.4494,[416]9.4452,[417]9.4516,[418]9.4595,[419]9.4641,[420]9.4646,[421]9.4723,[422]9.4580,[423]9.4582,[424]9.4620,[425]9.4665,[426]9.4707,[427]9.4815,[428]9.4978,[429]9.5027,[430]9.4954,[431]9.4906,[432]9.4944,[433]9.4962,[434]9.4958,[435]9.5062,[436]9.4942,[437]9.5001,[438]9.5003,[439]9.4923,[440]9.4980,[441]9.4970,[442]9.4922,[443]9.4856,[444]9.4883,[445]9.4772,[446]9.4805,[447]9.4758,[448]9.4687,[449]9.4638,[450]9.4710,[451]9.4704,[452]9.4603,[453]9.4529,[454]9.4500,[455]9.4543,[456]9.4531,[457]9.4565,[458]9.4741,[459]9.4698,[460]9.4684,[461]9.4657,[462]9.4653,[463]9.4783,[464]9.4795,[465]9.4800,[466]9.4820,[467]9.4880,[468]9.4943,[469]9.4995,[470]9.5061,[471]9.4974,[472]9.5092,[473]9.5011,[474]9.4997,[475]9.5054,[476]9.5069,[477]9.4980,[478]9.4825,[479]9.4838,[480]9.4895,[481]9.4934,[482]9.4811,[483]9.4902,[484]9.4982,[485]9.5026,[486]9.5023,[487]9.5077,[488]9.5005,[489]9.4890,[490]9.4850,[491]9.4755,[492]9.4755,[493]9.4620,[494]9.4589,[495]9.4515,[496]9.4471,[497]9.4608,[498]9.4675,[499]9.4613,[500]9.4632,[501]9.4662,[502]9.4645,[503]9.4800,[504]9.4852,[505]9.4893,[506]9.4848,[507]9.4786,[508]9.4811,[509]9.4758,[510]9.4739,[511]9.4779,[512]9.4752,[513]9.4775,[514]9.4806,[515]9.4797,[516]9.4806,[517]9.4816,[518]9.4748,[519]9.4730,[520]9.4733,[521]9.4747,[522]9.4639,[523]9.4643,[524]9.4612,[525]9.4637,[526]9.4678,[527]9.4703,[528]9.4684,[529]9.4617,[530]9.4559,[531]9.4609,[532]9.4578,[533]9.4567,[534]9.4547,[535]9.4553,[536]9.4500,[537]9.4607,[538]9.4693,[539]9.4668,[540]9.4802,[541]9.4855,[542]9.4765,[543]9.4782,[544]9.4843,[545]9.4799,[546]9.4703,[547]9.4596,[548]9.4436,[549]9.4466,[550]9.4284,[551]9.4144,[552]9.4018,[553]9.3688,[554]9.3697,[555]9.3733,[556]9.3747,[557]9.3758,[558]9.3750,[559]9.3833,[560]9.3911,[561]9.3988,[562]9.4125,[563]9.4211,[564]9.4178,[565]9.4292,[566]9.4316,[567]9.4214,[568]9.4142,[569]9.4067,[570]9.4089,[571]9.4102,[572]9.4169,[573]9.4191,[574]9.4214,[575]9.4213,[576]9.4305,[577]9.4243,[578]9.4290,[579]9.4350,[580]9.4504,[581]9.4521,[582]9.4674,[583]9.4524,[584]9.4488,
llama_print_timings: load time = 4893.63 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: prompt eval time = 442686.42 ms / 299008 tokens ( 1.48 ms per token, 675.44 tokens per second)
llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: total time = 454271.01 ms / 299009 tokens
Final estimate: PPL over 584 chunks for n_ctx=512 = 9.4488 +/- 0.07565
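The bracketed values in this log are running perplexities: after chunk *n*, the reported figure is exp of the mean per-token negative log-likelihood over chunks 1..*n*, which is why the sequence settles toward the final estimate. A minimal sketch of that aggregation (the NLL inputs below are hypothetical, not values from this run):

```python
import math

def running_perplexity(chunk_nlls):
    """Running perplexity after each chunk: exp of the mean
    negative log-likelihood over all chunks seen so far."""
    total, out = 0.0, []
    for i, nll in enumerate(chunk_nlls, start=1):
        total += nll
        out.append(math.exp(total / i))
    return out

# Hypothetical per-chunk mean NLLs, purely illustrative.
print(running_perplexity([1.5776, 2.4849, 1.6094]))
```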
logs/quantize-Qwen3-Coder-Next-IQ4_KSS.log
ADDED
logs/quantize-Qwen3-Coder-Next-Q8_0.log
ADDED
logs/quantize-Qwen3-Coder-Next-smol-IQ2_KS.log
ADDED