ubergarm committed
Commit 57fc5bc · 0 Parent(s)

initial commit
.gitattributes ADDED
@@ -0,0 +1,38 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
imatrix-*.dat filter=lfs diff=lfs merge=lfs -text
*.gguf filter=lfs diff=lfs merge=lfs -text
*.png filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,216 @@
---
quantized_by: ubergarm
pipeline_tag: text-generation
base_model: Qwen/Qwen3-Coder-Next
base_model_relation: quantized
license: apache-2.0
tags:
- imatrix
- conversational
- qwen3_next
- ik_llama.cpp
---

## `ik_llama.cpp` imatrix Quantizations of Qwen/Qwen3-Coder-Next
*NOTE*: `ik_llama.cpp` can also run your existing GGUFs from bartowski, unsloth, mradermacher, etc., if you want to try it out before downloading my quants.
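For example, a minimal hedged sketch of serving a GGUF you already have with `ik_llama.cpp`'s `llama-server` (the repo and file names below are placeholders, not files from this collection):

```bash
# grab any existing GGUF (placeholder repo/file names, not part of this repo)
hf download bartowski/SomeModel-GGUF --include "*Q4_K_M*.gguf" --local-dir ./

# serve it with ik_llama.cpp just like you would with mainline llama.cpp
./build/bin/llama-server \
    --model ./SomeModel-Q4_K_M.gguf \
    -fa on \
    -ngl 99 \
    --host 127.0.0.1 \
    --port 8080
```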

Some of ik's new quants are supported by the [Nexesenex/croco.cpp](https://github.com/Nexesenex/croco.cpp) fork of KoboldCPP, which ships Windows builds. Also check out the [ik_llama.cpp Windows builds by Thireus](https://github.com/Thireus/ik_llama.cpp/releases).

These quants provide best-in-class perplexity for the given memory footprint.

## Big Thanks
Shout out to Wendell and the **Level1Techs** crew, and the community [Forums](https://forum.level1techs.com/t/deepseek-deep-dive-r1-at-home/225826) and [YouTube Channel](https://www.youtube.com/@Level1Techs)! **BIG thanks** for providing **BIG hardware** expertise and access to run these experiments and make these great quants available to the community!!!

Also thanks to all the folks in the quanting and inferencing community on [BeaverAI Club Discord](https://huggingface.co/BeaverAI) and on [r/LocalLLaMA](https://www.reddit.com/r/LocalLLaMA/) for tips and tricks helping each other run, test, and benchmark all the fun new models! Thanks to huggingface for hosting all these big quants!

Finally, I *really* appreciate the support from [aifoundry.org](https://aifoundry.org), so check out their open source RISC-V based solutions!

## Quant Collection
Perplexity computed against *wiki.test.raw* (lower is "better").

![Perplexity Chart](images/perplexity.png "Chart showing Perplexity vs Model Size.")

These two are just test quants for baseline perplexity comparison and are not available for download here:
* `BF16` 148.502 GiB (16.010 BPW)
  - TODO
* `Q8_0` 78.982 GiB (8.515 BPW)
  - PPL over 584 chunks for n_ctx=512 = 8.2239 +/- 0.06389
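For reference, BPW (bits per weight) here is just the total file size in bits divided by the parameter count, e.g. for the `Q8_0` test quant:

$$
\text{BPW} = \frac{78.982 \times 2^{30} \times 8 \ \text{bits}}{79.674 \times 10^{9} \ \text{params}} \approx 8.515
$$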

*NOTE*: The first split file is intentionally much smaller because it only contains metadata, it's fine!
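The exact perplexity commands and full outputs are in the `logs/` directory of this repo. Roughly, to reproduce a number yourself (point `$model` at your quant and adjust thread counts and NUMA pinning to your box):

```bash
# measure perplexity against wiki.test.raw (see logs/perplexity-*.log for the full runs)
model=Qwen3-Coder-Next-IQ4_KSS.gguf
./build/bin/llama-perplexity \
    -m "$model" \
    -f wiki.test.raw \
    --seed 1337 \
    --ctx-size 512 \
    -ub 512 -b 2048 \
    --validate-quants \
    --no-mmap \
    --threads 96 \
    --threads-batch 128
```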

## IQ4_KSS 39.377 GiB (4.245 BPW)
PPL over 584 chunks for n_ctx=512 = 8.3069 +/- 0.06459

<details>

<summary>👈 Secret Recipe</summary>

```bash
#!/usr/bin/env bash

custom="
# 48 Repeating Layers [0-47]

## Gated Attention/Delta Net [Blended 0-47]
blk\..*\.attn_gate\.weight=q8_0
blk\..*\.attn_qkv\.weight=q8_0
blk\..*\.attn_output\.weight=q8_0
blk\..*\.attn_q\.weight=q8_0
blk\..*\.attn_k\.weight=q8_0
blk\..*\.attn_v\.weight=q8_0
blk\..*\.ssm_ba\.weight=q8_0
blk\..*\.ssm_out\.weight=q8_0

# Shared Expert Layers [0-47]
blk\..*\.ffn_down_shexp\.weight=q8_0
blk\..*\.ffn_(gate|up)_shexp\.weight=q8_0

# Routed Experts Layers [0-47]
blk\..*\.ffn_down_exps\.weight=iq4_ks
blk\..*\.ffn_(gate|up)_exps\.weight=iq4_kss

# Non-Repeating Layers
token_embd\.weight=iq6_k
output\.weight=iq6_k
"

# strip the comment lines and join the remaining rules into one comma-separated list for --custom-q
custom=$(
  echo "$custom" | grep -v '^#' | \
  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
)

# note: --dry-run previews the planned tensor type assignments without writing the output; drop it for the real run
numactl -N ${SOCKET} -m ${SOCKET} \
./build/bin/llama-quantize \
    --dry-run \
    --custom-q "$custom" \
    --imatrix /mnt/data/models/ubergarm/Qwen3-Coder-Next-GGUF/imatrix-Qwen3-Coder-Next-BF16.dat \
    /mnt/data/models/ubergarm/Qwen3-Coder-Next-GGUF/Qwen3-Coder-Next-512x2.5B-BF16-00001-of-00004.gguf \
    /mnt/data/models/ubergarm/Qwen3-Coder-Next-GGUF/Qwen3-Coder-Next-IQ4_KSS.gguf \
    IQ4_KSS \
    128
```

</details>

## smol-IQ2_KS 22.097 GiB (2.382 BPW)
PPL over 584 chunks for n_ctx=512 = 9.4488 +/- 0.07565

<details>

<summary>👈 Secret Recipe</summary>

```bash
#!/usr/bin/env bash

custom="
# 48 Repeating Layers [0-47]

## Gated Attention/Delta Net [Blended 0-47]
blk\..*\.attn_gate\.weight=q8_0
blk\..*\.attn_qkv\.weight=q8_0
blk\..*\.attn_output\.weight=q8_0
blk\..*\.attn_q\.weight=q8_0
blk\..*\.attn_k\.weight=q8_0
blk\..*\.attn_v\.weight=q8_0
blk\..*\.ssm_ba\.weight=q8_0
blk\..*\.ssm_out\.weight=q8_0

# Shared Expert Layers [0-47]
blk\..*\.ffn_down_shexp\.weight=q8_0
blk\..*\.ffn_(gate|up)_shexp\.weight=q8_0

# Routed Experts Layers [0-47]
blk\..*\.ffn_down_exps\.weight=iq2_ks
blk\..*\.ffn_(gate|up)_exps\.weight=iq2_ks

# Non-Repeating Layers
token_embd\.weight=iq4_k
output\.weight=iq6_k
"

# strip the comment lines and join the remaining rules into one comma-separated list for --custom-q
custom=$(
  echo "$custom" | grep -v '^#' | \
  sed -Ez 's:\n+:,:g;s:,$::;s:^,::'
)

# note: --dry-run previews the planned tensor type assignments without writing the output; drop it for the real run
numactl -N ${SOCKET} -m ${SOCKET} \
./build/bin/llama-quantize \
    --dry-run \
    --custom-q "$custom" \
    --imatrix /mnt/data/models/ubergarm/Qwen3-Coder-Next-GGUF/imatrix-Qwen3-Coder-Next-BF16.dat \
    /mnt/data/models/ubergarm/Qwen3-Coder-Next-GGUF/Qwen3-Coder-Next-512x2.5B-BF16-00001-of-00004.gguf \
    /mnt/data/models/ubergarm/Qwen3-Coder-Next-GGUF/Qwen3-Coder-Next-smol-IQ2_KS.gguf \
    IQ2_KS \
    128
```

</details>

## Quick Start
Check some of my recent model cards for more examples of running models with `ik_llama.cpp`.

```bash
# Clone and checkout
$ git clone https://github.com/ikawrakow/ik_llama.cpp
$ cd ik_llama.cpp

# Build for hybrid CPU+CUDA
$ cmake -B build -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON
$ cmake --build build --config Release -j $(nproc)

# Download Desired Quants
$ pip install huggingface_hub
$ hf download --local-dir ./ --include=smol-IQ2_KS/*.gguf ubergarm/Qwen3-Coder-Next-GGUF

# Full GPU offload
# For 2 or more GPUs keep an eye on `-sm graph` support:
# https://github.com/ikawrakow/ik_llama.cpp/pull/1292
# point $model at the first split file of the quant you downloaded above
CUDA_VISIBLE_DEVICES="0,1" \
./build/bin/llama-server \
    --model "$model" \
    --alias Qwen3-Coder-Next \
    -c 262144 \
    -fa on \
    -ger \
    --merge-qkv \
    -sm graph \
    -ngl 99 \
    -ub 2048 -b 2048 \
    --threads 1 \
    --host 127.0.0.1 \
    --port 8080 \
    --jinja \
    --no-mmap

# Hybrid CPU+GPU
# basically use --n-cpu-moe etc... (an untested sketch follows, still TODO)
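# a hedged, untested sketch: offload everything with -ngl 99, then push the
# routed experts of most layers back to CPU with --n-cpu-moe; tune the
# --n-cpu-moe count and thread counts to your VRAM/CPU
CUDA_VISIBLE_DEVICES="0" \
./build/bin/llama-server \
    --model "$model" \
    --alias Qwen3-Coder-Next \
    -c 131072 \
    -fa on \
    -ger \
    --merge-qkv \
    -ngl 99 \
    --n-cpu-moe 40 \
    -ub 2048 -b 2048 \
    --threads 24 \
    --host 127.0.0.1 \
    --port 8080 \
    --jinja \
    --no-mmap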

# CPU-Only
# Gated delta net CPU-only performance seems slower than other architectures; ideally have at least 1x GPU for attn/kv-cache
numactl -N "$SOCKET" -m "$SOCKET" \
./build/bin/llama-server \
    --model "$model" \
    --alias Qwen3-Coder-Next \
    --ctx-size 131072 \
    -ger \
    --merge-qkv \
    -ctk q8_0 -ctv q8_0 \
    -ub 4096 -b 4096 \
    --parallel 1 \
    --threads 96 \
    --threads-batch 128 \
    --numa numactl \
    --host 127.0.0.1 \
    --port 8080 \
    --no-mmap \
    --jinja
```
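Once `llama-server` is up, a quick smoke test against its OpenAI-compatible chat endpoint looks roughly like this (a minimal sketch; adjust host/port to match whatever you passed above):

```bash
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen3-Coder-Next",
        "messages": [{"role": "user", "content": "Write a hello world in Python."}],
        "max_tokens": 128
      }'
```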

## References
* [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp)
* [ubergarm on quantizing LLMs and tuning GPUs with aifoundry.org](https://blog.aifoundry.org/p/adventures-in-model-quantization)
* [ubergarm-imatrix-calibration-corpus-v02.txt](https://gist.github.com/ubergarm/edfeb3ff9c6ec8b49e88cdf627b0711a?permalink_comment_id=5682584#gistcomment-5682584)
* [Getting Started Guide (out of date)](https://github.com/ikawrakow/ik_llama.cpp/discussions/258)
* [Quant Cookers Guide (out of date)](https://github.com/ikawrakow/ik_llama.cpp/discussions/434)
* [ik_llama.cpp Qwen3Next Issue](https://github.com/ikawrakow/ik_llama.cpp/issues/1229)
images/perplexity.png ADDED

Git LFS Details

  • SHA256: bbb389ceff9d5020ab36d9e0507888bd1a7c9001ec675635eeea81dc48fc6efb
  • Pointer size: 131 Bytes
  • Size of remote file: 145 kB
logs/imatrix-Qwen3-Coder-Next-BF16.log ADDED
The diff for this file is too large to render.
 
logs/perplexity-Qwen3-Coder-Next-IQ4_KSS.log ADDED
@@ -0,0 +1,174 @@
1
+ SOCKET is set to: 0
2
+ main: build = 4211 (b2cb4512)
3
+ main: built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
4
+ main: seed = 1337
5
+ CPU: using device CPU - 0 MiB free
6
+ llama_model_loader: loaded meta data with 47 key-value pairs and 843 tensors from /mnt/data/models/ubergarm/Qwen3-Coder-Next-GGUF/Qwen3-Coder-Next-IQ4_KSS.gguf (version GGUF V3 (latest))
7
+ llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
8
+ llama_model_loader: - kv 0: general.architecture str = qwen3next
9
+ llama_model_loader: - kv 1: general.type str = model
10
+ llama_model_loader: - kv 2: general.sampling.top_k i32 = 40
11
+ llama_model_loader: - kv 3: general.sampling.top_p f32 = 0.950000
12
+ llama_model_loader: - kv 4: general.sampling.temp f32 = 1.000000
13
+ llama_model_loader: - kv 5: general.name str = Qwen3 Coder Next
14
+ llama_model_loader: - kv 6: general.size_label str = 512x2.5B
15
+ llama_model_loader: - kv 7: general.license str = apache-2.0
16
+ llama_model_loader: - kv 8: general.license.link str = https://huggingface.co/Qwen/Qwen3-Cod...
17
+ llama_model_loader: - kv 9: general.tags arr[str,1] = ["text-generation"]
18
+ llama_model_loader: - kv 10: qwen3next.block_count u32 = 48
19
+ llama_model_loader: - kv 11: qwen3next.context_length u32 = 262144
20
+ llama_model_loader: - kv 12: qwen3next.embedding_length u32 = 2048
21
+ llama_model_loader: - kv 13: qwen3next.feed_forward_length u32 = 5120
22
+ llama_model_loader: - kv 14: qwen3next.attention.head_count u32 = 16
23
+ llama_model_loader: - kv 15: qwen3next.attention.head_count_kv u32 = 2
24
+ llama_model_loader: - kv 16: qwen3next.rope.freq_base f32 = 5000000.000000
25
+ llama_model_loader: - kv 17: qwen3next.attention.layer_norm_rms_epsilon f32 = 0.000001
26
+ llama_model_loader: - kv 18: qwen3next.expert_count u32 = 512
27
+ llama_model_loader: - kv 19: qwen3next.expert_used_count u32 = 10
28
+ llama_model_loader: - kv 20: qwen3next.attention.key_length u32 = 256
29
+ llama_model_loader: - kv 21: qwen3next.attention.value_length u32 = 256
30
+ llama_model_loader: - kv 22: general.file_type u32 = 148
31
+ llama_model_loader: - kv 23: qwen3next.expert_feed_forward_length u32 = 512
32
+ llama_model_loader: - kv 24: qwen3next.expert_shared_feed_forward_length u32 = 512
33
+ llama_model_loader: - kv 25: qwen3next.ssm.conv_kernel u32 = 4
34
+ llama_model_loader: - kv 26: qwen3next.ssm.state_size u32 = 128
35
+ llama_model_loader: - kv 27: qwen3next.ssm.group_count u32 = 16
36
+ llama_model_loader: - kv 28: qwen3next.ssm.time_step_rank u32 = 32
37
+ llama_model_loader: - kv 29: qwen3next.ssm.inner_size u32 = 4096
38
+ llama_model_loader: - kv 30: qwen3next.full_attention_interval u32 = 4
39
+ llama_model_loader: - kv 31: qwen3next.rope.dimension_count u32 = 64
40
+ llama_model_loader: - kv 32: general.quantization_version u32 = 2
41
+ llama_model_loader: - kv 33: tokenizer.ggml.model str = gpt2
42
+ llama_model_loader: - kv 34: tokenizer.ggml.pre str = qwen2
43
+ llama_model_loader: - kv 35: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ...
44
+ llama_model_loader: - kv 36: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
45
+ llama_model_loader: - kv 37: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
46
+ llama_model_loader: - kv 38: tokenizer.ggml.eos_token_id u32 = 151645
47
+ llama_model_loader: - kv 39: tokenizer.ggml.padding_token_id u32 = 151643
48
+ llama_model_loader: - kv 40: tokenizer.ggml.bos_token_id u32 = 151643
49
+ llama_model_loader: - kv 41: tokenizer.ggml.add_bos_token bool = false
50
+ llama_model_loader: - kv 42: tokenizer.chat_template str = {% macro render_extra_keys(json_dict,...
51
+ llama_model_loader: - kv 43: quantize.imatrix.file str = /mnt/data/models/ubergarm/Qwen3-Coder...
52
+ llama_model_loader: - kv 44: quantize.imatrix.dataset str = ubergarm-imatrix-calibration-corpus-v...
53
+ llama_model_loader: - kv 45: quantize.imatrix.entries_count i32 = 577
54
+ llama_model_loader: - kv 46: quantize.imatrix.chunks_count i32 = 840
55
+ llama_model_loader: - type f32: 361 tensors
56
+ llama_model_loader: - type q8_0: 336 tensors
57
+ llama_model_loader: - type iq6_k: 2 tensors
58
+ llama_model_loader: - type iq4_ks: 48 tensors
59
+ llama_model_loader: - type iq4_kss: 96 tensors
60
+ load: printing all EOG tokens:
61
+ load: - 151643 ('<|endoftext|>')
62
+ load: - 151645 ('<|im_end|>')
63
+ load: - 151662 ('<|fim_pad|>')
64
+ load: - 151663 ('<|repo_name|>')
65
+ load: - 151664 ('<|file_sep|>')
66
+ load: special tokens cache size = 26
67
+ load: token to piece cache size = 0.9311 MB
68
+ llm_load_print_meta: format = GGUF V3 (latest)
69
+ llm_load_print_meta: arch = qwen3next
70
+ llm_load_print_meta: n_ctx_train = 262144
71
+ llm_load_print_meta: n_embd = 2048
72
+ llm_load_print_meta: n_layer = 48
73
+ llm_load_print_meta: n_head = 16
74
+ llm_load_print_meta: n_head_kv = 2
75
+ llm_load_print_meta: n_rot = 64
76
+ llm_load_print_meta: n_swa = 0
77
+ llm_load_print_meta: n_swa_pattern = 1
78
+ llm_load_print_meta: n_embd_head_k = 256
79
+ llm_load_print_meta: n_embd_head_v = 256
80
+ llm_load_print_meta: n_gqa = 8
81
+ llm_load_print_meta: n_embd_k_gqa = 512
82
+ llm_load_print_meta: n_embd_v_gqa = 512
83
+ llm_load_print_meta: f_norm_eps = 0.0e+00
84
+ llm_load_print_meta: f_norm_rms_eps = 1.0e-06
85
+ llm_load_print_meta: f_clamp_kqv = 0.0e+00
86
+ llm_load_print_meta: f_max_alibi_bias = 0.0e+00
87
+ llm_load_print_meta: f_logit_scale = 0.0e+00
88
+ llm_load_print_meta: n_ff = 5120
89
+ llm_load_print_meta: n_expert = 512
90
+ llm_load_print_meta: n_expert_used = 10
91
+ llm_load_print_meta: causal attn = 1
92
+ llm_load_print_meta: pooling type = 0
93
+ llm_load_print_meta: rope type = 2
94
+ llm_load_print_meta: rope scaling = linear
95
+ llm_load_print_meta: freq_base_train = 5000000.0
96
+ llm_load_print_meta: freq_scale_train = 1
97
+ llm_load_print_meta: n_ctx_orig_yarn = 262144
98
+ llm_load_print_meta: rope_finetuned = unknown
99
+ llm_load_print_meta: ssm_d_conv = 4
100
+ llm_load_print_meta: ssm_d_inner = 4096
101
+ llm_load_print_meta: ssm_d_state = 128
102
+ llm_load_print_meta: ssm_dt_rank = 32
103
+ llm_load_print_meta: model type = 80B.A3B
104
+ llm_load_print_meta: model ftype = IQ4_KSS - 4.0 bpw
105
+ llm_load_print_meta: model params = 79.674 B
106
+ llm_load_print_meta: model size = 39.377 GiB (4.245 BPW)
107
+ llm_load_print_meta: repeating layers = 38.897 GiB (4.227 BPW, 79.052 B parameters)
108
+ llm_load_print_meta: general.name = Qwen3 Coder Next
109
+ print_info: vocab type = BPE
110
+ print_info: n_vocab = 151936
111
+ print_info: n_merges = 151387
112
+ print_info: BOS token = 151643 '<|endoftext|>'
113
+ print_info: EOS token = 151645 '<|im_end|>'
114
+ print_info: EOT token = 151645 '<|im_end|>'
115
+ print_info: PAD token = 151643 '<|endoftext|>'
116
+ print_info: LF token = 198 'Ċ'
117
+ print_info: FIM PRE token = 151659 '<|fim_prefix|>'
118
+ print_info: FIM SUF token = 151661 '<|fim_suffix|>'
119
+ print_info: FIM MID token = 151660 '<|fim_middle|>'
120
+ print_info: FIM PAD token = 151662 '<|fim_pad|>'
121
+ print_info: FIM REP token = 151663 '<|repo_name|>'
122
+ print_info: FIM SEP token = 151664 '<|file_sep|>'
123
+ print_info: EOG token = 151643 '<|endoftext|>'
124
+ print_info: EOG token = 151645 '<|im_end|>'
125
+ print_info: EOG token = 151662 '<|fim_pad|>'
126
+ print_info: EOG token = 151663 '<|repo_name|>'
127
+ print_info: EOG token = 151664 '<|file_sep|>'
128
+ print_info: max token length = 256
129
+ llm_load_tensors: ggml ctx size = 0.35 MiB
130
+ llm_load_tensors: offloading 0 repeating layers to GPU
131
+ llm_load_tensors: offloaded 0/49 layers to GPU
132
+ llm_load_tensors: CPU buffer size = 40322.46 MiB
133
+ ....................................................................................................
134
+ llama_init_from_model: n_ctx = 2048
135
+ llama_init_from_model: n_batch = 2048
136
+ llama_init_from_model: n_ubatch = 512
137
+ llama_init_from_model: flash_attn = 1
138
+ llama_init_from_model: attn_max_b = 0
139
+ llama_init_from_model: fused_moe = 1
140
+ llama_init_from_model: grouped er = 0
141
+ llama_init_from_model: fused_up_gate = 1
142
+ llama_init_from_model: fused_mmad = 1
143
+ llama_init_from_model: rope_cache = 0
144
+ llama_init_from_model: graph_reuse = 1
145
+ llama_init_from_model: k_cache_hadam = 0
146
+ llama_init_from_model: split_mode_graph_scheduling = 0
147
+ llama_init_from_model: reduce_type = f16
148
+ llama_init_from_model: sched_async = 0
149
+ llama_init_from_model: ser = -1, 0
150
+ llama_init_from_model: freq_base = 5000000.0
151
+ llama_init_from_model: freq_scale = 1
152
+ llama_kv_cache_init: CPU KV buffer size = 349.50 MiB
153
+ llama_init_from_model: KV self size = 48.00 MiB, K (f16): 24.00 MiB, V (f16): 24.00 MiB
154
+ llama_init_from_model: CPU output buffer size = 2.32 MiB
155
+ llama_init_from_model: CPU compute buffer size = 300.75 MiB
156
+ llama_init_from_model: graph nodes = 12382
157
+ llama_init_from_model: graph splits = 1
158
+ llama_init_from_model: enabling only_active_experts scheduling
159
+
160
+ system_info: n_threads = 96 (n_threads_batch = 128) / 512 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
161
+ perplexity: tokenizing the input ..
162
+ perplexity: tokenization took 390.895 ms
163
+ perplexity: calculating perplexity over 584 chunks, n_ctx=512, batch_size=2048, n_seq=4
164
+ perplexity: 3.57 seconds per pass - ETA 8.67 minutes
165
+ ===================================== llama_init_from_model: f16
166
+ ======================================= HAVE_FANCY_SIMD is defined
167
+ [1]4.7708,[2]6.6455,[3]5.6847,[4]4.9300,[5]4.8482,[6]4.9616,[7]5.0382,[8]5.1594,[9]5.1162,[10]5.1752,[11]5.1005,[12]5.3326,[13]5.7391,[14]5.6986,[15]5.8049,[16]6.1672,[17]5.9825,[18]6.1662,[19]6.2517,[20]6.2809,[21]6.1879,[22]6.3014,[23]6.1039,[24]5.8551,[25]5.7720,[26]5.6545,[27]5.5737,[28]5.5205,[29]5.5937,[30]5.5868,[31]5.5772,[32]5.6498,[33]5.5949,[34]5.6422,[35]5.7332,[36]5.8198,[37]5.9514,[38]6.0595,[39]6.0941,[40]6.1921,[41]6.2363,[42]6.2663,[43]6.3373,[44]6.3414,[45]6.3754,[46]6.4259,[47]6.5950,[48]6.7029,[49]6.7117,[50]6.7649,[51]6.8072,[52]6.8791,[53]6.9402,[54]6.9868,[55]7.0063,[56]7.0823,[57]7.0897,[58]7.1329,[59]7.1791,[60]7.2232,[61]7.2709,[62]7.3073,[63]7.3673,[64]7.4293,[65]7.4972,[66]7.5685,[67]7.6299,[68]7.6186,[69]7.6415,[70]7.6493,[71]7.6847,[72]7.7552,[73]7.8107,[74]7.8395,[75]7.8152,[76]7.8270,[77]7.8976,[78]7.9346,[79]7.8476,[80]7.8241,[81]7.8209,[82]7.8591,[83]7.8315,[84]7.8261,[85]7.8472,[86]7.9303,[87]7.9743,[88]7.9955,[89]8.0109,[90]8.0011,[91]8.0545,[92]8.0347,[93]8.0797,[94]8.0940,[95]8.0805,[96]8.0713,[97]8.0645,[98]8.1005,[99]8.0796,[100]8.1641,[101]8.2137,[102]8.2089,[103]8.2180,[104]8.2042,[105]8.2038,[106]8.2012,[107]8.2370,[108]8.2704,[109]8.3100,[110]8.3700,[111]8.4787,[112]8.4877,[113]8.4508,[114]8.5037,[115]8.5319,[116]8.4806,[117]8.4850,[118]8.4776,[119]8.4453,[120]8.4664,[121]8.4540,[122]8.4422,[123]8.4035,[124]8.3625,[125]8.3447,[126]8.3292,[127]8.2807,[128]8.2685,[129]8.2353,[130]8.1914,[131]8.1579,[132]8.1311,[133]8.1250,[134]8.1375,[135]8.1307,[136]8.1300,[137]8.1008,[138]8.0728,[139]8.0903,[140]8.0761,[141]8.0732,[142]8.0946,[143]8.1000,[144]8.1349,[145]8.1126,[146]8.0758,[147]8.0396,[148]8.0002,[149]7.9802,[150]7.9379,[151]7.9272,[152]7.9191,[153]7.9154,[154]7.8789,[155]7.8823,[156]7.8432,[157]7.8223,[158]7.7976,[159]7.7800,[160]7.7416,[161]7.7246,[162]7.7171,[163]7.6999,[164]7.7098,[165]7.7018,[166]7.6948,[167]7.6935,[168]7.7147,[169]7.7185,[170]7.7491,[171]7.7541,[172]7.7784,[173]7.8294,[174]7.8438,[175]7.8979,[176]7.9243,[177]7.9767,[178]8.0162,[179]8.0179,[180]7.9923,[181]7.9633,[182]7.9752,[183]7.9436,[184]7.9282,[185]7.9042,[186]7.8789,[187]7.8603,[188]7.8537,[189]7.8686,[190]7.8953,[191]7.9074,[192]7.9170,[193]7.9192,[194]7.9375,[195]7.9540,[196]7.9615,[197]7.9671,[198]7.9489,[199]7.9394,[200]7.9261,[201]7.9266,[202]7.9436,[203]7.9686,[204]7.9891,[205]8.0084,[206]8.0125,[207]8.0387,[208]8.0273,[209]8.0280,[210]8.0243,[211]8.0272,[212]8.0306,[213]8.0277,[214]8.0158,[215]8.0004,[216]7.9936,[217]8.0013,[218]7.9968,[219]7.9762,[220]7.9453,[221]7.9323,[222]7.9186,[223]7.9170,[224]7.9242,[225]7.9042,[226]7.8961,[227]7.8846,[228]7.8597,[229]7.8333,[230]7.8165,[231]7.7974,[232]7.7846,[233]7.7802,[234]7.7780,[235]7.7770,[236]7.7625,[237]7.7533,[238]7.7389,[239]7.7335,[240]7.7430,[241]7.7529,[242]7.7647,[243]7.7618,[244]7.7762,[245]7.7791,[246]7.8023,[247]7.8114,[248]7.8167,[249]7.8264,[250]7.8289,[251]7.8477,[252]7.8657,[253]7.9007,[254]7.9252,[255]7.9294,[256]7.9466,[257]7.9614,[258]7.9483,[259]7.9333,[260]7.9183,[261]7.8962,[262]7.8837,[263]7.8777,[264]7.8748,[265]7.8830,[266]7.8881,[267]7.8878,[268]7.8785,[269]7.8835,[270]7.8793,[271]7.8743,[272]7.8703,[273]7.8680,[274]7.8638,[275]7.8589,[276]7.8453,[277]7.8457,[278]7.8447,[279]7.8363,[280]7.8316,[281]7.8266,[282]7.8243,[283]7.8001,[284]7.7717,[285]7.7814,[286]7.7652,[287]7.7490,[288]7.7467,[289]7.7427,[290]7.7647,[291]7.7694,[292]7.7681,[293]7.7700,[294]7.7871,[295]7.7983,[296]7.8088,[297]7.8317,[298]7.8294,[299]7.8206,[300]7.8214,[301]7.8153,[302]7.8173,[303]7.8125,[304]7.8376,[305]7.8428,[
306]7.8415,[307]7.8455,[308]7.8452,[309]7.8442,[310]7.8499,[311]7.8534,[312]7.8437,[313]7.8385,[314]7.8450,[315]7.8327,[316]7.8349,[317]7.8499,[318]7.8570,[319]7.8505,[320]7.8528,[321]7.8423,[322]7.8527,[323]7.8618,[324]7.8682,[325]7.8885,[326]7.8866,[327]7.8754,[328]7.8789,[329]7.8652,[330]7.8568,[331]7.8508,[332]7.8505,[333]7.8524,[334]7.8493,[335]7.8398,[336]7.8421,[337]7.8490,[338]7.8612,[339]7.8580,[340]7.8533,[341]7.8457,[342]7.8455,[343]7.8444,[344]7.8513,[345]7.8598,[346]7.8563,[347]7.8430,[348]7.8450,[349]7.8427,[350]7.8326,[351]7.8323,[352]7.8360,[353]7.8358,[354]7.8259,[355]7.8389,[356]7.8488,[357]7.8537,[358]7.8448,[359]7.8493,[360]7.8489,[361]7.8587,[362]7.8503,[363]7.8448,[364]7.8520,[365]7.8703,[366]7.8962,[367]7.9124,[368]7.9423,[369]7.9580,[370]7.9734,[371]7.9971,[372]8.0169,[373]8.0274,[374]8.0364,[375]8.0557,[376]8.0694,[377]8.0815,[378]8.0946,[379]8.1071,[380]8.1237,[381]8.1408,[382]8.1519,[383]8.1605,[384]8.1724,[385]8.1982,[386]8.2193,[387]8.2191,[388]8.2206,[389]8.2297,[390]8.2537,[391]8.2719,[392]8.2657,[393]8.2648,[394]8.2577,[395]8.2585,[396]8.2668,[397]8.2752,[398]8.2817,[399]8.2895,[400]8.3013,[401]8.3027,[402]8.3024,[403]8.2938,[404]8.2712,[405]8.2583,[406]8.2576,[407]8.2657,[408]8.2750,[409]8.2769,[410]8.2870,[411]8.3046,[412]8.3100,[413]8.3086,[414]8.3064,[415]8.3013,[416]8.2942,[417]8.2989,[418]8.3078,[419]8.3119,[420]8.3128,[421]8.3198,[422]8.3088,[423]8.3079,[424]8.3108,[425]8.3141,[426]8.3153,[427]8.3221,[428]8.3369,[429]8.3446,[430]8.3405,[431]8.3363,[432]8.3409,[433]8.3446,[434]8.3456,[435]8.3543,[436]8.3482,[437]8.3534,[438]8.3553,[439]8.3501,[440]8.3548,[441]8.3546,[442]8.3523,[443]8.3448,[444]8.3471,[445]8.3381,[446]8.3400,[447]8.3342,[448]8.3285,[449]8.3229,[450]8.3289,[451]8.3287,[452]8.3163,[453]8.3074,[454]8.3043,[455]8.3097,[456]8.3081,[457]8.3134,[458]8.3283,[459]8.3248,[460]8.3247,[461]8.3227,[462]8.3213,[463]8.3326,[464]8.3318,[465]8.3329,[466]8.3351,[467]8.3406,[468]8.3455,[469]8.3505,[470]8.3561,[471]8.3456,[472]8.3546,[473]8.3439,[474]8.3426,[475]8.3478,[476]8.3458,[477]8.3364,[478]8.3214,[479]8.3242,[480]8.3318,[481]8.3356,[482]8.3254,[483]8.3338,[484]8.3415,[485]8.3458,[486]8.3452,[487]8.3507,[488]8.3452,[489]8.3343,[490]8.3328,[491]8.3271,[492]8.3271,[493]8.3183,[494]8.3167,[495]8.3110,[496]8.3078,[497]8.3199,[498]8.3263,[499]8.3184,[500]8.3181,[501]8.3182,[502]8.3161,[503]8.3296,[504]8.3328,[505]8.3363,[506]8.3338,[507]8.3307,[508]8.3350,[509]8.3319,[510]8.3307,[511]8.3334,[512]8.3293,[513]8.3317,[514]8.3353,[515]8.3350,[516]8.3378,[517]8.3407,[518]8.3343,[519]8.3345,[520]8.3369,[521]8.3388,[522]8.3295,[523]8.3290,[524]8.3263,[525]8.3301,[526]8.3357,[527]8.3384,[528]8.3378,[529]8.3318,[530]8.3280,[531]8.3313,[532]8.3288,[533]8.3279,[534]8.3279,[535]8.3294,[536]8.3227,[537]8.3285,[538]8.3369,[539]8.3337,[540]8.3459,[541]8.3483,[542]8.3428,[543]8.3452,[544]8.3522,[545]8.3481,[546]8.3405,[547]8.3328,[548]8.3173,[549]8.3178,[550]8.3012,[551]8.2903,[552]8.2809,[553]8.2540,[554]8.2528,[555]8.2562,[556]8.2567,[557]8.2595,[558]8.2589,[559]8.2656,[560]8.2721,[561]8.2812,[562]8.2938,[563]8.3020,[564]8.3000,[565]8.3091,[566]8.3090,[567]8.2960,[568]8.2876,[569]8.2842,[570]8.2838,[571]8.2835,[572]8.2867,[573]8.2874,[574]8.2889,[575]8.2885,[576]8.2944,[577]8.2890,[578]8.2944,[579]8.3000,[580]8.3142,[581]8.3155,[582]8.3276,[583]8.3126,[584]8.3069,
168
+ llama_print_timings: load time = 8105.40 ms
169
+ llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
170
+ llama_print_timings: prompt eval time = 447048.08 ms / 299008 tokens ( 1.50 ms per token, 668.85 tokens per second)
171
+ llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
172
+ llama_print_timings: total time = 458500.62 ms / 299009 tokens
173
+
174
+ Final estimate: PPL over 584 chunks for n_ctx=512 = 8.3069 +/- 0.06459
logs/perplexity-Qwen3-Coder-Next-Q8_0.log ADDED
@@ -0,0 +1,187 @@
1
+ #!/usr/bin/env bash
2
+
3
+ model=/mnt/data/models/ubergarm/Qwen3-Coder-Next-GGUF/Qwen3-Coder-Next-512x2.5B-BF16-00001-of-00004.gguf
4
+ #model=/mnt/data/models/ubergarm/Qwen3-Coder-Next-GGUF/Qwen3-Coder-Next-Q8_0.gguf
5
+ #model=/mnt/data/models/ubergarm/Qwen3-Coder-Next-GGUF/Qwen3-Coder-Next-IQ4_KSS.gguf
6
+ #model=/mnt/data/models/ubergarm/Qwen3-Coder-Next-GGUF/Qwen3-Coder-Next-smol-IQ2_KS.gguf
7
+
8
+ numactl -N "$SOCKET" -m "$SOCKET" \
9
+ ./build/bin/llama-perplexity \
10
+ -m "$model" \
11
+ -f wiki.test.raw \
12
+ --seed 1337 \
13
+ --ctx-size 512 \
14
+ -ub 512 -b 2048 \
15
+ --validate-quants \
16
+ --no-mmap \
17
+ --numa numactl \
18
+ --threads 96 \
19
+ --threads-batch 128
20
+
21
+ SOCKET is set to: 1
22
+ main: build = 4211 (b2cb4512)
23
+ main: built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
24
+ main: seed = 1337
25
+ CPU: using device CPU - 0 MiB free
26
+ llama_model_loader: loaded meta data with 43 key-value pairs and 843 tensors from /mnt/data/models/ubergarm/Qwen3-Coder-Next-GGUF/Qwen3.5-Coder-Next-Q8_0.gguf (version GGUF V3 (latest))
27
+ llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
28
+ llama_model_loader: - kv 0: general.architecture str = qwen3next
29
+ llama_model_loader: - kv 1: general.type str = model
30
+ llama_model_loader: - kv 2: general.sampling.top_k i32 = 40
31
+ llama_model_loader: - kv 3: general.sampling.top_p f32 = 0.950000
32
+ llama_model_loader: - kv 4: general.sampling.temp f32 = 1.000000
33
+ llama_model_loader: - kv 5: general.name str = Qwen3 Coder Next
34
+ llama_model_loader: - kv 6: general.size_label str = 512x2.5B
35
+ llama_model_loader: - kv 7: general.license str = apache-2.0
36
+ llama_model_loader: - kv 8: general.license.link str = https://huggingface.co/Qwen/Qwen3-Cod...
37
+ llama_model_loader: - kv 9: general.tags arr[str,1] = ["text-generation"]
38
+ llama_model_loader: - kv 10: qwen3next.block_count u32 = 48
39
+ llama_model_loader: - kv 11: qwen3next.context_length u32 = 262144
40
+ llama_model_loader: - kv 12: qwen3next.embedding_length u32 = 2048
41
+ llama_model_loader: - kv 13: qwen3next.feed_forward_length u32 = 5120
42
+ llama_model_loader: - kv 14: qwen3next.attention.head_count u32 = 16
43
+ llama_model_loader: - kv 15: qwen3next.attention.head_count_kv u32 = 2
44
+ llama_model_loader: - kv 16: qwen3next.rope.freq_base f32 = 5000000.000000
45
+ llama_model_loader: - kv 17: qwen3next.attention.layer_norm_rms_epsilon f32 = 0.000001
46
+ llama_model_loader: - kv 18: qwen3next.expert_count u32 = 512
47
+ llama_model_loader: - kv 19: qwen3next.expert_used_count u32 = 10
48
+ llama_model_loader: - kv 20: qwen3next.attention.key_length u32 = 256
49
+ llama_model_loader: - kv 21: qwen3next.attention.value_length u32 = 256
50
+ llama_model_loader: - kv 22: general.file_type u32 = 7
51
+ llama_model_loader: - kv 23: qwen3next.expert_feed_forward_length u32 = 512
52
+ llama_model_loader: - kv 24: qwen3next.expert_shared_feed_forward_length u32 = 512
53
+ llama_model_loader: - kv 25: qwen3next.ssm.conv_kernel u32 = 4
54
+ llama_model_loader: - kv 26: qwen3next.ssm.state_size u32 = 128
55
+ llama_model_loader: - kv 27: qwen3next.ssm.group_count u32 = 16
56
+ llama_model_loader: - kv 28: qwen3next.ssm.time_step_rank u32 = 32
57
+ llama_model_loader: - kv 29: qwen3next.ssm.inner_size u32 = 4096
58
+ llama_model_loader: - kv 30: qwen3next.full_attention_interval u32 = 4
59
+ llama_model_loader: - kv 31: qwen3next.rope.dimension_count u32 = 64
60
+ llama_model_loader: - kv 32: general.quantization_version u32 = 2
61
+ llama_model_loader: - kv 33: tokenizer.ggml.model str = gpt2
62
+ llama_model_loader: - kv 34: tokenizer.ggml.pre str = qwen2
63
+ llama_model_loader: - kv 35: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ...
64
+ llama_model_loader: - kv 36: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
65
+ llama_model_loader: - kv 37: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
66
+ llama_model_loader: - kv 38: tokenizer.ggml.eos_token_id u32 = 151645
67
+ llama_model_loader: - kv 39: tokenizer.ggml.padding_token_id u32 = 151643
68
+ llama_model_loader: - kv 40: tokenizer.ggml.bos_token_id u32 = 151643
69
+ llama_model_loader: - kv 41: tokenizer.ggml.add_bos_token bool = false
70
+ llama_model_loader: - kv 42: tokenizer.chat_template str = {% macro render_extra_keys(json_dict,...
71
+ llama_model_loader: - type f32: 361 tensors
72
+ llama_model_loader: - type q8_0: 482 tensors
73
+ load: printing all EOG tokens:
74
+ load: - 151643 ('<|endoftext|>')
75
+ load: - 151645 ('<|im_end|>')
76
+ load: - 151662 ('<|fim_pad|>')
77
+ load: - 151663 ('<|repo_name|>')
78
+ load: - 151664 ('<|file_sep|>')
79
+ load: special tokens cache size = 26
80
+ load: token to piece cache size = 0.9311 MB
81
+ llm_load_print_meta: format = GGUF V3 (latest)
82
+ llm_load_print_meta: arch = qwen3next
83
+ llm_load_print_meta: n_ctx_train = 262144
84
+ llm_load_print_meta: n_embd = 2048
85
+ llm_load_print_meta: n_layer = 48
86
+ llm_load_print_meta: n_head = 16
87
+ llm_load_print_meta: n_head_kv = 2
88
+ llm_load_print_meta: n_rot = 64
89
+ llm_load_print_meta: n_swa = 0
90
+ llm_load_print_meta: n_swa_pattern = 1
91
+ llm_load_print_meta: n_embd_head_k = 256
92
+ llm_load_print_meta: n_embd_head_v = 256
93
+ llm_load_print_meta: n_gqa = 8
94
+ llm_load_print_meta: n_embd_k_gqa = 512
95
+ llm_load_print_meta: n_embd_v_gqa = 512
96
+ llm_load_print_meta: f_norm_eps = 0.0e+00
97
+ llm_load_print_meta: f_norm_rms_eps = 1.0e-06
98
+ llm_load_print_meta: f_clamp_kqv = 0.0e+00
99
+ llm_load_print_meta: f_max_alibi_bias = 0.0e+00
100
+ llm_load_print_meta: f_logit_scale = 0.0e+00
101
+ llm_load_print_meta: n_ff = 5120
102
+ llm_load_print_meta: n_expert = 512
103
+ llm_load_print_meta: n_expert_used = 10
104
+ llm_load_print_meta: causal attn = 1
105
+ llm_load_print_meta: pooling type = 0
106
+ llm_load_print_meta: rope type = 2
107
+ llm_load_print_meta: rope scaling = linear
108
+ llm_load_print_meta: freq_base_train = 5000000.0
109
+ llm_load_print_meta: freq_scale_train = 1
110
+ llm_load_print_meta: n_ctx_orig_yarn = 262144
111
+ llm_load_print_meta: rope_finetuned = unknown
112
+ llm_load_print_meta: ssm_d_conv = 4
113
+ llm_load_print_meta: ssm_d_inner = 4096
114
+ llm_load_print_meta: ssm_d_state = 128
115
+ llm_load_print_meta: ssm_dt_rank = 32
116
+ llm_load_print_meta: model type = 80B.A3B
117
+ llm_load_print_meta: model ftype = Q8_0
118
+ llm_load_print_meta: model params = 79.674 B
119
+ llm_load_print_meta: model size = 78.982 GiB (8.515 BPW)
120
+ llm_load_print_meta: repeating layers = 78.366 GiB (8.515 BPW, 79.052 B parameters)
121
+ llm_load_print_meta: general.name = Qwen3 Coder Next
122
+ print_info: vocab type = BPE
123
+ print_info: n_vocab = 151936
124
+ print_info: n_merges = 151387
125
+ print_info: BOS token = 151643 '<|endoftext|>'
126
+ print_info: EOS token = 151645 '<|im_end|>'
127
+ print_info: EOT token = 151645 '<|im_end|>'
128
+ print_info: PAD token = 151643 '<|endoftext|>'
129
+ print_info: LF token = 198 'Ċ'
130
+ print_info: FIM PRE token = 151659 '<|fim_prefix|>'
131
+ print_info: FIM SUF token = 151661 '<|fim_suffix|>'
132
+ print_info: FIM MID token = 151660 '<|fim_middle|>'
133
+ print_info: FIM PAD token = 151662 '<|fim_pad|>'
134
+ print_info: FIM REP token = 151663 '<|repo_name|>'
135
+ print_info: FIM SEP token = 151664 '<|file_sep|>'
136
+ print_info: EOG token = 151643 '<|endoftext|>'
137
+ print_info: EOG token = 151645 '<|im_end|>'
138
+ print_info: EOG token = 151662 '<|fim_pad|>'
139
+ print_info: EOG token = 151663 '<|repo_name|>'
140
+ print_info: EOG token = 151664 '<|file_sep|>'
141
+ print_info: max token length = 256
142
+ llm_load_tensors: ggml ctx size = 0.35 MiB
143
+ llm_load_tensors: offloading 0 repeating layers to GPU
144
+ llm_load_tensors: offloaded 0/49 layers to GPU
145
+ llm_load_tensors: CPU buffer size = 80877.56 MiB
146
+ ....................................................................................................
147
+ llama_init_from_model: n_ctx = 2048
148
+ llama_init_from_model: n_batch = 2048
149
+ llama_init_from_model: n_ubatch = 512
150
+ llama_init_from_model: flash_attn = 1
151
+ llama_init_from_model: attn_max_b = 0
152
+ llama_init_from_model: fused_moe = 1
153
+ llama_init_from_model: grouped er = 0
154
+ llama_init_from_model: fused_up_gate = 1
155
+ llama_init_from_model: fused_mmad = 1
156
+ llama_init_from_model: rope_cache = 0
157
+ llama_init_from_model: graph_reuse = 1
158
+ llama_init_from_model: k_cache_hadam = 0
159
+ llama_init_from_model: split_mode_graph_scheduling = 0
160
+ llama_init_from_model: reduce_type = f16
161
+ llama_init_from_model: sched_async = 0
162
+ llama_init_from_model: ser = -1, 0
163
+ llama_init_from_model: freq_base = 5000000.0
164
+ llama_init_from_model: freq_scale = 1
165
+ llama_kv_cache_init: CPU KV buffer size = 349.50 MiB
166
+ llama_init_from_model: KV self size = 48.00 MiB, K (f16): 24.00 MiB, V (f16): 24.00 MiB
167
+ llama_init_from_model: CPU output buffer size = 2.32 MiB
168
+ llama_init_from_model: CPU compute buffer size = 300.75 MiB
169
+ llama_init_from_model: graph nodes = 12382
170
+ llama_init_from_model: graph splits = 1
171
+ llama_init_from_model: enabling only_active_experts scheduling
172
+
173
+ system_info: n_threads = 96 (n_threads_batch = 128) / 512 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
174
+ perplexity: tokenizing the input ..
175
+ perplexity: tokenization took 393.016 ms
176
+ perplexity: calculating perplexity over 584 chunks, n_ctx=512, batch_size=2048, n_seq=4
177
+ perplexity: 3.89 seconds per pass - ETA 9.45 minutes
178
+ ===================================== llama_init_from_model: f16
179
+ ======================================= HAVE_FANCY_SIMD is defined
180
+ [1]4.5875,[2]6.5391,[3]5.5418,[4]4.7813,[5]4.7239,[6]4.8641,[7]4.9464,[8]4.9799,[9]4.8794,[10]4.9186,[11]4.8311,[12]5.0638,[13]5.4758,[14]5.4421,[15]5.5581,[16]5.9216,[17]5.7716,[18]5.9584,[19]6.0471,[20]6.0826,[21]6.0039,[22]6.1138,[23]5.9257,[24]5.6949,[25]5.6193,[26]5.5165,[27]5.4492,[28]5.4022,[29]5.4725,[30]5.4682,[31]5.4576,[32]5.5297,[33]5.4780,[34]5.5340,[35]5.6289,[36]5.7241,[37]5.8537,[38]5.9517,[39]5.9865,[40]6.0831,[41]6.1320,[42]6.1626,[43]6.2308,[44]6.2354,[45]6.2713,[46]6.3169,[47]6.4885,[48]6.6022,[49]6.6025,[50]6.6506,[51]6.6955,[52]6.7621,[53]6.8298,[54]6.8792,[55]6.8985,[56]6.9732,[57]6.9892,[58]7.0319,[59]7.0810,[60]7.1243,[61]7.1677,[62]7.2040,[63]7.2667,[64]7.3282,[65]7.3910,[66]7.4599,[67]7.5227,[68]7.5168,[69]7.5369,[70]7.5450,[71]7.5845,[72]7.6558,[73]7.7078,[74]7.7400,[75]7.7186,[76]7.7322,[77]7.7932,[78]7.8266,[79]7.7395,[80]7.7131,[81]7.7061,[82]7.7441,[83]7.7189,[84]7.7117,[85]7.7374,[86]7.8199,[87]7.8592,[88]7.8776,[89]7.8941,[90]7.8811,[91]7.9399,[92]7.9209,[93]7.9644,[94]7.9827,[95]7.9716,[96]7.9631,[97]7.9575,[98]7.9951,[99]7.9748,[100]8.0542,[101]8.1007,[102]8.0984,[103]8.1046,[104]8.0933,[105]8.0955,[106]8.0986,[107]8.1340,[108]8.1685,[109]8.2070,[110]8.2639,[111]8.3726,[112]8.3847,[113]8.3486,[114]8.4009,[115]8.4281,[116]8.3756,[117]8.3775,[118]8.3687,[119]8.3335,[120]8.3531,[121]8.3422,[122]8.3302,[123]8.2914,[124]8.2508,[125]8.2295,[126]8.2170,[127]8.1696,[128]8.1543,[129]8.1240,[130]8.0819,[131]8.0492,[132]8.0243,[133]8.0171,[134]8.0301,[135]8.0236,[136]8.0272,[137]7.9992,[138]7.9701,[139]7.9854,[140]7.9697,[141]7.9669,[142]7.9874,[143]7.9921,[144]8.0284,[145]8.0062,[146]7.9704,[147]7.9344,[148]7.8958,[149]7.8752,[150]7.8334,[151]7.8219,[152]7.8126,[153]7.8092,[154]7.7721,[155]7.7746,[156]7.7377,[157]7.7190,[158]7.6944,[159]7.6786,[160]7.6406,[161]7.6243,[162]7.6177,[163]7.5988,[164]7.6113,[165]7.6010,[166]7.5931,[167]7.5936,[168]7.6145,[169]7.6185,[170]7.6464,[171]7.6500,[172]7.6718,[173]7.7234,[174]7.7359,[175]7.7903,[176]7.8151,[177]7.8677,[178]7.9072,[179]7.9127,[180]7.8864,[181]7.8569,[182]7.8669,[183]7.8351,[184]7.8213,[185]7.7934,[186]7.7633,[187]7.7423,[188]7.7376,[189]7.7526,[190]7.7795,[191]7.7907,[192]7.8004,[193]7.8020,[194]7.8192,[195]7.8349,[196]7.8423,[197]7.8494,[198]7.8328,[199]7.8242,[200]7.8113,[201]7.8119,[202]7.8288,[203]7.8531,[204]7.8743,[205]7.8922,[206]7.8944,[207]7.9204,[208]7.9081,[209]7.9079,[210]7.9063,[211]7.9100,[212]7.9139,[213]7.9127,[214]7.9003,[215]7.8869,[216]7.8808,[217]7.8881,[218]7.8830,[219]7.8656,[220]7.8363,[221]7.8221,[222]7.8089,[223]7.8070,[224]7.8140,[225]7.7961,[226]7.7891,[227]7.7782,[228]7.7538,[229]7.7259,[230]7.7102,[231]7.6916,[232]7.6795,[233]7.6742,[234]7.6720,[235]7.6704,[236]7.6567,[237]7.6480,[238]7.6346,[239]7.6289,[240]7.6378,[241]7.6485,[242]7.6605,[243]7.6586,[244]7.6725,[245]7.6758,[246]7.6980,[247]7.7085,[248]7.7137,[249]7.7237,[250]7.7270,[251]7.7456,[252]7.7635,[253]7.7988,[254]7.8217,[255]7.8256,[256]7.8426,[257]7.8581,[258]7.8453,[259]7.8316,[260]7.8173,[261]7.7956,[262]7.7840,[263]7.7788,[264]7.7764,[265]7.7850,[266]7.7912,[267]7.7904,[268]7.7816,[269]7.7872,[270]7.7839,[271]7.7786,[272]7.7765,[273]7.7742,[274]7.7705,[275]7.7655,[276]7.7512,[277]7.7516,[278]7.7500,[279]7.7421,[280]7.7375,[281]7.7329,[282]7.7307,[283]7.7072,[284]7.6786,[285]7.6887,[286]7.6725,[287]7.6578,[288]7.6568,[289]7.6535,[290]7.6756,[291]7.6809,[292]7.6808,[293]7.6829,[294]7.6995,[295]7.7105,[296]7.7202,[297]7.7424,[298]7.7402,[299]7.7307,[300]7.7315,[301]7.7251,[302]7.7273,[303]7.7223,[304]7.7467,[305]7.7518,[
306]7.7497,[307]7.7536,[308]7.7547,[309]7.7544,[310]7.7602,[311]7.7630,[312]7.7532,[313]7.7490,[314]7.7557,[315]7.7434,[316]7.7452,[317]7.7606,[318]7.7680,[319]7.7611,[320]7.7641,[321]7.7534,[322]7.7638,[323]7.7729,[324]7.7806,[325]7.8009,[326]7.7988,[327]7.7870,[328]7.7917,[329]7.7791,[330]7.7707,[331]7.7643,[332]7.7649,[333]7.7685,[334]7.7650,[335]7.7566,[336]7.7590,[337]7.7650,[338]7.7781,[339]7.7748,[340]7.7709,[341]7.7633,[342]7.7633,[343]7.7624,[344]7.7673,[345]7.7760,[346]7.7717,[347]7.7597,[348]7.7629,[349]7.7586,[350]7.7497,[351]7.7482,[352]7.7532,[353]7.7523,[354]7.7426,[355]7.7545,[356]7.7631,[357]7.7686,[358]7.7614,[359]7.7656,[360]7.7660,[361]7.7759,[362]7.7672,[363]7.7609,[364]7.7680,[365]7.7866,[366]7.8134,[367]7.8287,[368]7.8593,[369]7.8742,[370]7.8896,[371]7.9130,[372]7.9338,[373]7.9449,[374]7.9543,[375]7.9728,[376]7.9863,[377]7.9976,[378]8.0101,[379]8.0213,[380]8.0378,[381]8.0547,[382]8.0647,[383]8.0727,[384]8.0838,[385]8.1097,[386]8.1306,[387]8.1295,[388]8.1297,[389]8.1393,[390]8.1631,[391]8.1808,[392]8.1749,[393]8.1734,[394]8.1663,[395]8.1673,[396]8.1756,[397]8.1846,[398]8.1902,[399]8.1980,[400]8.2104,[401]8.2112,[402]8.2106,[403]8.2021,[404]8.1795,[405]8.1670,[406]8.1673,[407]8.1752,[408]8.1851,[409]8.1873,[410]8.1961,[411]8.2133,[412]8.2197,[413]8.2181,[414]8.2159,[415]8.2107,[416]8.2037,[417]8.2099,[418]8.2194,[419]8.2237,[420]8.2255,[421]8.2326,[422]8.2217,[423]8.2213,[424]8.2240,[425]8.2268,[426]8.2283,[427]8.2356,[428]8.2494,[429]8.2573,[430]8.2530,[431]8.2494,[432]8.2536,[433]8.2568,[434]8.2587,[435]8.2682,[436]8.2620,[437]8.2673,[438]8.2690,[439]8.2641,[440]8.2684,[441]8.2672,[442]8.2645,[443]8.2568,[444]8.2591,[445]8.2505,[446]8.2530,[447]8.2469,[448]8.2416,[449]8.2355,[450]8.2415,[451]8.2405,[452]8.2285,[453]8.2199,[454]8.2177,[455]8.2240,[456]8.2229,[457]8.2282,[458]8.2418,[459]8.2390,[460]8.2393,[461]8.2374,[462]8.2352,[463]8.2458,[464]8.2449,[465]8.2458,[466]8.2483,[467]8.2540,[468]8.2595,[469]8.2643,[470]8.2695,[471]8.2593,[472]8.2674,[473]8.2575,[474]8.2567,[475]8.2620,[476]8.2605,[477]8.2506,[478]8.2361,[479]8.2396,[480]8.2475,[481]8.2514,[482]8.2405,[483]8.2480,[484]8.2555,[485]8.2596,[486]8.2591,[487]8.2645,[488]8.2599,[489]8.2489,[490]8.2482,[491]8.2419,[492]8.2427,[493]8.2344,[494]8.2327,[495]8.2281,[496]8.2245,[497]8.2362,[498]8.2428,[499]8.2357,[500]8.2358,[501]8.2366,[502]8.2351,[503]8.2484,[504]8.2517,[505]8.2556,[506]8.2535,[507]8.2504,[508]8.2543,[509]8.2519,[510]8.2510,[511]8.2543,[512]8.2496,[513]8.2516,[514]8.2549,[515]8.2549,[516]8.2573,[517]8.2604,[518]8.2537,[519]8.2540,[520]8.2569,[521]8.2595,[522]8.2499,[523]8.2494,[524]8.2466,[525]8.2499,[526]8.2552,[527]8.2570,[528]8.2560,[529]8.2508,[530]8.2471,[531]8.2510,[532]8.2483,[533]8.2472,[534]8.2475,[535]8.2486,[536]8.2413,[537]8.2474,[538]8.2560,[539]8.2523,[540]8.2648,[541]8.2669,[542]8.2615,[543]8.2644,[544]8.2708,[545]8.2668,[546]8.2583,[547]8.2508,[548]8.2356,[549]8.2363,[550]8.2200,[551]8.2092,[552]8.1997,[553]8.1730,[554]8.1722,[555]8.1753,[556]8.1762,[557]8.1790,[558]8.1785,[559]8.1850,[560]8.1915,[561]8.2008,[562]8.2139,[563]8.2215,[564]8.2194,[565]8.2281,[566]8.2281,[567]8.2141,[568]8.2063,[569]8.2034,[570]8.2029,[571]8.2027,[572]8.2051,[573]8.2058,[574]8.2074,[575]8.2069,[576]8.2131,[577]8.2080,[578]8.2128,[579]8.2179,[580]8.2319,[581]8.2334,[582]8.2452,[583]8.2296,[584]8.2239,
181
+ llama_print_timings: load time = 17688.23 ms
182
+ llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
183
+ llama_print_timings: prompt eval time = 476918.01 ms / 299008 tokens ( 1.60 ms per token, 626.96 tokens per second)
184
+ llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
185
+ llama_print_timings: total time = 488325.61 ms / 299009 tokens
186
+
187
+ Final estimate: PPL over 584 chunks for n_ctx=512 = 8.2239 +/- 0.06389
logs/perplexity-Qwen3-Coder-Next-smol-IQ2_KS.log ADDED
@@ -0,0 +1,174 @@
1
+ SOCKET is set to: 0
2
+ main: build = 4211 (b2cb4512)
3
+ main: built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
4
+ main: seed = 1337
5
+ CPU: using device CPU - 0 MiB free
6
+ llama_model_loader: loaded meta data with 47 key-value pairs and 843 tensors from /mnt/data/models/ubergarm/Qwen3-Coder-Next-GGUF/Qwen3-Coder-Next-smol-IQ2_KS.gguf (version GGUF V3 (latest))
7
+ llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
8
+ llama_model_loader: - kv 0: general.architecture str = qwen3next
9
+ llama_model_loader: - kv 1: general.type str = model
10
+ llama_model_loader: - kv 2: general.sampling.top_k i32 = 40
11
+ llama_model_loader: - kv 3: general.sampling.top_p f32 = 0.950000
12
+ llama_model_loader: - kv 4: general.sampling.temp f32 = 1.000000
13
+ llama_model_loader: - kv 5: general.name str = Qwen3 Coder Next
14
+ llama_model_loader: - kv 6: general.size_label str = 512x2.5B
15
+ llama_model_loader: - kv 7: general.license str = apache-2.0
16
+ llama_model_loader: - kv 8: general.license.link str = https://huggingface.co/Qwen/Qwen3-Cod...
17
+ llama_model_loader: - kv 9: general.tags arr[str,1] = ["text-generation"]
18
+ llama_model_loader: - kv 10: qwen3next.block_count u32 = 48
19
+ llama_model_loader: - kv 11: qwen3next.context_length u32 = 262144
20
+ llama_model_loader: - kv 12: qwen3next.embedding_length u32 = 2048
21
+ llama_model_loader: - kv 13: qwen3next.feed_forward_length u32 = 5120
22
+ llama_model_loader: - kv 14: qwen3next.attention.head_count u32 = 16
23
+ llama_model_loader: - kv 15: qwen3next.attention.head_count_kv u32 = 2
24
+ llama_model_loader: - kv 16: qwen3next.rope.freq_base f32 = 5000000.000000
25
+ llama_model_loader: - kv 17: qwen3next.attention.layer_norm_rms_epsilon f32 = 0.000001
26
+ llama_model_loader: - kv 18: qwen3next.expert_count u32 = 512
27
+ llama_model_loader: - kv 19: qwen3next.expert_used_count u32 = 10
28
+ llama_model_loader: - kv 20: qwen3next.attention.key_length u32 = 256
29
+ llama_model_loader: - kv 21: qwen3next.attention.value_length u32 = 256
30
+ llama_model_loader: - kv 22: general.file_type u32 = 147
31
+ llama_model_loader: - kv 23: qwen3next.expert_feed_forward_length u32 = 512
32
+ llama_model_loader: - kv 24: qwen3next.expert_shared_feed_forward_length u32 = 512
33
+ llama_model_loader: - kv 25: qwen3next.ssm.conv_kernel u32 = 4
34
+ llama_model_loader: - kv 26: qwen3next.ssm.state_size u32 = 128
35
+ llama_model_loader: - kv 27: qwen3next.ssm.group_count u32 = 16
36
+ llama_model_loader: - kv 28: qwen3next.ssm.time_step_rank u32 = 32
37
+ llama_model_loader: - kv 29: qwen3next.ssm.inner_size u32 = 4096
38
+ llama_model_loader: - kv 30: qwen3next.full_attention_interval u32 = 4
39
+ llama_model_loader: - kv 31: qwen3next.rope.dimension_count u32 = 64
40
+ llama_model_loader: - kv 32: general.quantization_version u32 = 2
41
+ llama_model_loader: - kv 33: tokenizer.ggml.model str = gpt2
42
+ llama_model_loader: - kv 34: tokenizer.ggml.pre str = qwen2
43
+ llama_model_loader: - kv 35: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "%", "&", "'", ...
44
+ llama_model_loader: - kv 36: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
45
+ llama_model_loader: - kv 37: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
46
+ llama_model_loader: - kv 38: tokenizer.ggml.eos_token_id u32 = 151645
47
+ llama_model_loader: - kv 39: tokenizer.ggml.padding_token_id u32 = 151643
48
+ llama_model_loader: - kv 40: tokenizer.ggml.bos_token_id u32 = 151643
49
+ llama_model_loader: - kv 41: tokenizer.ggml.add_bos_token bool = false
50
+ llama_model_loader: - kv 42: tokenizer.chat_template str = {% macro render_extra_keys(json_dict,...
51
+ llama_model_loader: - kv 43: quantize.imatrix.file str = /mnt/data/models/ubergarm/Qwen3-Coder...
52
+ llama_model_loader: - kv 44: quantize.imatrix.dataset str = ubergarm-imatrix-calibration-corpus-v...
53
+ llama_model_loader: - kv 45: quantize.imatrix.entries_count i32 = 577
54
+ llama_model_loader: - kv 46: quantize.imatrix.chunks_count i32 = 840
55
+ llama_model_loader: - type f32: 361 tensors
56
+ llama_model_loader: - type q8_0: 336 tensors
57
+ llama_model_loader: - type iq4_k: 1 tensors
58
+ llama_model_loader: - type iq6_k: 1 tensors
59
+ llama_model_loader: - type iq2_ks: 144 tensors
60
+ load: printing all EOG tokens:
61
+ load: - 151643 ('<|endoftext|>')
62
+ load: - 151645 ('<|im_end|>')
63
+ load: - 151662 ('<|fim_pad|>')
64
+ load: - 151663 ('<|repo_name|>')
65
+ load: - 151664 ('<|file_sep|>')
66
+ load: special tokens cache size = 26
67
+ load: token to piece cache size = 0.9311 MB
68
+ llm_load_print_meta: format = GGUF V3 (latest)
69
+ llm_load_print_meta: arch = qwen3next
70
+ llm_load_print_meta: n_ctx_train = 262144
71
+ llm_load_print_meta: n_embd = 2048
72
+ llm_load_print_meta: n_layer = 48
73
+ llm_load_print_meta: n_head = 16
74
+ llm_load_print_meta: n_head_kv = 2
75
+ llm_load_print_meta: n_rot = 64
76
+ llm_load_print_meta: n_swa = 0
77
+ llm_load_print_meta: n_swa_pattern = 1
78
+ llm_load_print_meta: n_embd_head_k = 256
79
+ llm_load_print_meta: n_embd_head_v = 256
80
+ llm_load_print_meta: n_gqa = 8
81
+ llm_load_print_meta: n_embd_k_gqa = 512
82
+ llm_load_print_meta: n_embd_v_gqa = 512
83
+ llm_load_print_meta: f_norm_eps = 0.0e+00
84
+ llm_load_print_meta: f_norm_rms_eps = 1.0e-06
85
+ llm_load_print_meta: f_clamp_kqv = 0.0e+00
86
+ llm_load_print_meta: f_max_alibi_bias = 0.0e+00
87
+ llm_load_print_meta: f_logit_scale = 0.0e+00
88
+ llm_load_print_meta: n_ff = 5120
89
+ llm_load_print_meta: n_expert = 512
90
+ llm_load_print_meta: n_expert_used = 10
91
+ llm_load_print_meta: causal attn = 1
92
+ llm_load_print_meta: pooling type = 0
93
+ llm_load_print_meta: rope type = 2
94
+ llm_load_print_meta: rope scaling = linear
95
+ llm_load_print_meta: freq_base_train = 5000000.0
96
+ llm_load_print_meta: freq_scale_train = 1
97
+ llm_load_print_meta: n_ctx_orig_yarn = 262144
98
+ llm_load_print_meta: rope_finetuned = unknown
99
+ llm_load_print_meta: ssm_d_conv = 4
100
+ llm_load_print_meta: ssm_d_inner = 4096
101
+ llm_load_print_meta: ssm_d_state = 128
102
+ llm_load_print_meta: ssm_dt_rank = 32
103
+ llm_load_print_meta: model type = 80B.A3B
104
+ llm_load_print_meta: model ftype = IQ2_KS - 2.1875 bpw
105
+ llm_load_print_meta: model params = 79.674 B
106
+ llm_load_print_meta: model size = 22.097 GiB (2.382 BPW)
107
+ llm_load_print_meta: repeating layers = 21.694 GiB (2.357 BPW, 79.052 B parameters)
108
+ llm_load_print_meta: general.name = Qwen3 Coder Next
109
+ print_info: vocab type = BPE
110
+ print_info: n_vocab = 151936
111
+ print_info: n_merges = 151387
112
+ print_info: BOS token = 151643 '<|endoftext|>'
113
+ print_info: EOS token = 151645 '<|im_end|>'
114
+ print_info: EOT token = 151645 '<|im_end|>'
115
+ print_info: PAD token = 151643 '<|endoftext|>'
116
+ print_info: LF token = 198 'Ċ'
117
+ print_info: FIM PRE token = 151659 '<|fim_prefix|>'
118
+ print_info: FIM SUF token = 151661 '<|fim_suffix|>'
119
+ print_info: FIM MID token = 151660 '<|fim_middle|>'
120
+ print_info: FIM PAD token = 151662 '<|fim_pad|>'
121
+ print_info: FIM REP token = 151663 '<|repo_name|>'
122
+ print_info: FIM SEP token = 151664 '<|file_sep|>'
123
+ print_info: EOG token = 151643 '<|endoftext|>'
124
+ print_info: EOG token = 151645 '<|im_end|>'
125
+ print_info: EOG token = 151662 '<|fim_pad|>'
126
+ print_info: EOG token = 151663 '<|repo_name|>'
127
+ print_info: EOG token = 151664 '<|file_sep|>'
128
+ print_info: max token length = 256
129
+ llm_load_tensors: ggml ctx size = 0.35 MiB
130
+ llm_load_tensors: offloading 0 repeating layers to GPU
131
+ llm_load_tensors: offloaded 0/49 layers to GPU
132
+ llm_load_tensors: CPU buffer size = 22627.63 MiB
133
+ ....................................................................................................
134
+ llama_init_from_model: n_ctx = 2048
135
+ llama_init_from_model: n_batch = 2048
136
+ llama_init_from_model: n_ubatch = 512
137
+ llama_init_from_model: flash_attn = 1
138
+ llama_init_from_model: attn_max_b = 0
139
+ llama_init_from_model: fused_moe = 1
140
+ llama_init_from_model: grouped er = 0
141
+ llama_init_from_model: fused_up_gate = 1
142
+ llama_init_from_model: fused_mmad = 1
143
+ llama_init_from_model: rope_cache = 0
144
+ llama_init_from_model: graph_reuse = 1
145
+ llama_init_from_model: k_cache_hadam = 0
146
+ llama_init_from_model: split_mode_graph_scheduling = 0
147
+ llama_init_from_model: reduce_type = f16
148
+ llama_init_from_model: sched_async = 0
149
+ llama_init_from_model: ser = -1, 0
150
+ llama_init_from_model: freq_base = 5000000.0
151
+ llama_init_from_model: freq_scale = 1
152
+ llama_kv_cache_init: CPU KV buffer size = 349.50 MiB
153
+ llama_init_from_model: KV self size = 48.00 MiB, K (f16): 24.00 MiB, V (f16): 24.00 MiB
154
+ llama_init_from_model: CPU output buffer size = 2.32 MiB
155
+ llama_init_from_model: CPU compute buffer size = 300.75 MiB
156
+ llama_init_from_model: graph nodes = 12382
157
+ llama_init_from_model: graph splits = 1
158
+ llama_init_from_model: enabling only_active_experts scheduling
159
+
160
+ system_info: n_threads = 96 (n_threads_batch = 128) / 512 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
161
+ perplexity: tokenizing the input ..
162
+ perplexity: tokenization took 392.737 ms
163
+ perplexity: calculating perplexity over 584 chunks, n_ctx=512, batch_size=2048, n_seq=4
164
+ perplexity: 3.33 seconds per pass - ETA 8.10 minutes
165
+ ===================================== llama_init_from_model: f16
166
+ ======================================= HAVE_FANCY_SIMD is defined
167
+ [1]4.8429,[2]7.8294,[3]7.0716,[4]6.5755,[5]6.9072,[6]7.1962,[7]7.3575,[8]7.5293,[9]7.6946,[10]8.0277,[11]8.0696,[12]8.2023,[13]8.7628,[14]8.6121,[15]8.5901,[16]9.0088,[17]8.5033,[18]8.6134,[19]8.6028,[20]8.6108,[21]8.4277,[22]8.4716,[23]8.1122,[24]7.7254,[25]7.5349,[26]7.3212,[27]7.1690,[28]7.0490,[29]7.1071,[30]7.0900,[31]7.0594,[32]7.0950,[33]7.0187,[34]7.0633,[35]7.1666,[36]7.2491,[37]7.3830,[38]7.4542,[39]7.4793,[40]7.5868,[41]7.6107,[42]7.6280,[43]7.6983,[44]7.7012,[45]7.7412,[46]7.7686,[47]7.9614,[48]8.0843,[49]8.0835,[50]8.1432,[51]8.1917,[52]8.2669,[53]8.3461,[54]8.4150,[55]8.4272,[56]8.5091,[57]8.5269,[58]8.5722,[59]8.6444,[60]8.6816,[61]8.7244,[62]8.7569,[63]8.8195,[64]8.8811,[65]8.9548,[66]9.0362,[67]9.1083,[68]9.0784,[69]9.1017,[70]9.1076,[71]9.1492,[72]9.2382,[73]9.2889,[74]9.3185,[75]9.2809,[76]9.2815,[77]9.3489,[78]9.3843,[79]9.3121,[80]9.2703,[81]9.2557,[82]9.3199,[83]9.2880,[84]9.2721,[85]9.3036,[86]9.3928,[87]9.4341,[88]9.4424,[89]9.4425,[90]9.4179,[91]9.4699,[92]9.4452,[93]9.4936,[94]9.5031,[95]9.4816,[96]9.4613,[97]9.4439,[98]9.4771,[99]9.4538,[100]9.5435,[101]9.5925,[102]9.5856,[103]9.5986,[104]9.5757,[105]9.5770,[106]9.5734,[107]9.6082,[108]9.6454,[109]9.6948,[110]9.7585,[111]9.8749,[112]9.8825,[113]9.8339,[114]9.8917,[115]9.9160,[116]9.8810,[117]9.8785,[118]9.8490,[119]9.8067,[120]9.8207,[121]9.8138,[122]9.8066,[123]9.7576,[124]9.7008,[125]9.6762,[126]9.6569,[127]9.6139,[128]9.5997,[129]9.5630,[130]9.5106,[131]9.4652,[132]9.4329,[133]9.4258,[134]9.4413,[135]9.4357,[136]9.4305,[137]9.3887,[138]9.3495,[139]9.3600,[140]9.3350,[141]9.3276,[142]9.3555,[143]9.3724,[144]9.4073,[145]9.3838,[146]9.3417,[147]9.2916,[148]9.2446,[149]9.2204,[150]9.1729,[151]9.1506,[152]9.1439,[153]9.1381,[154]9.0907,[155]9.1003,[156]9.0551,[157]9.0207,[158]8.9844,[159]8.9584,[160]8.9144,[161]8.8936,[162]8.8854,[163]8.8668,[164]8.8804,[165]8.8623,[166]8.8591,[167]8.8520,[168]8.8742,[169]8.8855,[170]8.9246,[171]8.9493,[172]8.9828,[173]9.0260,[174]9.0397,[175]9.0997,[176]9.1308,[177]9.1837,[178]9.2297,[179]9.2357,[180]9.2267,[181]9.2222,[182]9.2436,[183]9.2162,[184]9.2135,[185]9.2016,[186]9.1843,[187]9.1681,[188]9.1640,[189]9.1750,[190]9.2058,[191]9.2156,[192]9.2265,[193]9.2226,[194]9.2407,[195]9.2596,[196]9.2704,[197]9.2788,[198]9.2535,[199]9.2347,[200]9.2160,[201]9.2232,[202]9.2413,[203]9.2683,[204]9.2869,[205]9.3039,[206]9.3015,[207]9.3260,[208]9.3149,[209]9.3181,[210]9.3171,[211]9.3191,[212]9.3221,[213]9.3198,[214]9.3041,[215]9.2824,[216]9.2732,[217]9.2754,[218]9.2671,[219]9.2412,[220]9.2017,[221]9.1841,[222]9.1637,[223]9.1600,[224]9.1689,[225]9.1414,[226]9.1301,[227]9.1159,[228]9.0854,[229]9.0513,[230]9.0313,[231]9.0167,[232]9.0023,[233]9.0003,[234]8.9999,[235]8.9939,[236]8.9729,[237]8.9578,[238]8.9379,[239]8.9358,[240]8.9414,[241]8.9438,[242]8.9530,[243]8.9562,[244]8.9719,[245]8.9784,[246]9.0025,[247]9.0127,[248]9.0172,[249]9.0202,[250]9.0239,[251]9.0454,[252]9.0630,[253]9.0954,[254]9.1210,[255]9.1267,[256]9.1444,[257]9.1605,[258]9.1417,[259]9.1280,[260]9.1092,[261]9.0840,[262]9.0690,[263]9.0631,[264]9.0623,[265]9.0710,[266]9.0798,[267]9.0735,[268]9.0635,[269]9.0666,[270]9.0629,[271]9.0544,[272]9.0509,[273]9.0485,[274]9.0404,[275]9.0373,[276]9.0187,[277]9.0170,[278]9.0184,[279]9.0115,[280]9.0061,[281]8.9988,[282]8.9964,[283]8.9631,[284]8.9301,[285]8.9365,[286]8.9206,[287]8.8995,[288]8.8962,[289]8.8897,[290]8.9140,[291]8.9150,[292]8.9127,[293]8.9134,[294]8.9315,[295]8.9453,[296]8.9544,[297]8.9788,[298]8.9740,[299]8.9609,[300]8.9602,[301]8.9519,[302]8.9513,[303]8.9436,[304]8.9686,[305]8.9754,[
306]8.9706,[307]8.9705,[308]8.9670,[309]8.9670,[310]8.9750,[311]8.9739,[312]8.9634,[313]8.9605,[314]8.9673,[315]8.9490,[316]8.9543,[317]8.9738,[318]8.9778,[319]8.9710,[320]8.9783,[321]8.9657,[322]8.9767,[323]8.9911,[324]9.0044,[325]9.0241,[326]9.0233,[327]9.0116,[328]9.0126,[329]8.9956,[330]8.9865,[331]8.9784,[332]8.9760,[333]8.9773,[334]8.9668,[335]8.9534,[336]8.9551,[337]8.9634,[338]8.9728,[339]8.9703,[340]8.9615,[341]8.9512,[342]8.9518,[343]8.9484,[344]8.9574,[345]8.9663,[346]8.9629,[347]8.9510,[348]8.9517,[349]8.9491,[350]8.9381,[351]8.9439,[352]8.9486,[353]8.9470,[354]8.9363,[355]8.9537,[356]8.9611,[357]8.9631,[358]8.9560,[359]8.9612,[360]8.9600,[361]8.9684,[362]8.9601,[363]8.9548,[364]8.9651,[365]8.9822,[366]9.0100,[367]9.0305,[368]9.0611,[369]9.0787,[370]9.0981,[371]9.1247,[372]9.1465,[373]9.1568,[374]9.1663,[375]9.1880,[376]9.2017,[377]9.2114,[378]9.2245,[379]9.2360,[380]9.2549,[381]9.2723,[382]9.2864,[383]9.2972,[384]9.3097,[385]9.3384,[386]9.3628,[387]9.3637,[388]9.3657,[389]9.3745,[390]9.3991,[391]9.4208,[392]9.4141,[393]9.4165,[394]9.4058,[395]9.4061,[396]9.4136,[397]9.4203,[398]9.4247,[399]9.4316,[400]9.4446,[401]9.4478,[402]9.4474,[403]9.4377,[404]9.4197,[405]9.4092,[406]9.4086,[407]9.4157,[408]9.4268,[409]9.4266,[410]9.4338,[411]9.4525,[412]9.4577,[413]9.4600,[414]9.4581,[415]9.4494,[416]9.4452,[417]9.4516,[418]9.4595,[419]9.4641,[420]9.4646,[421]9.4723,[422]9.4580,[423]9.4582,[424]9.4620,[425]9.4665,[426]9.4707,[427]9.4815,[428]9.4978,[429]9.5027,[430]9.4954,[431]9.4906,[432]9.4944,[433]9.4962,[434]9.4958,[435]9.5062,[436]9.4942,[437]9.5001,[438]9.5003,[439]9.4923,[440]9.4980,[441]9.4970,[442]9.4922,[443]9.4856,[444]9.4883,[445]9.4772,[446]9.4805,[447]9.4758,[448]9.4687,[449]9.4638,[450]9.4710,[451]9.4704,[452]9.4603,[453]9.4529,[454]9.4500,[455]9.4543,[456]9.4531,[457]9.4565,[458]9.4741,[459]9.4698,[460]9.4684,[461]9.4657,[462]9.4653,[463]9.4783,[464]9.4795,[465]9.4800,[466]9.4820,[467]9.4880,[468]9.4943,[469]9.4995,[470]9.5061,[471]9.4974,[472]9.5092,[473]9.5011,[474]9.4997,[475]9.5054,[476]9.5069,[477]9.4980,[478]9.4825,[479]9.4838,[480]9.4895,[481]9.4934,[482]9.4811,[483]9.4902,[484]9.4982,[485]9.5026,[486]9.5023,[487]9.5077,[488]9.5005,[489]9.4890,[490]9.4850,[491]9.4755,[492]9.4755,[493]9.4620,[494]9.4589,[495]9.4515,[496]9.4471,[497]9.4608,[498]9.4675,[499]9.4613,[500]9.4632,[501]9.4662,[502]9.4645,[503]9.4800,[504]9.4852,[505]9.4893,[506]9.4848,[507]9.4786,[508]9.4811,[509]9.4758,[510]9.4739,[511]9.4779,[512]9.4752,[513]9.4775,[514]9.4806,[515]9.4797,[516]9.4806,[517]9.4816,[518]9.4748,[519]9.4730,[520]9.4733,[521]9.4747,[522]9.4639,[523]9.4643,[524]9.4612,[525]9.4637,[526]9.4678,[527]9.4703,[528]9.4684,[529]9.4617,[530]9.4559,[531]9.4609,[532]9.4578,[533]9.4567,[534]9.4547,[535]9.4553,[536]9.4500,[537]9.4607,[538]9.4693,[539]9.4668,[540]9.4802,[541]9.4855,[542]9.4765,[543]9.4782,[544]9.4843,[545]9.4799,[546]9.4703,[547]9.4596,[548]9.4436,[549]9.4466,[550]9.4284,[551]9.4144,[552]9.4018,[553]9.3688,[554]9.3697,[555]9.3733,[556]9.3747,[557]9.3758,[558]9.3750,[559]9.3833,[560]9.3911,[561]9.3988,[562]9.4125,[563]9.4211,[564]9.4178,[565]9.4292,[566]9.4316,[567]9.4214,[568]9.4142,[569]9.4067,[570]9.4089,[571]9.4102,[572]9.4169,[573]9.4191,[574]9.4214,[575]9.4213,[576]9.4305,[577]9.4243,[578]9.4290,[579]9.4350,[580]9.4504,[581]9.4521,[582]9.4674,[583]9.4524,[584]9.4488,
168
+ llama_print_timings: load time = 4893.63 ms
169
+ llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
170
+ llama_print_timings: prompt eval time = 442686.42 ms / 299008 tokens ( 1.48 ms per token, 675.44 tokens per second)
171
+ llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
172
+ llama_print_timings: total time = 454271.01 ms / 299009 tokens
173
+
174
+ Final estimate: PPL over 584 chunks for n_ctx=512 = 9.4488 +/- 0.07565
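
The bracketed values above are the running PPL estimates printed as each 512-token chunk is evaluated, and the `Final estimate` line is the headline number for this quant. Below is a minimal parsing sketch, purely for illustration and not part of ik_llama.cpp: the `parse_ppl_log` helper and its regexes are assumptions based only on the `[N]value,` and `Final estimate: ...` formats shown in this log, useful e.g. for plotting convergence or comparing quants.

```python
#!/usr/bin/env python3
# Illustration only (not part of ik_llama.cpp): pull the running per-chunk PPL
# and the final estimate out of a perplexity log in the format shown above.
import re
import sys

def parse_ppl_log(path: str):
    text = open(path, encoding="utf-8", errors="replace").read()
    # Running estimates are printed as "[17]8.5033," per chunk.
    chunks = [(int(n), float(p)) for n, p in re.findall(r"\[(\d+)\](\d+\.\d+)", text)]
    # e.g. "Final estimate: PPL over 584 chunks for n_ctx=512 = 9.4488 +/- 0.07565"
    m = re.search(r"Final estimate: PPL over (\d+) chunks.*?= ([\d.]+) \+/- ([\d.]+)", text)
    final = (int(m.group(1)), float(m.group(2)), float(m.group(3))) if m else None
    return chunks, final

if __name__ == "__main__":
    chunks, final = parse_ppl_log(sys.argv[1])
    print(f"parsed {len(chunks)} running PPL values")
    if final:
        n_chunks, ppl, err = final
        print(f"final: PPL = {ppl} +/- {err} over {n_chunks} chunks")
```

Plotting the running values is a quick sanity check that the estimate has stabilized before comparing quants against each other.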
logs/quantize-Qwen3-Coder-Next-IQ4_KSS.log ADDED
The diff for this file is too large to render.
 
logs/quantize-Qwen3-Coder-Next-Q8_0.log ADDED
The diff for this file is too large to render.
 
logs/quantize-Qwen3-Coder-Next-smol-IQ2_KS.log ADDED
The diff for this file is too large to render.