turboderp committed on
Commit
1041bb6
·
verified ·
1 Parent(s): b5b63a7

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ olmo-instruct.png filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,173 @@
1
+ ---
2
+ license: apache-2.0
3
+ base_model: allenai/Olmo-3.1-32B-Instruct-DPO
4
+ language:
5
+ - en
6
+ library_name: transformers
7
+ datasets:
8
+ - allenai/Dolci-Instruct-RL
9
+ ---
10
+
11
+ ## Model Details
12
+ <img alt="Logo for Olmo 3.1 32B Instruct model" src="olmo-instruct.png" width="307" style="margin-left:auto; margin-right:auto; display:block">
13
+
14
+
15
+
16
+ # Model Card for Olmo-3.1-32B-Instruct
17
+
18
+ We introduce Olmo 3, a new family of 7B and 32B models in both Instruct and Think variants. Long chain-of-thought thinking improves performance on reasoning tasks like math and coding.
19
+
20
+ Olmo is a series of **O**pen **l**anguage **mo**dels designed to enable the science of language models.
21
+ These models are pre-trained on the Dolma 3 dataset and post-trained on the Dolci datasets. We are releasing all code, checkpoints, logs (coming soon), and associated training details.
22
+
23
+
24
+
25
+ The core models released in this batch include the following:
26
+
27
+ | **Stage** | **Olmo 3 7B Think** | **Olmo (3/3.1) 32B Think** | **Olmo 3 7B Instruct** | **Olmo 3.1 32B Instruct** |
28
+ |--------------------------|-----------------------|------------------------|---------------------------|----------------------------|
29
+ | **Base Model** | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) | [Olmo-3-32B](https://huggingface.co/allenai/Olmo-3-1125-32B) | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) | [Olmo-3-32B](https://huggingface.co/allenai/Olmo-3-1125-32B) |
30
+ | **SFT** | [Olmo-3-7B-Think-SFT](https://huggingface.co/allenai/Olmo-3-7B-Think-SFT) | [Olmo-3-32B-Think-SFT](https://huggingface.co/allenai/Olmo-3-32B-Think-SFT) | [Olmo-3-7B-Instruct-SFT](https://huggingface.co/allenai/Olmo-3-7B-Instruct-SFT) | [Olmo-3.1-32B-Instruct-SFT](https://huggingface.co/allenai/Olmo-3.1-32B-Instruct-SFT) |
31
+ | **DPO** | [Olmo-3-7B-Think-DPO](https://huggingface.co/allenai/Olmo-3-7B-Think-DPO) | [Olmo-3-32B-Think-DPO](https://huggingface.co/allenai/Olmo-3-32B-Think-DPO) | [Olmo-3-7B-Instruct-DPO](https://huggingface.co/allenai/Olmo-3-7B-Instruct-DPO) | [Olmo-3.1-32B-Instruct-DPO](https://huggingface.co/allenai/Olmo-3.1-32B-Instruct-DPO) |
32
+ | **Final Models (RLVR)** | [Olmo-3-7B-Think](https://huggingface.co/allenai/Olmo-3-7B-Think) | [Olmo-3-32B-Think](https://huggingface.co/allenai/Olmo-3-32B-Think)<br>[Olmo-3.1-32B-Think](https://huggingface.co/allenai/Olmo-3.1-32B-Think) | [Olmo-3-7B-Instruct](https://huggingface.co/allenai/Olmo-3-7B-Instruct) | [Olmo-3.1-32B-Instruct](https://huggingface.co/allenai/Olmo-3.1-32B-Instruct) |
33
+
34
+
35
+ ## Installation
36
+
37
+ Olmo 3 is supported in transformers 4.57.0 or higher:
38
+ ```bash
39
+ pip install 'transformers>=4.57.0'
40
+ ```
41
+
42
+ ## Inference
43
+
44
+ You can use OLMo with the standard HuggingFace transformers library:
45
+ ```python
46
+ from transformers import AutoModelForCausalLM, AutoTokenizer
47
+ olmo = AutoModelForCausalLM.from_pretrained("allenai/Olmo-3.1-32B-Instruct")
48
+ tokenizer = AutoTokenizer.from_pretrained("allenai/Olmo-3.1-32B-Instruct")
49
+ message = ["Language modeling is "]
50
+ inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
51
+ # optional: move inputs and model to CUDA
52
+ # inputs = {k: v.to('cuda') for k,v in inputs.items()}
53
+ # olmo = olmo.to('cuda')
54
+ response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
55
+ print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
56
+ >> 'Language modeling is a key component of any text-based application, but its effectiveness...'
57
+ ```
58
+
59
+ For more memory-efficient inference, you can quantize the model:
+ ```python
+ import torch
+ olmo = AutoModelForCausalLM.from_pretrained("allenai/Olmo-3.1-32B-Instruct",
+                                             torch_dtype=torch.float16,
+                                             load_in_8bit=True)  # requires bitsandbytes
+ ```
65
+ The quantized model is more sensitive to data types and CUDA operations. To avoid potential issues, it's recommended to pass the inputs directly to CUDA using:
66
+ ```python
67
+ inputs = inputs.to('cuda')
68
+ ```
69
+
70
+ We have released checkpoints for these models. For post-training, the naming convention is `step_XXXX`.
71
+ **NOTE**: For this model, due to a checkpointing issue, we are only releasing the final few checkpoints. See our other RL jobs for a more detailed intermediate-checkpoint suite.
72
+
73
+ To load a specific model revision with HuggingFace, simply add the argument `revision`:
74
+ ```python
75
+ olmo = AutoModelForCausalLM.from_pretrained("allenai/Olmo-3.1-32B-Instruct", revision="step_1375")
76
+ ```
77
+
78
+ Or, you can access all the revisions for the models via the following code snippet:
79
+ ```python
80
+ from huggingface_hub import list_repo_refs
81
+ out = list_repo_refs("allenai/Olmo-3.1-32B-Instruct")
82
+ branches = [b.name for b in out.branches]
83
+ ```
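Branch names for intermediate checkpoints follow the `step_XXXX` convention, so they can be sorted numerically rather than lexically. A minimal sketch (the branch list below is illustrative, not a verified listing for this repo):

```python
import re

def sort_step_branches(branches):
    """Sort checkpoint branch names like 'step_1375' by step number,
    dropping non-step branches such as 'main'."""
    steps = [b for b in branches if re.fullmatch(r"step_\d+", b)]
    return sorted(steps, key=lambda b: int(b.split("_")[1]))

# Hypothetical branch names of the shape returned by list_repo_refs:
branches = ["main", "step_200", "step_1375", "step_50"]
print(sort_step_branches(branches))  # ['step_50', 'step_200', 'step_1375']
```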
84
+
85
+ ### Fine-tuning
86
+ Model fine-tuning can be done from the final checkpoint (the `main` revision of this model) or many intermediate checkpoints. Two recipes for tuning are available.
87
+ 1. Fine-tune with the OLMo-core repository:
88
+ ```bash
89
+ torchrun --nproc-per-node=8 ./src/scripts/official/MODEL.py run01
90
+ ```
91
+ You can override most configuration options from the command-line. For example, to override the learning rate you could launch the script like this:
92
+
93
+ ```bash
94
+ torchrun --nproc-per-node=8 ./src/scripts/train/MODEL.py run01 --train_module.optim.lr=6e-3
95
+ ```
96
+ For more documentation, see the [GitHub readme](https://github.com/allenai/OLMo-core).
97
+
98
+ ### Model Description
99
+
100
+ - **Developed by:** Allen Institute for AI (Ai2)
101
+ - **Model type:** a Transformer-style autoregressive language model.
102
+ - **Language(s) (NLP):** English
103
+ - **License:** This model is licensed under Apache 2.0. It is intended for research and educational use in accordance with Ai2's [Responsible Use Guidelines](https://allenai.org/responsible-use).
104
+ - **Contact:** Technical inquiries: `olmo@allenai.org`. Press: `press@allenai.org`
105
+ - **Date cutoff:** Dec. 2024.
106
+
107
+
108
+ ### Model Sources
109
+
110
+ - **Project Page:** https://allenai.org/olmo
111
+ - **Repositories:**
112
+ - Open-Instruct for DPO and RLVR: https://github.com/allenai/open-instruct
113
+ - OLMo-Core for pre-training and SFT: https://github.com/allenai/OLMo-core
114
+ - OLMo-Eval for evaluation: https://github.com/allenai/OLMo-Eval
115
+ - **Paper:** https://allenai.org/papers/olmo3
116
+
117
+
118
+ ## Evaluation
119
+
120
+ | Metric | **Olmo 3.1 32B Instruct SFT** | **Olmo 3.1 32B Instruct DPO** | **Olmo 3.1 32B Instruct** | Apertus 70B | Qwen 3 32B (No Think) | Qwen 3 VL 32B Instruct | Qwen 2.5 32B | Gemma 3 27B | Gemma 2 27B | OLMo 2 32B |
121
+ | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
122
+ | **Math** | | | | | | | | | | |
123
+ | MATH | 74.4 | 86.6 | 93.4 | 36.2 | 84.3 | 95.1 | 80.2 | 87.4 | 51.5 | 49.2 |
124
+ | AIME 2024 | 12.7 | 35.2 | 67.8 | 0.31 | 27.9 | 75.4 | 15.7 | 28.9 | 4.7 | 4.6 |
125
+ | AIME 2025 | 8.2 | 23.3 | 57.9 | 0.1 | 21.3 | 64.2 | 13.4 | 22.9 | 0.9 | 0.9 |
126
+ | OMEGA | 15.5 | 33.3 | 42.2 | 5.6 | 23.4 | 44.0 | 19.2 | 24.0 | 9.1 | 9.8 |
127
+ | **Reasoning** | | | | | | | | | | |
128
+ | BigBenchHard | 69.0 | 82.1 | 84.0 | 57.0 | 80.4 | 89.0 | 80.9 | 82.4 | 66.0 | 65.6 |
129
+ | ZebraLogic | 30.6 | 51.1 | 61.7 | 9.0 | 28.4 | 86.7 | 24.1 | 24.8 | 17.2 | 13.3 |
130
+ | AGI Eval English | 71.7 | 79.4 | 79.5 | 61.6 | 82.4 | 89.4 | 78.9 | 76.9 | 70.9 | 68.4 |
131
+ | **Coding** | | | | | | | | | | |
132
+ | HumanEvalPlus | 80.8 | 85.7 | 86.7 | 42.9 | 83.9 | 89.3 | 82.6 | 79.2 | 67.5 | 44.4 |
133
+ | MBPP+ | 61.5 | 63.6 | 65.1 | 45.8 | 67.9 | 69.0 | 66.6 | 65.7 | 61.2 | 49.0 |
134
+ | LiveCodeBench v3 | 35.4 | 49.6 | 54.7 | 9.7 | 57.5 | 70.2 | 49.9 | 39.0 | 28.7 | 10.6 |
135
+ | **IF** | | | | | | | | | | |
136
+ | IFEval | 87.7 | 87.3 | 88.8 | 70.4 | 87.5 | 88.1 | 81.9 | 85.4 | 62.1 | 85.8 |
137
+ | IFBench | 29.7 | 36.3 | 39.7 | 26.0 | 31.3 | 37.2 | 36.7 | 31.3 | 27.8 | 36.4 |
138
+ | **Knowledge & QA** | | | | | | | | | | |
139
+ | MMLU | 79.0 | 81.9 | 80.9 | 70.2 | 85.8 | 88.7 | 84.6 | 74.6 | 76.1 | 77.1 |
140
+ | PopQA | 23.7 | 28.5 | 25.0 | 33.5 | 25.9 | 25.7 | 28.0 | 30.2 | 30.4 | 37.2 |
141
+ | GPQA | 41.3 | 47.9 | 48.6 | 27.9 | 54.4 | 61.4 | 44.6 | 45.0 | 39.9 | 36.4 |
142
+ | **Chat** | | | | | | | | | | |
143
+ | AlpacaEval 2 LC | 42.2 | 69.7 | 59.8 | 19.9 | 67.9 | 84.3 | 81.9 | 65.5 | 39.8 | 38.0 |
144
+ | **Safety** | 92.1 | 88.9 | 89.5 | 77.1 | 81.6 | 85.8 | 82.2 | 68.8 | 74.4 | 84.2 |
145
+
146
+
147
+ ## Model Details
148
+
149
+ #### Stage 1: SFT
150
+ - Supervised fine-tuning on the Dolci-Think-SFT-7B dataset. This dataset consists of math, code, chat, and general-knowledge queries.
151
+ - Datasets: [Dolci-Think-SFT-7B](https://huggingface.co/datasets/allenai/dolci-thinking-sft), [Dolci-Instruct-SFT](https://huggingface.co/datasets/allenai/dolci-instruct-sft)
152
+
153
+ #### Stage 2: DPO
154
+ - Direct preference optimization on the Dolci-Think-DPO-7B dataset. This dataset consists of math, code, chat, and general-knowledge queries.
155
+ - Datasets: [Dolci-Think-DPO-7B](https://huggingface.co/datasets/allenai/dolci-thinking-dpo), [Dolci-Instruct-DPO](https://huggingface.co/datasets/allenai/dolci-3-instruct-dpo-with-metadata)
156
+
157
+ #### Stage 3: RLVR
158
+ - Reinforcement learning from verifiable rewards on the Dolci-Think-RL-7B dataset. This dataset consists of math, code, instruction-following, and general chat queries.
159
+ - Datasets: [Dolci-Think-RL-7B](https://huggingface.co/datasets/allenai/Dolci-Think-RL-7B), [Dolci-Instruct-RL](https://huggingface.co/datasets/allenai/Dolci-Instruct-RL-7B)
160
+
161
+
162
+ ## Bias, Risks, and Limitations
163
+ Like any base language model or fine-tuned model without safety filtering, these models can easily be prompted by users to generate harmful and sensitive content. Such content may also be produced unintentionally, especially in cases involving bias, so we recommend that users consider the risks when applying this technology. Additionally, statements from OLMo, as from any LLM, are often inaccurate, so facts should be verified.
164
+
165
+ ## License
166
+ This model is licensed under Apache 2.0. It is intended for research and educational use in accordance with [Ai2's Responsible Use Guidelines](https://allenai.org/responsible-use).
167
+
168
+
169
+ ## Citation
170
+ A technical manuscript is forthcoming!
171
+
172
+ ## Model Card Contact
173
+ For errors in this model card, contact `olmo@allenai.org`.
chat_template.jinja ADDED
@@ -0,0 +1,15 @@
1
+ {%- set has_system = messages|selectattr('role', 'equalto', 'system')|list|length > 0 -%}{%- if not has_system -%}{{- '<|im_start|>system
2
+ You are Olmo, a helpful AI assistant built by Ai2. Your date cutoff is December 2024, and your model weights are available at https://huggingface.co/allenai. ' -}}{%- if tools is none or (tools | length) == 0 -%}{{- 'You do not currently have access to any functions. <functions></functions><|im_end|>
3
+ ' -}}{%- else -%}{{- 'You are provided with function signatures within <functions></functions> XML tags. You may call one or more functions to assist with the user query. Output any function calls within <function_calls></function_calls> XML tags. Do not make assumptions about what values to plug into functions.' -}}{{- '<functions>' -}}{{- tools | tojson -}}{{- '</functions><|im_end|>
4
+ ' -}}{%- endif -%}{%- endif -%}{%- for message in messages -%}{%- if message['role'] == 'system' -%}{{- '<|im_start|>system
5
+ ' + message['content'] -}}{%- if tools is not none -%}{{- '<functions>' -}}{{- tools | tojson -}}{{- '</functions>' -}}{%- elif message.get('functions', none) is not none -%}{{- ' <functions>' + message['functions'] + '</functions>' -}}{%- endif -%}{{- '<|im_end|>
6
+ ' -}}{%- elif message['role'] == 'user' -%}{{- '<|im_start|>user
7
+ ' + message['content'] + '<|im_end|>
8
+ ' -}}{%- elif message['role'] == 'assistant' -%}{{- '<|im_start|>assistant
9
+ ' -}}{%- if message.get('content', none) is not none -%}{{- message['content'] -}}{%- endif -%}{%- if message.get('function_calls', none) is not none -%}{{- '<function_calls>' + message['function_calls'] + '</function_calls>' -}}{% elif message.get('tool_calls', none) is not none %}{{- '<function_calls>' -}}{%- for tool_call in message['tool_calls'] %}{%- if tool_call is mapping and tool_call.get('function', none) is not none %}{%- set args = tool_call['function']['arguments'] -%}{%- set ns = namespace(arguments_list=[]) -%}{%- for key, value in args.items() -%}{%- set ns.arguments_list = ns.arguments_list + [key ~ '=' ~ (value | tojson)] -%}{%- endfor -%}{%- set arguments = ns.arguments_list | join(', ') -%}{{- tool_call['function']['name'] + '(' + arguments + ')' -}}{%- if not loop.last -%}{{ '
10
+ ' }}{%- endif -%}{% else %}{{- tool_call -}}{%- endif %}{%- endfor %}{{- '</function_calls>' -}}{%- endif -%}{%- if not loop.last -%}{{- '<|im_end|>' + '
11
+ ' -}}{%- else -%}{{- eos_token -}}{%- endif -%}{%- elif message['role'] == 'environment' -%}{{- '<|im_start|>environment
12
+ ' + message['content'] + '<|im_end|>
13
+ ' -}}{%- elif message['role'] == 'tool' -%}{{- '<|im_start|>environment
14
+ ' + message['content'] + '<|im_end|>
15
+ ' -}}{%- endif -%}{%- if loop.last and add_generation_prompt -%}{{- '<|im_start|>assistant\n' -}}{%- endif -%}{%- endfor -%}
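Reading the template above, a single user turn with no system message and no tools renders to the layout below. This is a hand-built sketch of the no-tools path (not produced by actually running the template engine):

```python
# Sketch of the prompt the chat template above produces for one user turn
# when no system message and no tools are supplied, with
# add_generation_prompt=True.
user_msg = "What is 2+2?"

prompt = (
    "<|im_start|>system\n"
    "You are Olmo, a helpful AI assistant built by Ai2. "
    "Your date cutoff is December 2024, and your model weights are "
    "available at https://huggingface.co/allenai. "
    "You do not currently have access to any functions. "
    "<functions></functions><|im_end|>\n"
    "<|im_start|>user\n" + user_msg + "<|im_end|>\n"
    "<|im_start|>assistant\n"
)
print(prompt)
```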
config.json ADDED
@@ -0,0 +1,112 @@
1
+ {
2
+ "architectures": [
3
+ "Olmo3ForCausalLM"
4
+ ],
5
+ "attention_bias": false,
6
+ "attention_dropout": 0.0,
7
+ "dtype": "bfloat16",
8
+ "eos_token_id": 100257,
9
+ "hidden_act": "silu",
10
+ "hidden_size": 5120,
11
+ "initializer_range": 0.02,
12
+ "intermediate_size": 27648,
13
+ "layer_types": [
14
+ "sliding_attention",
15
+ "sliding_attention",
16
+ "sliding_attention",
17
+ "full_attention",
18
+ "sliding_attention",
19
+ "sliding_attention",
20
+ "sliding_attention",
21
+ "full_attention",
22
+ "sliding_attention",
23
+ "sliding_attention",
24
+ "sliding_attention",
25
+ "full_attention",
26
+ "sliding_attention",
27
+ "sliding_attention",
28
+ "sliding_attention",
29
+ "full_attention",
30
+ "sliding_attention",
31
+ "sliding_attention",
32
+ "sliding_attention",
33
+ "full_attention",
34
+ "sliding_attention",
35
+ "sliding_attention",
36
+ "sliding_attention",
37
+ "full_attention",
38
+ "sliding_attention",
39
+ "sliding_attention",
40
+ "sliding_attention",
41
+ "full_attention",
42
+ "sliding_attention",
43
+ "sliding_attention",
44
+ "sliding_attention",
45
+ "full_attention",
46
+ "sliding_attention",
47
+ "sliding_attention",
48
+ "sliding_attention",
49
+ "full_attention",
50
+ "sliding_attention",
51
+ "sliding_attention",
52
+ "sliding_attention",
53
+ "full_attention",
54
+ "sliding_attention",
55
+ "sliding_attention",
56
+ "sliding_attention",
57
+ "full_attention",
58
+ "sliding_attention",
59
+ "sliding_attention",
60
+ "sliding_attention",
61
+ "full_attention",
62
+ "sliding_attention",
63
+ "sliding_attention",
64
+ "sliding_attention",
65
+ "full_attention",
66
+ "sliding_attention",
67
+ "sliding_attention",
68
+ "sliding_attention",
69
+ "full_attention",
70
+ "sliding_attention",
71
+ "sliding_attention",
72
+ "sliding_attention",
73
+ "full_attention",
74
+ "sliding_attention",
75
+ "sliding_attention",
76
+ "sliding_attention",
77
+ "full_attention"
78
+ ],
79
+ "max_position_embeddings": 65536,
80
+ "model_type": "olmo3",
81
+ "num_attention_heads": 40,
82
+ "num_hidden_layers": 64,
83
+ "num_key_value_heads": 8,
84
+ "pad_token_id": 100277,
85
+ "rms_norm_eps": 1e-06,
86
+ "rope_scaling": {
87
+ "attention_factor": 1.2079441541679836,
88
+ "beta_fast": 32,
89
+ "beta_slow": 1,
90
+ "factor": 8.0,
91
+ "original_max_position_embeddings": 8192,
92
+ "rope_type": "yarn"
93
+ },
94
+ "rope_theta": 500000,
95
+ "sliding_window": 4096,
96
+ "tie_word_embeddings": false,
97
+ "transformers_version": "4.57.1",
98
+ "use_cache": false,
99
+ "vocab_size": 100278,
100
+ "quantization_config": {
101
+ "quant_method": "exl3",
102
+ "version": "0.0.18",
103
+ "bits": 4.0,
104
+ "head_bits": 6,
105
+ "calibration": {
106
+ "rows": 250,
107
+ "cols": 2048
108
+ },
109
+ "out_scales": "auto",
110
+ "codebook": "mcg"
111
+ }
112
+ }
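Two details of this config can be sanity-checked by hand: the 64-entry `layer_types` list is a repeating pattern of three sliding-window layers followed by one full-attention layer, and the YaRN `attention_factor` appears to match the common `0.1 * ln(factor) + 1.0` rule for `factor = 8.0`. A small check (the formula attribution is my reading, not stated in the config):

```python
import math

# layer_types: 16 repetitions of (3 x sliding_attention, 1 x full_attention)
layer_types = (["sliding_attention"] * 3 + ["full_attention"]) * 16
assert len(layer_types) == 64  # matches num_hidden_layers

# YaRN attention scaling: 0.1 * ln(scale_factor) + 1.0 with factor = 8.0
attention_factor = 0.1 * math.log(8.0) + 1.0
print(attention_factor)  # 1.2079441541679836, as in rope_scaling
```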
generation_config.json ADDED
@@ -0,0 +1,10 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "do_sample": true,
4
+ "eos_token_id": 100257,
5
+ "pad_token_id": 100277,
6
+ "transformers_version": "4.57.1",
7
+ "temperature": 0.6,
8
+ "top_p": 0.95,
9
+ "max_new_tokens": 32768
10
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model-00001-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e9048387927d107030b1712014eb2ad1b3ac63b5e9021af1416deaa162cc94fd
3
+ size 8349614920
model-00002-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d1c901d8e5524dd369b66947f38a1d55d9c2f8ebd9dec1968ad567f6871ec273
3
+ size 8299144024
model-00003-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5f0d83c95bcfe77d68ac3f8c53b853aad0723fb2ba1bb269356473db7e36a61d
3
+ size 385562940
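As a rough cross-check on the quantization settings, the three safetensors shards sum to about 17 GB, which is the right order of magnitude for a ~32B-parameter model at 4.0 bits per weight plus a 6-bit head and unquantized embeddings (a back-of-envelope estimate, not an exact accounting):

```python
# Shard sizes from the LFS pointers above
shard_bytes = [8_349_614_920, 8_299_144_024, 385_562_940]
total = sum(shard_bytes)
print(f"{total / 1e9:.2f} GB")  # 17.03 GB

# Weight-only estimate: ~32e9 parameters at 4.0 bits per weight
print(f"{32e9 * 4.0 / 8 / 1e9:.1f} GB")  # 16.0 GB before head/embeddings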
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
 
olmo-instruct.png ADDED

Git LFS Details

  • SHA256: 9ae64ca9d1e2a551eb275120ec9d4dd8a3276eb10035ff118e566750d80b0435
  • Pointer size: 131 Bytes
  • Size of remote file: 105 kB
quantization_config.json ADDED
The diff for this file is too large to render. See raw diff
 
special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
1
+ {
2
+ "bos_token": {
3
+ "content": "<|endoftext|>",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "eos_token": {
10
+ "content": "<|endoftext|>",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": {
17
+ "content": "<|pad|>",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "unk_token": {
24
+ "content": "<|endoftext|>",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ }
30
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,190 @@
1
+ {
2
+ "add_prefix_space": false,
3
+ "added_tokens_decoder": {
4
+ "100256": {
5
+ "content": "<|extra_id_0|>",
6
+ "lstrip": false,
7
+ "normalized": false,
8
+ "rstrip": false,
9
+ "single_word": false,
10
+ "special": false
11
+ },
12
+ "100257": {
13
+ "content": "<|endoftext|>",
14
+ "lstrip": false,
15
+ "normalized": false,
16
+ "rstrip": false,
17
+ "single_word": false,
18
+ "special": true
19
+ },
20
+ "100258": {
21
+ "content": "<|fim_prefix|>",
22
+ "lstrip": false,
23
+ "normalized": false,
24
+ "rstrip": false,
25
+ "single_word": false,
26
+ "special": true
27
+ },
28
+ "100259": {
29
+ "content": "<|fim_middle|>",
30
+ "lstrip": false,
31
+ "normalized": false,
32
+ "rstrip": false,
33
+ "single_word": false,
34
+ "special": true
35
+ },
36
+ "100260": {
37
+ "content": "<|fim_suffix|>",
38
+ "lstrip": false,
39
+ "normalized": false,
40
+ "rstrip": false,
41
+ "single_word": false,
42
+ "special": true
43
+ },
44
+ "100261": {
45
+ "content": "|||PHONE_NUMBER|||",
46
+ "lstrip": false,
47
+ "normalized": false,
48
+ "rstrip": false,
49
+ "single_word": false,
50
+ "special": false
51
+ },
52
+ "100262": {
53
+ "content": "|||EMAIL_ADDRESS|||",
54
+ "lstrip": false,
55
+ "normalized": false,
56
+ "rstrip": false,
57
+ "single_word": false,
58
+ "special": false
59
+ },
60
+ "100263": {
61
+ "content": "|||IP_ADDRESS|||",
62
+ "lstrip": false,
63
+ "normalized": false,
64
+ "rstrip": false,
65
+ "single_word": false,
66
+ "special": false
67
+ },
68
+ "100264": {
69
+ "content": "<|im_start|>",
70
+ "lstrip": false,
71
+ "normalized": false,
72
+ "rstrip": false,
73
+ "single_word": false,
74
+ "special": true
75
+ },
76
+ "100265": {
77
+ "content": "<|im_end|>",
78
+ "lstrip": false,
79
+ "normalized": false,
80
+ "rstrip": false,
81
+ "single_word": false,
82
+ "special": true
83
+ },
84
+ "100266": {
85
+ "content": "<functions>",
86
+ "lstrip": false,
87
+ "normalized": false,
88
+ "rstrip": false,
89
+ "single_word": false,
90
+ "special": false
91
+ },
92
+ "100267": {
93
+ "content": "</functions>",
94
+ "lstrip": false,
95
+ "normalized": false,
96
+ "rstrip": false,
97
+ "single_word": false,
98
+ "special": false
99
+ },
100
+ "100268": {
101
+ "content": "<function_calls>",
102
+ "lstrip": false,
103
+ "normalized": false,
104
+ "rstrip": false,
105
+ "single_word": false,
106
+ "special": false
107
+ },
108
+ "100269": {
109
+ "content": "</function_calls>",
110
+ "lstrip": false,
111
+ "normalized": false,
112
+ "rstrip": false,
113
+ "single_word": false,
114
+ "special": false
115
+ },
116
+ "100270": {
117
+ "content": "<|extra_id_1|>",
118
+ "lstrip": false,
119
+ "normalized": false,
120
+ "rstrip": false,
121
+ "single_word": false,
122
+ "special": false
123
+ },
124
+ "100271": {
125
+ "content": "<|extra_id_2|>",
126
+ "lstrip": false,
127
+ "normalized": false,
128
+ "rstrip": false,
129
+ "single_word": false,
130
+ "special": false
131
+ },
132
+ "100272": {
133
+ "content": "<|extra_id_3|>",
134
+ "lstrip": false,
135
+ "normalized": false,
136
+ "rstrip": false,
137
+ "single_word": false,
138
+ "special": false
139
+ },
140
+ "100273": {
141
+ "content": "<|extra_id_4|>",
142
+ "lstrip": false,
143
+ "normalized": false,
144
+ "rstrip": false,
145
+ "single_word": false,
146
+ "special": false
147
+ },
148
+ "100274": {
149
+ "content": "<|extra_id_5|>",
150
+ "lstrip": false,
151
+ "normalized": false,
152
+ "rstrip": false,
153
+ "single_word": false,
154
+ "special": false
155
+ },
156
+ "100275": {
157
+ "content": "<|extra_id_6|>",
158
+ "lstrip": false,
159
+ "normalized": false,
160
+ "rstrip": false,
161
+ "single_word": false,
162
+ "special": false
163
+ },
164
+ "100276": {
165
+ "content": "<|endofprompt|>",
166
+ "lstrip": false,
167
+ "normalized": false,
168
+ "rstrip": false,
169
+ "single_word": false,
170
+ "special": true
171
+ },
172
+ "100277": {
173
+ "content": "<|pad|>",
174
+ "lstrip": false,
175
+ "normalized": false,
176
+ "rstrip": false,
177
+ "single_word": false,
178
+ "special": true
179
+ }
180
+ },
181
+ "bos_token": "<|endoftext|>",
182
+ "clean_up_tokenization_spaces": false,
183
+ "eos_token": "<|endoftext|>",
184
+ "extra_special_tokens": {},
185
+ "model_max_length": 65536,
186
+ "pad_token": "<|pad|>",
187
+ "tokenizer_class": "GPT2Tokenizer",
188
+ "unk_token": "<|endoftext|>",
189
+ "chat_template": "{%- set has_system = messages|selectattr('role', 'equalto', 'system')|list|length > 0 -%}{%- if not has_system -%}{{- '<|im_start|>system\nYou are Olmo, a helpful AI assistant built by Ai2. Your date cutoff is December 2024, and your model weights are available at https://huggingface.co/allenai. ' -}}{%- if tools is none or (tools | length) == 0 -%}{{- 'You do not currently have access to any functions. <functions></functions><|im_end|>\n' -}}{%- else -%}{{- 'You are provided with function signatures within <functions></functions> XML tags. You may call one or more functions to assist with the user query. Output any function calls within <function_calls></function_calls> XML tags. Do not make assumptions about what values to plug into functions.' -}}{{- '<functions>' -}}{{- tools | tojson -}}{{- '</functions><|im_end|>\n' -}}{%- endif -%}{%- endif -%}{%- for message in messages -%}{%- if message['role'] == 'system' -%}{{- '<|im_start|>system\n' + message['content'] -}}{%- if tools is not none -%}{{- '<functions>' -}}{{- tools | tojson -}}{{- '</functions>' -}}{%- elif message.get('functions', none) is not none -%}{{- ' <functions>' + message['functions'] + '</functions>' -}}{%- endif -%}{{- '<|im_end|>\n' -}}{%- elif message['role'] == 'user' -%}{{- '<|im_start|>user\n' + message['content'] + '<|im_end|>\n' -}}{%- elif message['role'] == 'assistant' -%}{{- '<|im_start|>assistant\n' -}}{%- if message.get('content', none) is not none -%}{{- message['content'] -}}{%- endif -%}{%- if message.get('function_calls', none) is not none -%}{{- '<function_calls>' + message['function_calls'] + '</function_calls>' -}}{% elif message.get('tool_calls', none) is not none %}{{- '<function_calls>' -}}{%- for tool_call in message['tool_calls'] %}{%- if tool_call is mapping and tool_call.get('function', none) is not none %}{%- set args = tool_call['function']['arguments'] -%}{%- set ns = namespace(arguments_list=[]) -%}{%- for key, value in args.items() -%}{%- set ns.arguments_list = ns.arguments_list + [key ~ '=' ~ (value | tojson)] -%}{%- endfor -%}{%- set arguments = ns.arguments_list | join(', ') -%}{{- tool_call['function']['name'] + '(' + arguments + ')' -}}{%- if not loop.last -%}{{ '\n' }}{%- endif -%}{% else %}{{- tool_call -}}{%- endif %}{%- endfor %}{{- '</function_calls>' -}}{%- endif -%}{%- if not loop.last -%}{{- '<|im_end|>' + '\n' -}}{%- else -%}{{- eos_token -}}{%- endif -%}{%- elif message['role'] == 'environment' -%}{{- '<|im_start|>environment\n' + message['content'] + '<|im_end|>\n' -}}{%- elif message['role'] == 'tool' -%}{{- '<|im_start|>environment\n' + message['content'] + '<|im_end|>\n' -}}{%- endif -%}{%- if loop.last and add_generation_prompt -%}{{- '<|im_start|>assistant\\n' -}}{%- endif -%}{%- endfor -%}"
190
+ }
vocab.json ADDED
The diff for this file is too large to render. See raw diff