How to use allenai/dolma2-tokenizer with Transformers:
```python
# Load the tokenizer directly (this repository ships tokenizer files, not model weights)
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/dolma2-tokenizer")
```
A slightly modified version of cl100k_base that supports the Dolma 1.x special tokens
(|||PHONE_NUMBER|||, |||EMAIL_ADDRESS|||, |||IP_ADDRESS|||) and adds extra tokens
to fill gaps in the tiktoken cl100k_base vocabulary.
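These special tokens are the placeholders that Dolma's preprocessing substitutes for detected PII before tokenization. As a rough illustrative sketch (the regexes below are simplified placeholders, not Dolma's actual PII detection rules), masked text might be produced like this:

```python
import re

# Hypothetical, simplified PII patterns mapped to the Dolma 1.x special tokens.
# Dolma's real pipeline uses more careful detection; this only shows the token format.
PII_PATTERNS = {
    r"[\w.+-]+@[\w-]+\.[\w.-]+": "|||EMAIL_ADDRESS|||",
    r"\b\d{3}[-.]\d{3}[-.]\d{4}\b": "|||PHONE_NUMBER|||",
    r"\b\d{1,3}(?:\.\d{1,3}){3}\b": "|||IP_ADDRESS|||",
}

def mask_pii(text: str) -> str:
    """Replace simple PII matches with Dolma-style special tokens."""
    for pattern, token in PII_PATTERNS.items():
        text = re.sub(pattern, token, text)
    return text

print(mask_pii("Contact jo@example.com or 555-123-4567 from 192.168.0.1"))
```

Because the tokenizer reserves these strings as single special tokens, masked documents tokenize without splitting the placeholders apart.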