philipp-zettl/modernbert-diffusion-instruct
Fill-Mask • 0.1B • Updated
Experimental diffusion-style masked language model (MLM) built on top of ModernBERT. Inspired by https://nathan.rs/posts/roberta-diffusion/
Note "base" model trained on HuggingFaceH4/ultrachat_200k
Note "base" model trained on bigcode/the-stack-dedup (python)
Note "base" model trained on multi purpose datasets (all the above + bigcode/the-stack-dedup (json) and fineweb-edu)
Note FT of philipp-zettl/modernbert-diffusion-universal using tatsu-lab/alpaca
Note FT on Skylion007/openwebtext
Note FT on bigcode/the-stack-dedup (python)
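The diffusion-style decoding idea behind these models (as described in the linked post) is to start from a fully or partially masked sequence and repeatedly ask the MLM head to fill in the mask it is most confident about, until no masks remain. A minimal sketch of that loop, using a toy scoring function in place of the real model (the `predict` callable, its `(token, confidence)` return shape, and the `[MASK]` literal are illustrative assumptions, not this repo's actual API):

```python
MASK = "[MASK]"

def denoise_step(tokens, predict):
    """Fill the single masked position the predictor is most confident about.

    `predict(tokens, i)` is a stand-in for the MLM head: it returns a
    (token, confidence) pair for position i. Returns (tokens, changed).
    """
    best_i, best_tok, best_p = None, None, -1.0
    for i, tok in enumerate(tokens):
        if tok == MASK:
            cand, p = predict(tokens, i)
            if p > best_p:
                best_i, best_tok, best_p = i, cand, p
    if best_i is None:
        return tokens, False  # nothing left to unmask
    out = list(tokens)
    out[best_i] = best_tok
    return out, True

def iterative_unmask(tokens, predict):
    """Diffusion-style decoding: one unmasking step per iteration until done."""
    changed = True
    while changed:
        tokens, changed = denoise_step(tokens, predict)
    return tokens

def toy_predict(tokens, i):
    # Hypothetical stand-in for the model's fill-mask head.
    vocab = {0: ("the", 0.9), 2: ("sat", 0.7)}
    return vocab.get(i, ("unk", 0.1))

print(iterative_unmask([MASK, "cat", MASK], toy_predict))
# → ['the', 'cat', 'sat']
```

With the real checkpoint, `predict` would be backed by the model's fill-mask output, and confidence-ordered unmasking lets later predictions condition on earlier ones.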