dots.mocr-FP8
FP8-quantized version of rednote-hilab/dots.mocr.
This model was quantized with llm-compressor using FP8 dynamic activation quantization for the text backbone. The custom vision tower was intentionally excluded from quantization and kept in BF16.
Quantization details
- Base model: rednote-hilab/dots.mocr
- Quantization tool: llm-compressor
- Saved format: compressed-tensors
- Quantization scheme: FP8_DYNAMIC
- Targets: Linear
- Ignored modules: lm_head, re:.*vision_tower.*
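With the FP8_DYNAMIC scheme, weights are quantized ahead of time while activation scales are computed on the fly for each input. A minimal sketch of that dynamic-scaling idea (illustrative only; real FP8 kernels also round values to actual float8 e4m3, which this sketch skips):

```python
E4M3_MAX = 448.0  # largest finite value representable in float8 e4m3

def quantize_dynamic(row):
    """Scale one row of activations into the e4m3 range.

    The scale is derived from the row itself at runtime -- that is
    what "dynamic" means in FP8_DYNAMIC. Simplified illustration,
    not llm-compressor's actual kernel code.
    """
    amax = max(abs(x) for x in row) or 1.0  # guard against all-zero rows
    scale = amax / E4M3_MAX
    return [x / scale for x in row], scale

def dequantize(qrow, scale):
    return [q * scale for q in qrow]

row = [0.5, -2.0, 3.25, 0.0]
qrow, scale = quantize_dynamic(row)
restored = dequantize(qrow, scale)
# restored matches row up to floating-point rounding, since the
# float8 rounding step itself is omitted here
```

Because the scale is recomputed per input, no calibration data is needed, which is why this scheme works in a data-free `oneshot` pass.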
Quantization recipe
```python
from transformers import AutoModelForCausalLM, AutoProcessor
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# Load the base model (the architecture ships custom modeling code).
model = AutoModelForCausalLM.from_pretrained(
    "rednote-hilab/dots.mocr", torch_dtype="auto", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(
    "rednote-hilab/dots.mocr", trust_remote_code=True
)

# FP8 dynamic quantization of every Linear layer in the text backbone;
# lm_head and the vision tower are excluded and stay in BF16.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=[
        "lm_head",
        "re:.*vision_tower.*",
    ],
)

# FP8_DYNAMIC is data-free, so oneshot needs no calibration dataset.
oneshot(model=model, recipe=recipe)

model.save_pretrained("binedge/dots.mocr-FP8", save_compressed=True)
processor.save_pretrained("binedge/dots.mocr-FP8")
```
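Entries in the `ignore` list with a `re:` prefix are matched as regular expressions against module names; plain entries match by name. The sketch below illustrates that selection logic with hypothetical module names (it is a simplification, not llm-compressor's actual matcher):

```python
import re

# Hypothetical module names, for illustration only.
module_names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.down_proj",
    "lm_head",
    "vision_tower.blocks.0.attn.qkv",
]

ignore = ["lm_head", "re:.*vision_tower.*"]

def is_ignored(name):
    """'re:'-prefixed entries act as regexes; others as exact names."""
    for pat in ignore:
        if pat.startswith("re:"):
            if re.fullmatch(pat[3:], name):
                return True
        elif name == pat:
            return True
    return False

# Only the text-backbone projections are left to be quantized.
quantized = [n for n in module_names if not is_ignored(n)]
```

This is how a single `re:.*vision_tower.*` pattern keeps the entire vision tower in BF16 without listing each of its layers.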