mlx-community/Qwen3.5-27B-heretic-8bit

This model was converted to MLX format from coder3101/Qwen3.5-27B-heretic using mlx-vlm version 0.3.12.

Qwen3.5-27B-heretic is a decensored/abliterated version of Qwen/Qwen3.5-27B, created using Heretic v1.2.0 with Magnitude-Preserving Orthogonal Ablation (MPOA).

Quantization Details

  • Bits: 8
  • Group size: 64
  • Mode: affine
  • Total size: ~29.5 GB
  • Bits per weight: 8.627 (avg)

Vision encoder layers with dimensions incompatible with the group size are kept in bfloat16.
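The reported size follows from the average bits per weight; a back-of-the-envelope check (assuming 27 billion weights from the model name, with the bfloat16 vision layers already folded into the 8.627-bit average):

```python
# Rough sanity check of the on-disk size from the reported average bits
# per weight. 27e9 parameters is an assumption taken from the model name.
params = 27e9
bits_per_weight = 8.627

size_gb = params * bits_per_weight / 8 / 1e9  # bits -> bytes -> decimal GB
print(f"estimated size: {size_gb:.1f} GB")    # close to the ~29.5 GB on disk

# Group quantization with group size 64 requires each weight row length to
# be divisible by 64; layers that are not stay in bfloat16. The dimensions
# below are arbitrary illustrations, not actual layer shapes.
def quantizable(dim: int, group_size: int = 64) -> bool:
    return dim % group_size == 0

print(quantizable(4096))  # True: divisible by 64, can be quantized
print(quantizable(1176))  # False: kept in bfloat16
```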

Key Features

  • Vision + Text: Natively multimodal — accepts images and video as input
  • Abliterated: Refusal rate reduced from 94% to 14% while maintaining low KL divergence (0.0653) from the original model
  • Hybrid Architecture: Gated DeltaNet + Gated Attention layers (not a standard transformer)
  • Long Context: 262,144 token context window
  • Thinking Mode: Supports <think> reasoning (disabled by default in this conversion's chat template for faster interactive use; pass enable_thinking=True to re-enable)
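The 262,144-token window is large but finite; a rough pre-flight budget check using the common ~4-characters-per-token heuristic (a crude assumption for English text — use the model's actual tokenizer for real counts):

```python
CONTEXT_WINDOW = 262_144  # tokens (2**18)

def fits_in_context(text: str, max_new_tokens: int = 500,
                    chars_per_token: float = 4.0) -> bool:
    """Crude estimate: does the prompt plus the generation budget fit?

    chars_per_token ~= 4 is a rough English-text heuristic, not the
    model's real tokenizer; use processor.tokenizer for exact counts.
    """
    est_prompt_tokens = len(text) / chars_per_token
    return est_prompt_tokens + max_new_tokens <= CONTEXT_WINDOW

print(fits_in_context("Hey, what's up?"))  # True: a few tokens
print(fits_in_context("x" * 2_000_000))    # False: ~500k estimated tokens
```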

Use with mlx-vlm

pip install mlx-vlm

Text Generation

from mlx_vlm import load, generate

model, processor = load("mlx-community/Qwen3.5-27B-heretic-8bit")

prompt = "Hey, what's up?"
output = generate(model, processor, prompt, max_tokens=500, temperature=0.7)
print(output)

Vision (Image Understanding)

from mlx_vlm import load, generate

model, processor = load("mlx-community/Qwen3.5-27B-heretic-8bit")

output = generate(
    model, processor,
    prompt="Describe this image in detail",
    image="path/to/image.jpg",
    max_tokens=500,
    temperature=0.3,
)
print(output)

CLI

# Text chat
mlx_vlm generate --model mlx-community/Qwen3.5-27B-heretic-8bit \
    --prompt "Hello" --max-tokens 500 --chat

# Image + text
mlx_vlm generate --model mlx-community/Qwen3.5-27B-heretic-8bit \
    --image photo.jpg --prompt "What do you see?" --max-tokens 500

Re-enabling Thinking Mode

The chat template defaults to thinking OFF for snappier interactive use. To re-enable <think> reasoning, pass enable_thinking=True when applying the chat template:

prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
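With thinking enabled, generations contain a <think>…</think> block before the final answer. A minimal helper to separate the two (an illustration, not part of mlx-vlm):

```python
import re

def split_thinking(output: str) -> tuple[str, str]:
    """Split a generation into (reasoning, answer).

    Assumes at most one <think>...</think> block at the start of the
    output, which is how Qwen-style thinking templates emit reasoning.
    """
    m = re.match(r"\s*<think>(.*?)</think>\s*(.*)", output, re.DOTALL)
    if m:
        return m.group(1).strip(), m.group(2).strip()
    return "", output.strip()

reasoning, answer = split_thinking(
    "<think>User greets casually; reply in kind.</think>Hey! Not much."
)
print(answer)  # Hey! Not much.
```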

Original Model

coder3101/Qwen3.5-27B-heretic (abliterated from Qwen/Qwen3.5-27B)