mlx-community/Qwen3.5-27B-heretic-8bit

This model was converted to MLX format from coder3101/Qwen3.5-27B-heretic using mlx-vlm version 0.3.12.

Qwen3.5-27B-heretic is a decensored/abliterated version of Qwen/Qwen3.5-27B, created using Heretic v1.2.0 with Magnitude-Preserving Orthogonal Ablation (MPOA).

Quantization Details

  • Bits: 8
  • Group size: 64
  • Mode: affine
  • Total size: ~29.5 GB
  • Bits per weight: 8.627 (avg)

Vision encoder layers with dimensions incompatible with the group size are kept in bfloat16.
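The reported size follows from the average bits per weight; a back-of-the-envelope check (assuming 27 billion weights from the model name, with the bfloat16 vision layers already folded into the 8.627-bit average):

```python
# Rough sanity check of the on-disk size from the reported average bits
# per weight. 27e9 parameters is an assumption taken from the model name.
params = 27e9
bits_per_weight = 8.627

size_gb = params * bits_per_weight / 8 / 1e9  # bits -> bytes -> decimal GB
print(f"estimated size: {size_gb:.1f} GB")    # close to the ~29.5 GB on disk

# Group quantization with group size 64 requires each weight row length to
# be divisible by 64; layers that are not stay in bfloat16. The dimensions
# below are arbitrary illustrations, not actual layer shapes.
def quantizable(dim: int, group_size: int = 64) -> bool:
    return dim % group_size == 0

print(quantizable(4096))  # True: divisible by 64, can be quantized
print(quantizable(1176))  # False: kept in bfloat16
```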

Key Features

  • Vision + Text: Natively multimodal — accepts images and video as input
  • Abliterated: Refusal rate reduced from 94% to 14% while maintaining low KL divergence (0.0653) from the original model
  • Hybrid Architecture: Gated DeltaNet + Gated Attention layers (not a standard transformer)
  • Long Context: 262,144 token context window
  • Thinking Mode: Supports <think> reasoning (disabled by default in this conversion's chat template for faster interactive use; pass enable_thinking=True to re-enable)
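The 262,144-token window is large but finite; a rough pre-flight budget check using the common ~4-characters-per-token heuristic (a crude assumption for English text — use the model's actual tokenizer for real counts):

```python
CONTEXT_WINDOW = 262_144  # tokens (2**18)

def fits_in_context(text: str, max_new_tokens: int = 500,
                    chars_per_token: float = 4.0) -> bool:
    """Crude estimate: does the prompt plus the generation budget fit?

    chars_per_token ~= 4 is a rough English-text heuristic, not the
    model's real tokenizer; use processor.tokenizer for exact counts.
    """
    est_prompt_tokens = len(text) / chars_per_token
    return est_prompt_tokens + max_new_tokens <= CONTEXT_WINDOW

print(fits_in_context("Hey, what's up?"))  # True: a few tokens
print(fits_in_context("x" * 2_000_000))    # False: ~500k estimated tokens
```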

Use with mlx-vlm

pip install mlx-vlm

Text Generation

from mlx_vlm import load, generate

model, processor = load("mlx-community/Qwen3.5-27B-heretic-8bit")

prompt = "Hey, what's up?"
output = generate(model, processor, prompt, max_tokens=500, temperature=0.7)
print(output)

Vision (Image Understanding)

from mlx_vlm import load, generate

model, processor = load("mlx-community/Qwen3.5-27B-heretic-8bit")

output = generate(
    model, processor,
    prompt="Describe this image in detail",
    image="path/to/image.jpg",
    max_tokens=500,
    temperature=0.3,
)
print(output)

CLI

# Text chat
mlx_vlm generate --model mlx-community/Qwen3.5-27B-heretic-8bit \
    --prompt "Hello" --max-tokens 500 --chat

# Image + text
mlx_vlm generate --model mlx-community/Qwen3.5-27B-heretic-8bit \
    --image photo.jpg --prompt "What do you see?" --max-tokens 500

Re-enabling Thinking Mode

The chat template defaults to thinking OFF for snappier interactive use. To re-enable <think> reasoning, pass enable_thinking=True when applying the chat template:

prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
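With thinking enabled, generations contain a <think>…</think> block before the final answer. A minimal helper to separate the two (an illustration, not part of mlx-vlm):

```python
import re

def split_thinking(output: str) -> tuple[str, str]:
    """Split a generation into (reasoning, answer).

    Assumes at most one <think>...</think> block at the start of the
    output, which is how Qwen-style thinking templates emit reasoning.
    """
    m = re.match(r"\s*<think>(.*?)</think>\s*(.*)", output, re.DOTALL)
    if m:
        return m.group(1).strip(), m.group(2).strip()
    return "", output.strip()

reasoning, answer = split_thinking(
    "<think>User greets casually; reply in kind.</think>Hey! Not much."
)
print(answer)  # Hey! Not much.
```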

Original Model

coder3101/Qwen3.5-27B-heretic (abliterated from Qwen/Qwen3.5-27B)