mlx-community/Qwen3.5-27B-heretic-8bit
This model was converted to MLX format from coder3101/Qwen3.5-27B-heretic using mlx-vlm version 0.3.12.
Qwen3.5-27B-heretic is a decensored/abliterated version of Qwen/Qwen3.5-27B, created using Heretic v1.2.0 with Magnitude-Preserving Orthogonal Ablation (MPOA).
Quantization Details
- Bits: 8
- Group size: 64
- Mode: affine
- Total size: ~29.5 GB
- Bits per weight: 8.627 (avg)
Vision encoder layers with dimensions incompatible with the group size are kept in bfloat16.
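The gap between the nominal 8 bits and the 8.627 bits-per-weight average comes from per-group quantization metadata plus the layers kept in bfloat16. A minimal sketch of the per-group arithmetic, assuming affine quantization stores one 16-bit scale and one 16-bit bias per group (the usual layout; exact storage is an assumption here):

```python
# Affine 8-bit quantization with group size 64: each group of 64 weights
# shares a scale and a bias. Assuming both are stored in 16 bits, the
# amortized per-weight overhead is:
bits = 8                             # quantized weight itself
group_size = 64
overhead = (16 + 16) / group_size    # scale + bias spread over the group
print(bits + overhead)               # 8.5 bits/weight
```

The remaining gap up to the reported 8.627 average is accounted for by the vision-encoder layers left in full bfloat16.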
Key Features
- Vision + Text: Natively multimodal — accepts images and video as input
- Abliterated: Refusal rate reduced from 94% to 14% while maintaining low KL divergence (0.0653) from the original model
- Hybrid Architecture: Gated DeltaNet + Gated Attention layers (not standard transformer)
- Long Context: 262,144 token context window
- Thinking Mode: Supports `<think>` reasoning (disabled by default in this conversion's chat template for faster interactive use; pass `enable_thinking=True` to re-enable)
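The KL divergence figure above measures how far the abliterated model's next-token distributions drift from the original's (lower means behavior outside refusals is largely preserved). As an illustration only, with toy distributions that are not taken from the model:

```python
import math

def kl_divergence(p, q):
    # D_KL(P || Q) = sum over tokens of p_i * log(p_i / q_i)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token distributions over a tiny 3-token vocabulary
original  = [0.70, 0.20, 0.10]
ablated   = [0.60, 0.25, 0.15]
print(round(kl_divergence(original, ablated), 4))  # → 0.0227
```

Identical distributions give a divergence of 0; Heretic reports this averaged over held-out prompts.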
Use with mlx-vlm
```bash
pip install mlx-vlm
```
Text Generation
```python
from mlx_vlm import load, generate

model, processor = load("mlx-community/Qwen3.5-27B-heretic-8bit")

prompt = "Hey, what's up?"
output = generate(model, processor, prompt, max_tokens=500, temperature=0.7)
print(output)
```
Vision (Image Understanding)
```python
from mlx_vlm import load, generate

model, processor = load("mlx-community/Qwen3.5-27B-heretic-8bit")

output = generate(
    model, processor,
    prompt="Describe this image in detail",
    image="path/to/image.jpg",
    max_tokens=500,
    temperature=0.3,
)
print(output)
```
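Under the hood, the prompt and image are paired through a Qwen-style multimodal chat message. A sketch of that message structure (assumed schema; the processor's chat template is authoritative, so check it if you build messages by hand):

```python
# Hypothetical Qwen-style multimodal message: content is a list of parts,
# with an image placeholder followed by the text of the request.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image in detail"},
        ],
    }
]
print(messages[0]["role"])  # → user
```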
CLI
```bash
# Text chat
mlx_vlm generate --model mlx-community/Qwen3.5-27B-heretic-8bit \
  --prompt "Hello" --max-tokens 500 --chat

# Image + text
mlx_vlm generate --model mlx-community/Qwen3.5-27B-heretic-8bit \
  --image photo.jpg --prompt "What do you see?" --max-tokens 500
```
Re-enabling Thinking Mode
The chat template defaults to thinking OFF for snappier interactive use. To re-enable `<think>` reasoning, pass `enable_thinking=True` when applying the chat template:
```python
prompt = processor.tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, enable_thinking=True
)
```
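With thinking enabled, generations begin with a `<think>…</think>` reasoning span before the final answer. A minimal sketch for stripping that span before display (`strip_thinking` is a hypothetical helper, not part of mlx-vlm):

```python
import re

def strip_thinking(text: str) -> str:
    # Drop the <think>...</think> reasoning span, plus trailing whitespace,
    # leaving only the user-facing answer.
    return re.sub(r"<think>.*?</think>\s*", "", text, flags=re.DOTALL)

raw = "<think>The user is greeting me; reply casually.</think>Hey! Not much, you?"
print(strip_thinking(raw))  # → Hey! Not much, you?
```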
Original Model
- Base: Qwen/Qwen3.5-27B
- Abliteration: coder3101/Qwen3.5-27B-heretic (Heretic v1.2.0, MPOA method)
- Parameters: 27B
- Architecture: Qwen3_5ForConditionalGeneration (Vision-Language, Gated DeltaNet + Gated Attention)