Clear: on-device speech enhancement
48 kHz on-device speech enhancement. Takes noisy mono or stereo audio (phone mic, untreated room, traffic), returns a podcast-ready file: denoised, dereverbed, voice warm and present.
Try it
- Live demo: desert-ant-labs/clear-demo. Drop in a recording and hear raw vs cleaned, fully in your browser.
- iOS / macOS: clear-swift, a Swift package; both variants bundled, works offline.
- Android / JVM: clear-kotlin, a Kotlin SDK via JitPack.
- JavaScript / TypeScript: @desert-ant-labs/clear, an npm package for Node + browser (source).
For commercial licensing above 100k MAU, email licensing@desertant.ai.
Variants
| Variant | Character | When to use |
|---|---|---|
| clear-studio | Quiet, studio-like; silences near zero | Default. Works across the full range of input quality: phone audio, laptop mic, untreated rooms, USB / XLR podcast captures. |
| clear-natural | Room tone, breath, lip texture preserved | Treated podcast studios, USB / XLR captures, voiceover where the original sound is intentional. |
If the source is already clean and you want the model to stay invisible, pick clear-natural. Otherwise clear-studio is the default.
Files
Both variants share the same architecture and realtime cost; only the weights differ.
Both variants are 6-bit palettized (k-means LUT): ~5× smaller than the fp32 weights with no perceptible quality loss (DNSMOS OVRL within ~0.02 of the float model).
| Variant | File | Format | Size |
|---|---|---|---|
| clear-studio | clear-studio.mlmodelc/ | Core ML, 6-bit palettized, precompiled | ~1.9 MB |
| clear-studio | clear-studio.onnx | ONNX, 6-bit palettized (fp16-stored) | ~4.5 MB |
| clear-natural | clear-natural.mlmodelc/ | Core ML, 6-bit palettized, precompiled | ~1.9 MB |
| clear-natural | clear-natural.onnx | ONNX, 6-bit palettized (fp16-stored) | ~4.5 MB |
The ONNX keeps fp32 inputs/outputs, so host code is unchanged. The Core ML .mlmodelc is precompiled (load it directly; no .mlpackage compile step).
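To make the "6-bit palettized (k-means LUT)" wording concrete, here is a minimal pure-Python sketch of the general idea: cluster the weights into 2^6 = 64 centroids, then store one 6-bit index per weight plus the lookup table. It is an illustration only, not the procedure used to produce the files above; `palettize` and `compression_ratio` are hypothetical helper names, and the 32-bit lookup-table entries are an assumption.

```python
import random


def palettize(weights, bits=6, iters=10):
    """Cluster scalar weights into 2**bits centroids with 1-D k-means.

    Returns (lut, indices) such that lut[indices[i]] approximates weights[i].
    """
    if bits < 1:
        raise ValueError("bits must be at least 1")
    k = 2 ** bits
    lo, hi = min(weights), max(weights)
    # Start from centroids spaced evenly over the weight range.
    lut = [lo + (hi - lo) * j / (k - 1) for j in range(k)]

    def assign(table):
        # Index of the nearest centroid for every weight.
        return [min(range(k), key=lambda j: abs(w - table[j])) for w in weights]

    for _ in range(iters):
        indices = assign(lut)
        sums, counts = [0.0] * k, [0] * k
        for w, j in zip(weights, indices):
            sums[j] += w
            counts[j] += 1
        # Move each centroid to the mean of its members; empty ones stay put.
        lut = [sums[j] / counts[j] if counts[j] else lut[j] for j in range(k)]
    return lut, assign(lut)


def compression_ratio(n_weights, bits=6, lut_entry_bits=32):
    """fp32 size divided by (packed indices + lookup table) size."""
    palettized_bits = n_weights * bits + (2 ** bits) * lut_entry_bits
    return (n_weights * 32) / palettized_bits


# Synthetic weights stand in for a real layer (assumption: roughly Gaussian).
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.05) for _ in range(1000)]
lut, indices = palettize(weights)
restored = [lut[j] for j in indices]
mse = sum((w - r) ** 2 for w, r in zip(weights, restored)) / len(weights)
print(f"{len(lut)} centroids, reconstruction MSE {mse:.2e}")
print(f"size ratio for 1M weights: {compression_ratio(1_000_000):.2f}x")
```

For large layers the ratio approaches 32 / 6 ≈ 5.3, in line with the ~5× figure quoted above; the lookup table itself adds only a few hundred bytes.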
Use
ONNX
from huggingface_hub import hf_hub_download
import onnxruntime as ort
path = hf_hub_download("desert-ant-labs/clear", "clear-studio.onnx")
session = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
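
Before wiring up the DSP, it can help to confirm what tensors the loaded graph actually expects. The sketch below relies only on `InferenceSession.get_inputs()` / `get_outputs()` from onnxruntime, whose entries expose `name`, `shape`, and `type`; the helper names `describe_io` and `non_fp32_tensors` are illustrative, not part of any Clear SDK.

```python
def describe_io(session):
    """Collect (name, shape, type) for every input and output of a session."""
    return {
        "inputs": [(a.name, a.shape, a.type) for a in session.get_inputs()],
        "outputs": [(a.name, a.shape, a.type) for a in session.get_outputs()],
    }


def non_fp32_tensors(io):
    """Names of tensors whose element type is not 32-bit float.

    onnxruntime reports fp32 tensors as the string "tensor(float)".
    """
    entries = io["inputs"] + io["outputs"]
    return [name for name, _shape, dtype in entries if dtype != "tensor(float)"]


# Usage with the session created above:
#   io = describe_io(session)
#   for kind, entries in io.items():
#       for name, shape, dtype in entries:
#           print(kind, name, shape, dtype)
#   print(non_fp32_tensors(io))
```

If the ONNX file keeps fp32 inputs and outputs as stated, `non_fp32_tensors` should return an empty list.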
Inputs and outputs
- Architecture: DeepFilterNet 3 (DFN3-half).
- Sample rate: 48 kHz, mono or stereo (per-channel inference).
- Inference contract: spec / feat_erb / feat_spec → spec_enhanced. STFT, ERB, and ISTFT are host-side DSP, not part of the model graph.
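
Per-channel inference for stereo can be sketched as: split the channels, run the mono path on each, and re-interleave. In the sketch below `enhance_mono` is a placeholder for the full host-side pipeline (STFT, ERB features, model call, ISTFT), which is not shown; the interleaved sample layout and the function name `enhance_per_channel` are assumptions for illustration.

```python
from typing import Callable, List, Sequence

Channel = List[float]


def enhance_per_channel(
    interleaved: Sequence[float],
    num_channels: int,
    enhance_mono: Callable[[Channel], Channel],
) -> List[float]:
    """Run a mono enhancer independently on each channel of interleaved audio."""
    if num_channels < 1:
        raise ValueError("num_channels must be at least 1")
    if len(interleaved) % num_channels:
        raise ValueError("sample count is not a multiple of num_channels")

    # De-interleave: channel c owns samples c, c + num_channels, ...
    channels = [list(interleaved[c::num_channels]) for c in range(num_channels)]
    frames = len(interleaved) // num_channels

    out = [0.0] * len(interleaved)
    for c, channel in enumerate(channels):
        enhanced = enhance_mono(channel)
        if len(enhanced) != frames:
            raise ValueError("enhance_mono must preserve the channel length")
        # Write the enhanced channel back into its interleaved slots.
        out[c::num_channels] = enhanced
    return out


# Stand-in enhancer (hypothetical): halves the amplitude of each sample.
stereo = [1.0, 10.0, 2.0, 20.0, 3.0, 30.0]  # L, R, L, R, L, R
print(enhance_per_channel(stereo, 2, lambda ch: [0.5 * x for x in ch]))
```

Because each channel is processed on its own, stereo costs roughly two mono passes, minus whatever setup is shared between them.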
Performance
Both variants run at the same speed. Enhancing a 5-minute clip on the Apple Neural Engine:
| Device | Chip | Mono | Stereo |
|---|---|---|---|
| iPhone 15 Pro | A17 Pro | 4.88 s (61× realtime) | 6.53 s (46×) |
| iPhone 17 Pro | A19 Pro | 3.70 s (81× realtime) | 5.16 s (58×) |
Cold model load is ~0.6 s; later loads ~100 ms via the system ANE cache.
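
The realtime multipliers in the table are simply clip duration divided by processing time. A small sketch that reproduces them from the timings above (the helper name `realtime_factor` is illustrative):

```python
def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """How many seconds of audio are enhanced per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


CLIP_SECONDS = 5 * 60  # the 5-minute benchmark clip

# Processing times in seconds, copied from the table above.
timings = {
    ("iPhone 15 Pro", "mono"): 4.88,
    ("iPhone 15 Pro", "stereo"): 6.53,
    ("iPhone 17 Pro", "mono"): 3.70,
    ("iPhone 17 Pro", "stereo"): 5.16,
}

for (device, layout), seconds in timings.items():
    factor = realtime_factor(CLIP_SECONDS, seconds)
    print(f"{device} {layout}: {factor:.1f}x realtime")
```

Rounded to whole numbers, these give the 61×, 46×, 81×, and 58× figures listed in the table.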
Limitations
- Trained on English speech; non-English speech still benefits but has not been measured against per-language ground truth.
- Heavy background music or multi-speaker overlap degrades quality.
- Mastering is informational only; verify against the platform's actual loudness target before publishing.
Built on
- DeepFilterNet 3 by Rikorose, MIT. Fine-tuned on the Desert Ant Labs speech corpus.
License
Released under the Desert Ant Labs Source-Available License v1.0 (see LICENSE.md).
- Free for commercial use up to 100,000 Monthly Active Users (MAU).
- Above 100,000 MAU a commercial license is required. Contact licensing@desertant.ai.
Citation
@software{clear_2026,
title = {Clear: on-device speech enhancement},
author = {Desert Ant Labs},
year = {2026},
url = {https://huggingface.co/desert-ant-labs/clear},
}
© 2026 Desert Ant Labs · https://desertant.ai