aufklarer (Ivan)

posted an update about 5 hours ago

Post

20

We benchmarked https://github.com/soniqo/speech-swift, our open-source Swift library for on-device speech AI, against Whisper Large v3 (FP16) on LibriSpeech test-clean.

Three models beat it. Two architectural approaches:

Qwen3-ASR (LALM — Qwen3 LLM as ASR decoder, AuT encoder pretrained on ~40M hours) hits 2.35% WER at 1.7B 8-bit, running at 43x real-time on MLX. Greedy decoding matches beam search — the LLM decoder is strong enough that the greedy path is nearly always optimal.

Parakeet TDT (non-autoregressive transducer — FastConformer + TDT joint network) hits 2.74% WER in 634 MB as a CoreML INT8 model on the Neural Engine. No generative hallucination by design. Leaves GPU completely free.

Two findings worth flagging:
- 4-bit quantization is catastrophic for non-English: Korean 6.89% → 19.95% WER on FLEURS. Use 8-bit for multilingual.
- On CoreML, INT8 is 3.3x *faster* than INT4 — opposite of GPU behavior. Native ANE INT8 MACs vs INT4 lookup table indirection.

All numbers reproducible in 15 minutes.

Full article: https://blog.ivan.digital/we-beat-whisper-large-v3-with-a-600m-model-running-entirely-on-your-mac-20e6ce191174

Library: https://github.com/soniqo/speech-swift

Models: Qwen/Qwen3-ASR-0.6B, Qwen/Qwen3-ASR-1.7B, nvidia/parakeet-tdt-0.6b-v2

posted an update 22 days ago

Post

461

Speaker Diarization and VAD on Apple Silicon — MLX-Native Models

Three MLX-optimized models for on-device speaker diarization and voice activity detection, running natively on Apple Silicon via https://github.com/ivan-digital/qwen3-asr-swift:

- aufklarer/Silero-VAD-v5-MLX — Streaming VAD, 309K params, ~1.2 MB. Processes 32ms chunks at 23× real-time on M2 Max.
- aufklarer/Pyannote-Segmentation-MLX — Multi-speaker segmentation, ~1.49M params, ~5.7 MB. 7-class powerset output for up to 3 simultaneous speakers.
- aufklarer/WeSpeaker-ResNet34-LM-MLX — Speaker embedding, ~6.6M params, ~25 MB. 256-dim L2-normalized vectors with BatchNorm fused into Conv2d.

Together they form a diarization pipeline: pyannote segments → WeSpeaker embeds → agglomerative clustering links speakers across the recording. ~32 MB total.

git clone https://github.com/ivan-digital/qwen3-asr-swift
cd qwen3-asr-swift && swift build -c release

.build/release/audio diarize meeting.wav --max-speakers 4 --json
.build/release/audio vad-stream recording.wav

The library also includes ASR, TTS, multilingual synthesis, forced alignment, and speech-to-speech (PersonaPlex 7B). Apache 2.0.

Full architecture details: https://blog.ivan.digital/speaker-diarization-and-voice-activity-detection-on-apple-silicon-native-swift-with-mlx

Library: https://github.com/ivan-digital/qwen3-asr-swift

posted an update 25 days ago

Post

2517

PersonaPlex-7B on Apple Silicon (Swift + MLX Swift)

NVIDIA PersonaPlex is a full-duplex speech-to-speech model — it can listen while it speaks, which enables more natural conversational behaviors like interruptions, overlaps, and quick backchannels.

We put together a native Swift implementation using MLX Swift so it can run locally on Apple Silicon, along with a 4-bit MLX conversion and a small CLI/demo to make it easy to try out.

If you’re interested in on-device voice agents (or just want to see what full-duplex S2S looks like in a real Swift codebase), the details and setup notes are here:

Blog post: https://blog.ivan.digital/nvidia-personaplex-7b-on-apple-silicon-full-duplex-speech-to-speech-in-native-swift-with-mlx-0aa5276f2e23

Repo: https://github.com/ivan-digital/qwen3-asr-swift

reactedto their post with 🔥 about 1 month ago

Post

3459

Context Engineering for Code Agents: Why They Fail and How to Fix Them

Code agents don't fail because they can't code — they fail because their context turns into a junk drawer.

I wrote a practical survey covering the emerging discipline of context engineering for agentic hybrid applications: the techniques, papers, and architectural patterns that keep long-running code agents on track as their token windows fill up with tool logs, stale diffs, and repeated file dumps.
What's covered:

Why long context windows alone don't save you (position bias, distractor sensitivity)
Observation masking vs. LLM summarization — and when simple beats clever
Tool-output compression with approaches like LLMLingua-2
Trajectory reduction: pruning dead branches from agent history
Memory hierarchies: session → working set → notes → cross-session
How MCP and standardized tool interfaces reduce context debt
Dynamic context policies trained with RL (DeepMiner, MEM1)
Meta-agent CI loops for measuring regressions across agent configs

The core argument: the engineering challenge isn't "make the model smarter" — it's make the agent's context and verification smarter. That's where the real leverage is in 2026.

👉 Read the full post: https://blog.ivan.digital/context-engineering-for-agentic-hybrid-applications-why-code-agents-fail-and-how-to-fix-them-076cab699262

2 replies

·

posted an update about 1 month ago

Post

3459

Context Engineering for Code Agents: Why They Fail and How to Fix Them

Code agents don't fail because they can't code — they fail because their context turns into a junk drawer.

I wrote a practical survey covering the emerging discipline of context engineering for agentic hybrid applications: the techniques, papers, and architectural patterns that keep long-running code agents on track as their token windows fill up with tool logs, stale diffs, and repeated file dumps.
What's covered:

Why long context windows alone don't save you (position bias, distractor sensitivity)
Observation masking vs. LLM summarization — and when simple beats clever
Tool-output compression with approaches like LLMLingua-2
Trajectory reduction: pruning dead branches from agent history
Memory hierarchies: session → working set → notes → cross-session
How MCP and standardized tool interfaces reduce context debt
Dynamic context policies trained with RL (DeepMiner, MEM1)
Meta-agent CI loops for measuring regressions across agent configs

The core argument: the engineering challenge isn't "make the model smarter" — it's make the agent's context and verification smarter. That's where the real leverage is in 2026.

👉 Read the full post: https://blog.ivan.digital/context-engineering-for-agentic-hybrid-applications-why-code-agents-fail-and-how-to-fix-them-076cab699262

2 replies

·

posted an update about 1 month ago

Post

798

Qwen3-ASR Swift: On-Device Speech Recognition for Apple Silicon

I'm excited to release https://github.com/ivan-digital/qwen3-asr-swift, an open-source Swift implementation of Alibaba's
Qwen3-ASR, optimized for Apple Silicon using MLX.

Why Qwen3-ASR? Exceptional noise robustness — 3.5x better than Whisper in noisy conditions (17.9% vs 63% CER).

Features:
- 52 languages (30 major + 22 Chinese dialects)
- ~600MB model (4-bit quantized)
- ~100ms latency on M-series chips
- Fully local, no cloud API

Also more inference and model architecture in blog post https://blog.ivan.digital/qwen3-asr-swift-on-device-asr-tts-for-apple-silicon-architecture-and-benchmarks-27cbf1e4463f

repliedto their post 3 months ago

I hear you — working code and not burning through your quota matter way more than any internal architecture diagram.

I’m looking at the architecture mainly to understand why some tools feel more reliable and easier to debug than others.

What’s been your experience so far — any setup that actually feels close to “it just works”?

reactedto their post with 🔥 4 months ago

Post

626

I did deep dive comparison of Claude Code vs OpenAI Codex code agents architectures, interesting what is your personal experience on this?

Both Claude Code and OpenAI Codex are built on the same backbone: a single-agent event loop that repeatedly thinks, calls tools, inspects the result, and repeats until it’s done. No swarms, no hidden graph orchestration — just one reflective agent iterating through a ReAct-style cycle.

https://blog.ivan.digital/claude-code-vs-openai-codex-agentic-planner-vs-shell-first-surgeon-d6ce988526e8

3 replies

·

reactedto their post with 🔥 4 months ago

Post

1456

Couple months ago I fine‑tuned Qwen3 Embeddings with LoRA on the LSPC dataset. This time I went the opposite way: a small, task‑specific 80M encoder with bidirectional attention, trained end‑to‑end. It outperforms the Qwen3 LoRA baseline on the same data (0.9315 macro‑F1 vs 0.8360). Details and code: https://blog.ivan.digital/beating-qwen3-lora-with-a-tiny-pytorch-encoder-on-the-large-scale-product-corpus-afe536de205f

posted an update 4 months ago

Post

626

I did deep dive comparison of Claude Code vs OpenAI Codex code agents architectures, interesting what is your personal experience on this?

Both Claude Code and OpenAI Codex are built on the same backbone: a single-agent event loop that repeatedly thinks, calls tools, inspects the result, and repeats until it’s done. No swarms, no hidden graph orchestration — just one reflective agent iterating through a ReAct-style cycle.

https://blog.ivan.digital/claude-code-vs-openai-codex-agentic-planner-vs-shell-first-surgeon-d6ce988526e8

3 replies

·

posted an update 4 months ago

Post

1456

Couple months ago I fine‑tuned Qwen3 Embeddings with LoRA on the LSPC dataset. This time I went the opposite way: a small, task‑specific 80M encoder with bidirectional attention, trained end‑to‑end. It outperforms the Qwen3 LoRA baseline on the same data (0.9315 macro‑F1 vs 0.8360). Details and code: https://blog.ivan.digital/beating-qwen3-lora-with-a-tiny-pytorch-encoder-on-the-large-scale-product-corpus-afe536de205f

reactedto their post with 🔥 4 months ago

Post

3276

Fine-Tuning Qwen3 Embeddings for product category classification on the Large-Scale Product Corpus

Language-models such as GPT, Llama, DeepSeek, Qwen trained with a filtered slice of Common Crawl. For e-commerce work, though, we can start with the Web Data Commons (WDC), the project by the University of Mannheim. It extracts web pages that carry some metadata and publishes the result as the Large-Scale Product Corpus (LSPC).

Search engines like Google reward pages that include detailed product markup, so merchants already populate their sites with SEO-friendly fields such as title, brand, GTIN, price — and, crucially, category labels. Thanks to these built-in annotations, the WDC Large-Scale Product Corpus arrives almost fully self-labelled. I used those labels to fine-tune Qwen3 Embedding with Low-Rank Adaptation (LoRA), code is available on github. The resulting 615 million-parameter checkpoint fits comfortably in limited GPU memory yet updates the model’s representation space, mapping raw product titles to six top-level categories with a macro-F1 of 0.836 (83.6 %).

More details: https://blog.ivan.digital/fine-tuning-qwen3-embeddings-for-product-category-classification-on-the-large-scale-product-corpus-3a0919506bc8

posted an update 4 months ago

Post

3276

Fine-Tuning Qwen3 Embeddings for product category classification on the Large-Scale Product Corpus

Language-models such as GPT, Llama, DeepSeek, Qwen trained with a filtered slice of Common Crawl. For e-commerce work, though, we can start with the Web Data Commons (WDC), the project by the University of Mannheim. It extracts web pages that carry some metadata and publishes the result as the Large-Scale Product Corpus (LSPC).

Search engines like Google reward pages that include detailed product markup, so merchants already populate their sites with SEO-friendly fields such as title, brand, GTIN, price — and, crucially, category labels. Thanks to these built-in annotations, the WDC Large-Scale Product Corpus arrives almost fully self-labelled. I used those labels to fine-tune Qwen3 Embedding with Low-Rank Adaptation (LoRA), code is available on github. The resulting 615 million-parameter checkpoint fits comfortably in limited GPU memory yet updates the model’s representation space, mapping raw product titles to six top-level categories with a macro-F1 of 0.836 (83.6 %).

More details: https://blog.ivan.digital/fine-tuning-qwen3-embeddings-for-product-category-classification-on-the-large-scale-product-corpus-3a0919506bc8

Ivan PRO

AI & ML interests

Recent Activity

Organizations

Ivan PRO

AI & ML interests

Recent Activity

Organizations

aufklarer's activity