WeSpeaker-ResNet34-LM — CoreML

CoreML conversion of WeSpeaker ResNet34-LM for Apple Neural Engine.

Produces 256-dimensional L2-normalized speaker embeddings from audio.

Model Details

| Detail | Value |
|---|---|
| Architecture | ResNet34 with statistics pooling |
| Parameters | ~6.6M |
| Input | 80-bin log-mel spectrogram (16 kHz) |
| Output | 256-dim L2-normalized speaker embedding |
| BatchNorm | Fused into Conv2d at conversion time |
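Statistics pooling is what turns the variable-length sequence of frame-level features into the fixed 256-dim embedding: the per-dimension mean and standard deviation over time are concatenated before the final projection. A minimal sketch of that pooling step (the function name and plain-array types are illustrative, not part of the released API):

```swift
import Foundation

// Illustrative sketch of statistics pooling: collapse a variable-length
// sequence of frame-level feature vectors into one fixed-size vector by
// concatenating the per-dimension mean and standard deviation over time.
func statisticsPool(_ frames: [[Double]]) -> [Double] {
    guard let dim = frames.first?.count else { return [] }
    let n = Double(frames.count)
    var mean = [Double](repeating: 0, count: dim)
    for f in frames {
        for d in 0..<dim { mean[d] += f[d] }
    }
    for d in 0..<dim { mean[d] /= n }
    var std = [Double](repeating: 0, count: dim)
    for f in frames {
        for d in 0..<dim {
            let diff = f[d] - mean[d]
            std[d] += diff * diff
        }
    }
    for d in 0..<dim { std[d] = (std[d] / n).squareRoot() }
    // Output is twice the frame dimension: [means..., stds...]
    return mean + std
}
```

The real model applies this over ResNet34 feature maps, but the mean-plus-std idea is the same.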

Usage

```swift
let model = try await WeSpeakerModel.fromPretrained(backend: .coreML)
let embedding = model.embed(audio: samples, sampleRate: 16000)
let similarity = WeSpeakerModel.cosineSimilarity(embeddingA, embeddingB)
```
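Because the embeddings are already L2-normalized, cosine similarity reduces to a dot product and falls in [-1, 1], with same-speaker pairs scoring close to 1. A self-contained sketch of the comparison (this is an assumed implementation, not the library's actual `cosineSimilarity`):

```swift
import Foundation

// Hedged sketch of cosine similarity between two embeddings. For
// L2-normalized vectors the norms are 1, so this is effectively the
// dot product; the explicit normalization makes it safe for raw vectors too.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "embeddings must have equal dimension")
    let dot = zip(a, b).reduce(Float(0)) { $0 + $1.0 * $1.1 }
    let normA = a.reduce(Float(0)) { $0 + $1 * $1 }.squareRoot()
    let normB = b.reduce(Float(0)) { $0 + $1 * $1 }.squareRoot()
    return dot / (normA * normB)
}
```

A typical verification setup thresholds this score (the exact threshold depends on your data and false-accept tolerance).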

Variants

| Variant | Backend | Model ID |
|---|---|---|
| MLX | GPU | aufklarer/WeSpeaker-ResNet34-LM-MLX |
| CoreML | Neural Engine | aufklarer/WeSpeaker-ResNet34-LM-CoreML |
