# Shade V5: On-Device PII Detection
Fast, accurate PII (Personally Identifiable Information) detection model for privacy-preserving AI pipelines. Detects 12 entity types with 97.6% F1 score.
## Quick Start

```bash
pip install veil-phantom
```

```python
from veil_phantom import VeilClient

veil = VeilClient()  # auto-downloads this model
result = veil.redact("John Smith sent $5M to john@acme.com")
result.sanitized  # "[PERSON_1] sent [AMOUNT_1] to [EMAIL_1]"
```
## Model Details
| Property | Value |
|---|---|
| Architecture | DeBERTa-v3-xsmall |
| Parameters | 22M |
| Format | ONNX |
| Size | 270 MB |
| Inference | <50ms on CPU |
| F1 (in-distribution) | 97.6% |
| F1 (out-of-distribution) | 97.3% |
| Task | BIO token classification |
| Labels | 25 (12 entity types × B/I + O) |
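Because the model ships as ONNX, it can also be scored directly with `onnxruntime`, bypassing the SDK. A minimal sketch, assuming the usual DeBERTa input tensor names (`input_ids`, `attention_mask`); verify them against `session.get_inputs()` before relying on this:

```python
# Sketch: per-token classification with the raw ONNX model, no SDK involved.
# Tensor names below are assumptions; confirm with session.get_inputs().
import numpy as np

def classify_tokens(session, tokenizer, text):
    """Return the argmax BIO label id for each token in `text`."""
    enc = tokenizer.encode(text)
    logits = session.run(None, {
        "input_ids": np.array([enc.ids], dtype=np.int64),
        "attention_mask": np.array([enc.attention_mask], dtype=np.int64),
    })[0]                                  # shape: (1, seq_len, 25)
    return logits.argmax(axis=-1)[0].tolist()

# Usage (requires the files from the Files section):
#   import onnxruntime as ort
#   from tokenizers import Tokenizer
#   session = ort.InferenceSession("ShadeV5.onnx")
#   tokenizer = Tokenizer.from_file("tokenizer.json")
#   classify_tokens(session, tokenizer, "John Smith sent $5M to john@acme.com")
```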
## Entity Types
| Type | F1 | Examples |
|---|---|---|
| PERSON | 96.3% | Names (Western, African, Asian, South African) |
| ORG | 97.6% | Companies, institutions |
| EMAIL | 100% | Email addresses |
| PHONE | 98.4% | Phone numbers (international formats) |
| MONEY | 99.6% | Monetary amounts |
| DATE | 97.8% | Dates, times, schedules |
| ADDRESS | 99.4% | Street addresses |
| GOVID | 97.7% | SSN, SA ID, passport |
| BANKACCT | 92.9% | Bank account numbers, IBAN |
| CARD | 100% | Credit/debit card numbers |
| IPADDR | 100% | IP addresses |
| CASE | 97.8% | Legal case numbers |
## Training
- Base model: microsoft/deberta-v3-xsmall
- Training data: 116K examples from business meetings, legal proceedings, and financial transactions
- Tokenizer: Unigram (128K vocab)
- OOD gap: 0.3% (97.6% → 97.3%)
## Files
- `ShadeV5.onnx`: ONNX model (270 MB)
- `tokenizer.json`: HuggingFace fast tokenizer
- `tokenizer_config.json`: Tokenizer configuration
- `shade_label_map.json`: BIO label → entity type mapping
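Once per-token BIO labels are in hand, consumers typically merge them into entity spans before redacting. A sketch of that decoding step, assuming `shade_label_map.json` yields label strings like `"B-PERSON"` (check the file's actual schema first):

```python
# Sketch: grouping BIO tags into contiguous entity spans.
def bio_to_spans(labels):
    """Group BIO tags into (entity_type, start_token, end_token) spans."""
    spans, start, current = [], None, None
    for i, label in enumerate(labels + ["O"]):   # "O" sentinel flushes the last span
        # Close the open span on O, on a new B- tag, or on a type change.
        if label.startswith("B-") or label == "O" or label[2:] != current:
            if current is not None:
                spans.append((current, start, i - 1))
                current = None
            if label.startswith("B-"):
                current, start = label[2:], i
    return spans

# Example: "John Smith sent $5M to john@acme.com" tagged word-by-word
tags = ["B-PERSON", "I-PERSON", "O", "B-MONEY", "O", "B-EMAIL"]
bio_to_spans(tags)  # [("PERSON", 0, 1), ("MONEY", 3, 3), ("EMAIL", 5, 5)]
```

Note this sketch silently drops malformed sequences (an `I-` tag with no preceding `B-`); a production decoder would need a policy for those.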
## License
Apache 2.0
## Part of VeilPhantom
This model powers VeilPhantom, an open-source PII redaction SDK for agentic AI pipelines.