# Shade V5: On-Device PII Detection
Fast, accurate PII (Personally Identifiable Information) detection model for privacy-preserving AI pipelines. Detects 12 entity types with 97.6% F1 score.
## Quick Start

```bash
pip install veil-phantom
```

```python
from veil_phantom import VeilClient

veil = VeilClient()  # auto-downloads this model
result = veil.redact("John Smith sent $5M to john@acme.com")
result.sanitized  # "[PERSON_1] sent [AMOUNT_1] to [EMAIL_1]"
```
## Model Details
| Property | Value |
|---|---|
| Architecture | DeBERTa-v3-xsmall |
| Parameters | 22M |
| Format | ONNX |
| Size | 270 MB |
| Inference | <50ms on CPU |
| F1 (in-distribution) | 97.6% |
| F1 (out-of-distribution) | 97.3% |
| Task | BIO token classification |
| Labels | 25 (12 entity types × B/I + O) |
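Because the model ships as ONNX, it can also be scored directly with `onnxruntime`, bypassing the SDK. A minimal sketch, assuming the usual DeBERTa input tensor names (`input_ids`, `attention_mask`); verify them against `session.get_inputs()` before relying on this:

```python
# Sketch: per-token classification with the raw ONNX model, no SDK involved.
# Tensor names below are assumptions; confirm with session.get_inputs().
import numpy as np

def classify_tokens(session, tokenizer, text):
    """Return the argmax BIO label id for each token in `text`."""
    enc = tokenizer.encode(text)
    logits = session.run(None, {
        "input_ids": np.array([enc.ids], dtype=np.int64),
        "attention_mask": np.array([enc.attention_mask], dtype=np.int64),
    })[0]                                  # shape: (1, seq_len, 25)
    return logits.argmax(axis=-1)[0].tolist()

# Usage (requires the files from the Files section):
#   import onnxruntime as ort
#   from tokenizers import Tokenizer
#   session = ort.InferenceSession("ShadeV5.onnx")
#   tokenizer = Tokenizer.from_file("tokenizer.json")
#   classify_tokens(session, tokenizer, "John Smith sent $5M to john@acme.com")
```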
## Entity Types
| Type | F1 | Examples |
|---|---|---|
| PERSON | 96.3% | Names (Western, African, Asian, South African) |
| ORG | 97.6% | Companies, institutions |
| EMAIL | 100% | Email addresses |
| PHONE | 98.4% | Phone numbers (international formats) |
| MONEY | 99.6% | Monetary amounts |
| DATE | 97.8% | Dates, times, schedules |
| ADDRESS | 99.4% | Street addresses |
| GOVID | 97.7% | SSN, SA ID, passport |
| BANKACCT | 92.9% | Bank account numbers, IBAN |
| CARD | 100% | Credit/debit card numbers |
| IPADDR | 100% | IP addresses |
| CASE | 97.8% | Legal case numbers |
## Training
- Base model: microsoft/deberta-v3-xsmall
- Training data: 116K examples from business meetings, legal proceedings, and financial transactions
- Tokenizer: Unigram (128K vocab)
- OOD gap: 0.3% (97.6% → 97.3%)
## Files
- `ShadeV5.onnx`: ONNX model (270 MB)
- `tokenizer.json`: HuggingFace fast tokenizer
- `tokenizer_config.json`: Tokenizer configuration
- `shade_label_map.json`: BIO label → entity type mapping
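Once per-token BIO labels are in hand, consumers typically merge them into entity spans before redacting. A sketch of that decoding step, assuming `shade_label_map.json` yields label strings like `"B-PERSON"` (check the file's actual schema first):

```python
# Sketch: grouping BIO tags into contiguous entity spans.
def bio_to_spans(labels):
    """Group BIO tags into (entity_type, start_token, end_token) spans."""
    spans, start, current = [], None, None
    for i, label in enumerate(labels + ["O"]):   # "O" sentinel flushes the last span
        # Close the open span on O, on a new B- tag, or on a type change.
        if label.startswith("B-") or label == "O" or label[2:] != current:
            if current is not None:
                spans.append((current, start, i - 1))
                current = None
            if label.startswith("B-"):
                current, start = label[2:], i
    return spans

# Example: "John Smith sent $5M to john@acme.com" tagged word-by-word
tags = ["B-PERSON", "I-PERSON", "O", "B-MONEY", "O", "B-EMAIL"]
bio_to_spans(tags)  # [("PERSON", 0, 1), ("MONEY", 3, 3), ("EMAIL", 5, 5)]
```

Note this sketch silently drops malformed sequences (an `I-` tag with no preceding `B-`); a production decoder would need a policy for those.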
## License
Apache 2.0
## Part of VeilPhantom
This model powers VeilPhantom, an open-source PII redaction SDK for agentic AI pipelines.