Shade V5: On-Device PII Detection

A fast, accurate PII (personally identifiable information) detection model for privacy-preserving AI pipelines. It detects 12 entity types with a 97.6% F1 score on in-distribution data.

Quick Start

pip install veil-phantom

from veil_phantom import VeilClient

veil = VeilClient()  # auto-downloads this model
result = veil.redact("John Smith sent $5M to john@acme.com")
result.sanitized  # "[PERSON_1] sent [AMOUNT_1] to [EMAIL_1]"
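
The placeholders in the output above (`[PERSON_1]`, `[AMOUNT_1]`, `[EMAIL_1]`) suggest that each detected span is replaced by its entity type plus a per-type counter. The following is a minimal sketch of that idea, assuming the spans are already known as `(start, end, type)` character offsets. `redact_spans` is a hypothetical helper, not part of the `veil_phantom` API, and reusing one number for a repeated value is an assumption rather than documented SDK behavior.

```python
def redact_spans(text, spans):
    """Replace each (start, end, entity_type) span with a numbered placeholder.

    Hypothetical helper for illustration; not the veil_phantom implementation.
    Assumption: a repeated value of the same type reuses its placeholder.
    """
    counters = {}   # entity_type -> number of distinct values seen so far
    assigned = {}   # (entity_type, value) -> placeholder string
    out, cursor = [], 0
    for start, end, etype in sorted(spans):
        value = text[start:end]
        key = (etype, value)
        if key not in assigned:
            counters[etype] = counters.get(etype, 0) + 1
            assigned[key] = f"[{etype}_{counters[etype]}]"
        out.append(text[cursor:start])  # untouched text before the span
        out.append(assigned[key])       # placeholder instead of the value
        cursor = end
    out.append(text[cursor:])           # trailing text after the last span
    return "".join(out), assigned


text = "John Smith sent $5M to john@acme.com"
spans = [(0, 10, "PERSON"), (16, 19, "AMOUNT"), (23, 36, "EMAIL")]
sanitized, mapping = redact_spans(text, spans)
print(sanitized)  # [PERSON_1] sent [AMOUNT_1] to [EMAIL_1]
```

Returning the mapping alongside the sanitized text lets a caller restore the original values after downstream processing, if that is needed.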

Model Details

| Property | Value |
| --- | --- |
| Architecture | DeBERTa-v3-xsmall |
| Parameters | 22M |
| Format | ONNX |
| Size | 270 MB |
| Inference | <50 ms on CPU |
| F1 Score | 97.6% (in-distribution) |
| F1 Score | 97.3% (out-of-distribution) |
| Task | BIO Token Classification |
| Labels | 25 (12 entity types × B/I + O) |
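
The label count follows from the BIO scheme: each of the 12 entity types gets a B- (begin) and an I- (inside) tag, plus a single O (outside) tag, giving 12 × 2 + 1 = 25. The sketch below builds that label set and groups per-token tags into entity spans. The type names come from the Entity Types list in this card; the label ordering and the `decode_bio` helper are illustrative assumptions, and the example works on whole words, whereas the model itself tags subword tokens.

```python
ENTITY_TYPES = [
    "PERSON", "ORG", "EMAIL", "PHONE", "MONEY", "DATE",
    "ADDRESS", "GOVID", "BANKACCT", "CARD", "IPADDR", "CASE",
]

# "O" plus a B- and an I- tag per type. This ordering is illustrative only;
# the real id order is whatever shade_label_map.json defines.
LABELS = ["O"] + [f"{p}-{t}" for t in ENTITY_TYPES for p in ("B", "I")]


def decode_bio(tokens, tags):
    """Group per-token BIO tags into (entity_type, text) spans."""
    spans = []
    current = None  # [entity_type, [tokens...]] for the span being built
    for token, tag in zip(tokens, tags):
        prefix, _, etype = tag.partition("-")
        if prefix == "I" and current and current[0] == etype:
            current[1].append(token)  # continue the open span
            continue
        if current:  # any other tag closes the open span
            spans.append((current[0], " ".join(current[1])))
        # B- starts a span; a stray I- is also treated as a start.
        current = [etype, [token]] if prefix in ("B", "I") else None
    if current:
        spans.append((current[0], " ".join(current[1])))
    return spans


tokens = ["John", "Smith", "sent", "$5M", "to", "john@acme.com"]
tags = ["B-PERSON", "I-PERSON", "O", "B-MONEY", "O", "B-EMAIL"]
print(len(LABELS))               # 25
print(decode_bio(tokens, tags))
```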

Entity Types

| Type | F1 | Examples |
| --- | --- | --- |
| PERSON | 96.3% | Names (Western, African, Asian, South African) |
| ORG | 97.6% | Companies, institutions |
| EMAIL | 100% | Email addresses |
| PHONE | 98.4% | Phone numbers (international formats) |
| MONEY | 99.6% | Monetary amounts |
| DATE | 97.8% | Dates, times, schedules |
| ADDRESS | 99.4% | Street addresses |
| GOVID | 97.7% | SSN, SA ID, passport |
| BANKACCT | 92.9% | Bank account numbers, IBAN |
| CARD | 100% | Credit/debit card numbers |
| IPADDR | 100% | IP addresses |
| CASE | 97.8% | Legal case numbers |

Training

  • Base model: microsoft/deberta-v3-xsmall
  • Training data: 116K examples drawn from business meetings, legal proceedings, and financial transactions
  • Tokenizer: Unigram (128K vocab)
  • OOD gap: 0.3 percentage points (97.6% → 97.3%)

Files

  • ShadeV5.onnx: ONNX model (270 MB)
  • tokenizer.json: HuggingFace fast tokenizer
  • tokenizer_config.json: Tokenizer configuration
  • shade_label_map.json: BIO label → entity type mapping
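
For readers who want to run the model without the SDK, the files above can in principle be loaded directly with `onnxruntime` and `tokenizers`. This is a rough sketch under stated assumptions, not the way VeilPhantom itself loads the model. It assumes the export's input names are among `input_ids`, `attention_mask`, and `token_type_ids`, and that the first output holds per-token logits over the 25 labels. Neither is confirmed by this card. The helper names are hypothetical.

```python
def load_shade(model_path="ShadeV5.onnx", tokenizer_path="tokenizer.json"):
    """Open the ONNX session and the fast tokenizer from local files."""
    import onnxruntime as ort          # pip install onnxruntime
    from tokenizers import Tokenizer   # pip install tokenizers

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    tokenizer = Tokenizer.from_file(tokenizer_path)
    return session, tokenizer


def argmax_ids(logit_rows):
    """Return the index of the highest score in each row of logits."""
    return [max(range(len(row)), key=row.__getitem__) for row in logit_rows]


def predict_label_ids(session, tokenizer, text):
    """Return (char_offsets, label_id) for each token in `text`."""
    import numpy as np

    enc = tokenizer.encode(text)
    candidates = {
        "input_ids": enc.ids,
        "attention_mask": enc.attention_mask,
        "token_type_ids": enc.type_ids,
    }
    # Feed only the inputs this export declares (names are an assumption;
    # an unexpected input name raises KeyError here).
    feeds = {
        inp.name: np.array([candidates[inp.name]], dtype=np.int64)
        for inp in session.get_inputs()
    }
    logits = session.run(None, feeds)[0]  # assumed shape: (1, seq_len, 25)
    return list(zip(enc.offsets, argmax_ids(logits[0].tolist())))


# Usage, once the files are downloaded next to the script:
#   session, tokenizer = load_shade()
#   print(predict_label_ids(session, tokenizer, "John Smith sent $5M"))
```

Turning the predicted ids into label names requires shade_label_map.json, whose schema is not documented in this card, so that step is left out of the sketch.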

License

Apache 2.0

Part of VeilPhantom

This model powers VeilPhantom, an open-source PII redaction SDK for agentic AI pipelines.
