MuRIL Indian Address NER v1
Fine-tuned google/muril-base-cased for
Indian address component detection in Hindi (Devanagari), English, and Hinglish
(Roman-script Hindi-English code-mix).
Labels
| Tag | Meaning | Example |
|---|---|---|
| ADDRESS_HOUSE | House / flat / door / plot number | "H.No. 12", "मकान नं. 21", "Flat 4B" |
| ADDRESS_BUILDING | Building / apartment / society name | "Prestige Residency" |
| ADDRESS_STREET | Street, road, lane, gali, marg | "MG Road", "गली नं. 4" |
| ADDRESS_LANDMARK | Landmark anchor ("near / opposite X") | "near Apollo Hospital", "मंदिर के पास" |
| ADDRESS_LOCALITY | Area, colony, nagar, mohalla, sector | "Koramangala", "Gandhi Nagar" |
| ADDRESS_CITY | City or town | "Bengaluru", "बेंगलुरु" |
| ADDRESS_STATE | State or union territory | "Karnataka", "कर्नाटक" |
| ADDRESS_PIN | 6-digit Indian PIN code (optional) | "560001" |
PIN code is not required — the model recognises addresses without a PIN.
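Because ADDRESS_PIN is defined as a 6-digit code, a predicted PIN span can be sanity-checked after inference. The snippet below is a minimal sketch of such a check, not part of the model: `is_plausible_pin` is a hypothetical helper, and it assumes that a PIN has a non-zero first digit and is written in ASCII digits (Devanagari numerals are not handled).

```python
import re

# Assumption: a PIN is exactly six ASCII digits and does not start with 0.
PIN_PATTERN = re.compile(r"[1-9][0-9]{5}")


def is_plausible_pin(span_text: str) -> bool:
    """Return True if a predicted ADDRESS_PIN span looks like a 6-digit PIN."""
    # Remove whitespace so that the common "560 001" spelling is accepted.
    compact = "".join(span_text.split())
    return PIN_PATTERN.fullmatch(compact) is not None


for candidate in ["560001", "560 001", "56001", "056001"]:
    print(candidate, is_plausible_pin(candidate))
```

A check like this only filters the format of the span; it does not confirm that the PIN exists or matches the predicted city.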
Performance (Benchmark v1 — 2026-04-20)
Evaluated on a held-out slice of data/generated/address_benchmark_v1 (synthetic Indian
addresses across Devanagari, English, and Hinglish).
| Entity | Precision | Recall | F1 |
|---|---|---|---|
| ADDRESS_LANDMARK | 0.998 | 1.000 | 0.999 |
| ADDRESS_STREET | 0.996 | 1.000 | 0.998 |
| ADDRESS_HOUSE | 0.995 | 1.000 | 0.998 |
| ADDRESS_PIN | 0.995 | 1.000 | 0.997 |
| ADDRESS_BUILDING | 0.991 | 1.000 | 0.996 |
| ADDRESS_CITY | 0.968 | 0.981 | 0.974 |
| ADDRESS_STATE | 0.752 | 0.858 | 0.801 |
| ADDRESS_LOCALITY | 0.702 | 0.851 | 0.769 |
| Overall | 0.864 | 0.925 | 0.815 |
Training: A100-SXM4-40GB, 4 epochs, 26,728 examples, no overfitting detected.
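The per-entity F1 values in the table can be cross-checked from the precision and recall columns, since F1 is their harmonic mean. The sketch below does this for the three weakest entities. Small differences in the third decimal are expected because the table shows rounded inputs. How the Overall row was aggregated is not stated here, so it is not reproduced.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
rows = {
    "ADDRESS_CITY": (0.968, 0.981, 0.974),
    "ADDRESS_STATE": (0.752, 0.858, 0.801),
    "ADDRESS_LOCALITY": (0.702, 0.851, 0.769),
}

for entity, (precision, recall, reported) in rows.items():
    computed = f1_score(precision, recall)
    print(f"{entity}: computed F1 = {computed:.4f}, reported = {reported:.3f}")
```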
Limitations
- Trained on synthetic data only (v1). Real-world performance is expected to improve in v2, after adding ai4bharat/naamapadam and lince-benchmark/lince supervision.
- ADDRESS_LOCALITY precision (0.702) is lower than that of the other entities: the model over-predicts locality in Devanagari prose outside address context.
- Coverage is limited to the in-repo gazetteer (~50 cities, 33 states, 100+ localities).
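One possible way to reduce the locality false positives described above is a post-filter that drops ADDRESS_LOCALITY spans when nothing else in the text looks like an address. This is a hedged sketch, not something shipped with the model: `filter_isolated_localities` and the `min_score` threshold of 0.9 are hypothetical, and the input is assumed to be the list of dicts returned by the `transformers` pipeline with `aggregation_strategy="simple"`, with the tag names from the Labels table in `entity_group`.

```python
from typing import Dict, List

LOCALITY = "ADDRESS_LOCALITY"


def filter_isolated_localities(
    entities: List[Dict], min_score: float = 0.9
) -> List[Dict]:
    """Drop low-confidence locality spans that have no other address entity nearby.

    `min_score` is an illustrative threshold, not a tuned value.
    """
    # Any non-locality entity counts as evidence of a real address context.
    has_other_component = any(e["entity_group"] != LOCALITY for e in entities)
    kept = []
    for entity in entities:
        isolated = entity["entity_group"] == LOCALITY and not has_other_component
        if isolated and entity["score"] < min_score:
            continue  # likely a false positive in running prose
        kept.append(entity)
    return kept


# Hand-written spans for illustration only; these are not real model outputs.
prose_only = [{"entity_group": LOCALITY, "score": 0.62, "word": "नगर"}]
full_address = [
    {"entity_group": LOCALITY, "score": 0.62, "word": "Gandhi Nagar"},
    {"entity_group": "ADDRESS_CITY", "score": 0.97, "word": "Bhopal"},
]

print(len(filter_isolated_localities(prose_only)))
print(len(filter_isolated_localities(full_address)))
```

The trade-off is recall: a text that contains only a locality name would lose that span unless its score clears the threshold.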
Usage
```python
from transformers import pipeline

# "simple" aggregation merges sub-word tokens into whole entity spans.
ner = pipeline(
    "token-classification",
    model="mukuls9971/muril-indian-address-ner-v1",
    aggregation_strategy="simple",
)

# English
results = ner("H.No. 12, MG Road, Koramangala, Bengaluru - 560034")
# or Devanagari
results = ner("मकान नं. 21, गांधी नगर, भोपाल - 462001")
# or Hinglish
results = ner("Makan No. 4, Gandhi Nagar ke paas, Bhopal")
```
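With `aggregation_strategy="simple"`, the pipeline returns a list of dicts with `entity_group`, `score`, `word`, `start`, and `end` keys. The sketch below shows one way to turn that list into a dictionary of address components. `group_components` and `make_span` are hypothetical helpers, and the sample spans are hand-written for illustration rather than actual model output. Slicing the input with `start`/`end` is used instead of `word`, on the assumption that the tokenizer may normalise spacing in `word`.

```python
from collections import defaultdict
from typing import Dict, List


def group_components(text: str, entities: List[dict]) -> Dict[str, List[str]]:
    """Map each predicted tag to the surface strings it covers, in text order."""
    grouped = defaultdict(list)
    for entity in sorted(entities, key=lambda e: e["start"]):
        grouped[entity["entity_group"]].append(text[entity["start"]:entity["end"]])
    return dict(grouped)


text = "H.No. 12, MG Road, Koramangala, Bengaluru - 560034"


def make_span(label: str, phrase: str) -> dict:
    """Build a pipeline-shaped dict for a phrase (illustrative score only)."""
    start = text.index(phrase)
    return {"entity_group": label, "score": 0.99, "word": phrase,
            "start": start, "end": start + len(phrase)}


# Hand-written example spans; a real call would use `ner(text)` instead.
sample_entities = [
    make_span("ADDRESS_HOUSE", "H.No. 12"),
    make_span("ADDRESS_STREET", "MG Road"),
    make_span("ADDRESS_LOCALITY", "Koramangala"),
    make_span("ADDRESS_CITY", "Bengaluru"),
    make_span("ADDRESS_PIN", "560034"),
]

components = group_components(text, sample_entities)
print(components)
```

Values are kept as lists because a single text may contain more than one span with the same tag.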
Training Details
- Base model: google/muril-base-cased
- Dataset: Synthetic Indian address corpus v1 (seed 42)
- Epochs: 4, batch size 8, max_length 192
- Learning rate: 2e-5, warmup ratio 0.1, weight decay 0.01
- Weighted loss: enabled (class imbalance handling)
- Run ID: 20260420_030651_muril-address-benchmark-v1
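The hyperparameters above imply a rough optimisation schedule, which the following sketch works out. Several things here are assumptions rather than facts from this card: that all 26,728 examples are seen once per epoch, that training ran on a single device without gradient accumulation, that warmup steps are rounded up, and that the learning rate decays linearly to zero after warmup.

```python
import math

NUM_EXAMPLES = 26_728   # from the Performance section
BATCH_SIZE = 8
EPOCHS = 4
BASE_LR = 2e-5
WARMUP_RATIO = 0.1

# Assumption: one device, no gradient accumulation.
steps_per_epoch = math.ceil(NUM_EXAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
# Assumption: fractional warmup steps are rounded up.
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at an optimiser step under linear warmup and linear decay."""
    if step < warmup_steps:
        return BASE_LR * step / warmup_steps
    remaining = max(0, total_steps - step)
    return BASE_LR * remaining / (total_steps - warmup_steps)


print(steps_per_epoch, total_steps, warmup_steps)  # 3341 13364 1337
print(lr_at(warmup_steps))                          # peak learning rate
```

Under these assumptions the peak learning rate of 2e-5 is reached after roughly the first 40% of epoch 1.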
Evaluation results
- Overall F1 on Synthetic Indian Address v1 (self-reported): 0.815
- Precision on Synthetic Indian Address v1 (self-reported): 0.864
- Recall on Synthetic Indian Address v1 (self-reported): 0.925