⚠️ DEPRECATED: This model was trained with a non-standard methodology (Focal Loss with class weighting) and evaluated on a synthetic test set. For accurate, reproducible results on the official LexGLUE benchmark, use our updated models:

Model	μ-F1 (micro)	m-F1 (macro)
lexglue-roberta-unfair-tos	96.1	84.4
lexglue-legalbert-unfair-tos	96.0	84.1
lexglue-deberta-unfair-tos	95.6	82.2
lexglue-legalbert-small-unfair-tos	95.0	78.5

deberta-unfair-tos-augmented

DeBERTa trained with augmented data for UNFAIR-ToS classification (originally released as our best-performing model; superseded by the updated models listed above)

Model Description

This model is fine-tuned on the LexGLUE UNFAIR-ToS dataset to detect unfair clauses in Terms of Service documents.

Base Model: microsoft/deberta-base

Performance

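Evaluation Metrics (reported on the synthetic test set noted in the deprecation notice, not on the official LexGLUE test split):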

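Exact Match Accuracy: Percentage of samples whose full set of predicted labels exactly matches the ground truth (a strict multi-label metric)
Micro-F1: Harmonic mean of micro-precision and micro-recall, where true positives, false positives, and false negatives are pooled across all labels before the ratios are computed
Micro-Precision: Fraction of all predicted labels that are correct, pooled across all labels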
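A minimal sketch of how the exact-match and micro-averaged metrics can be computed from 0/1 label matrices (one row per sample, one column per label). The helper names and the list-of-lists input format are our own assumptions, not the evaluation script behind the numbers below.

```python
from typing import List, Tuple

Matrix = List[List[int]]  # rows = samples, columns = labels (1 = label present)


def exact_match_accuracy(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of samples whose whole predicted label vector equals the gold vector."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return hits / len(y_true)


def micro_prf(y_true: Matrix, y_pred: Matrix) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, F1: TP/FP/FN are pooled over all labels first."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t and p:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Note that exact match penalizes a sample for a single wrong label, while micro-F1 gives partial credit, which is why the exact-match score is the stricter of the two.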

Metric	Score
Exact Match Accuracy	94.12%
Micro-F1	0.96
Micro-Precision	0.98
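The table does not report micro-recall, but it can be backed out of the two reported values, since micro-F1 is the harmonic mean of micro-precision $P$ and micro-recall $R$. Both inputs are rounded to two decimals, so the result is only approximate (anywhere from about 0.936 to 0.946 is consistent with the rounding).

```latex
F_1 = \frac{2PR}{P + R}
\quad\Longrightarrow\quad
R = \frac{F_1 P}{2P - F_1}
  = \frac{0.96 \times 0.98}{2(0.98) - 0.96}
  = \frac{0.9408}{1.00} \approx 0.94
```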

Risk Categories

The model is multi-label: each input can be assigned zero or more of the following 8 risk categories:

ID	Category
0	Limitation of liability
1	Unilateral termination
2	Unilateral change
3	Content removal
4	Contract by using
5	Choice of law
6	Jurisdiction
7	Arbitration
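Because one input can carry several categories at once, its labels are naturally stored either as a set of category IDs or as an 8-element multi-hot vector in ID order. A small sketch of converting between the two; the helper names are hypothetical, and we assume the model's output logits follow the ID order in the table above.

```python
# Category names indexed by the IDs in the table above (assumed to match logit order)
CATEGORIES = [
    "Limitation of liability", "Unilateral termination", "Unilateral change",
    "Content removal", "Contract by using", "Choice of law", "Jurisdiction", "Arbitration",
]


def to_multi_hot(label_ids):
    """Collection of category IDs -> 8-element 0/1 vector in ID order."""
    vec = [0] * len(CATEGORIES)
    for i in label_ids:
        vec[i] = 1
    return vec


def to_names(multi_hot):
    """0/1 vector -> list of category names (an empty list means no risk category)."""
    return [name for name, flag in zip(CATEGORIES, multi_hot) if flag]
```

For example, a forum-selection clause labeled with IDs 6 and 7 becomes `[0, 0, 0, 0, 0, 0, 1, 1]`, which decodes back to `["Jurisdiction", "Arbitration"]`.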

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "Agreemind/deberta-unfair-tos-augmented"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference mode (disables dropout)

text = "We reserve the right to terminate your account at any time."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)
    # Multi-label head: an independent sigmoid per category, not a softmax
    probs = torch.sigmoid(outputs.logits)

# Category names in output order (IDs 0 to 7, see Risk Categories)
labels = ["Limitation of liability", "Unilateral termination", "Unilateral change",
          "Content removal", "Contract by using", "Choice of law", "Jurisdiction", "Arbitration"]

# Report every category whose probability exceeds the 0.5 decision threshold
for label, prob in zip(labels, probs[0].tolist()):
    if prob > 0.5:
        print(f"{label}: {prob:.2%}")
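The snippet above scores a single sentence. UNFAIR-ToS examples are individual sentences, so applying the model to a whole Terms of Service document plausibly starts by splitting it into sentences and scoring each one. Below is a minimal, model-free sketch of that pre- and post-processing. The regex splitter is a naive assumption (a dedicated sentence segmenter would be more robust), the helper names are hypothetical, and `probs` is assumed to be `torch.sigmoid(outputs.logits).tolist()` for the batch of sentences.

```python
import re
from typing import Dict, List

LABELS = ["Limitation of liability", "Unilateral termination", "Unilateral change",
          "Content removal", "Contract by using", "Choice of law", "Jurisdiction", "Arbitration"]


def split_sentences(document: str) -> List[str]:
    """Naive splitter (an assumption): break on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [p for p in parts if p]


def flag_sentences(sentences: List[str], probs: List[List[float]],
                   threshold: float = 0.5) -> List[Dict]:
    """Keep only sentences with at least one category above the threshold."""
    flagged = []
    for sentence, row in zip(sentences, probs):
        hits = {label: p for label, p in zip(LABELS, row) if p > threshold}
        if hits:
            flagged.append({"sentence": sentence, "categories": hits})
    return flagged
```

With the model loaded as above, the batch step would look like `inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True, max_length=512)` followed by `probs = torch.sigmoid(model(**inputs).logits).tolist()` inside `torch.no_grad()`.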

Training

Parameter	Value
Dataset	`coastalcph/lex_glue` (`unfair_tos` subset)
Training Samples	~5,532
Loss Function	Focal Loss with class weighting
Optimizer	AdamW with cosine LR schedule
Learning Rate	2e-5 with 10% warmup
Epochs	15 (with early stopping, patience=3)
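The table names the training ingredients but not their code. A minimal PyTorch sketch of those pieces follows: a multi-label focal loss with optional per-label weights, linear warmup over the first 10% of steps followed by cosine decay, and patience-based early stopping. The focusing parameter `gamma=2.0`, the form of the class weights, the metric used for early stopping, and the helper names `focal_loss`, `cosine_with_warmup`, and `EarlyStopping` are our assumptions, not the authors' training script.

```python
import math

import torch
import torch.nn.functional as F
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR


def focal_loss(logits, targets, class_weights=None, gamma=2.0):
    """Multi-label focal loss: per-label BCE scaled by (1 - p_t) ** gamma.

    gamma=2.0 is an assumed value; the card does not state the one used.
    """
    targets = targets.float()
    # -log(p_t) for every (sample, label) pair, computed stably from logits
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)  # probability given to the true value
    loss = (1 - p_t) ** gamma * bce              # down-weight easy, confident cases
    if class_weights is not None:
        loss = loss * class_weights              # shape (num_labels,), broadcast over batch
    return loss.mean()


def cosine_with_warmup(optimizer, total_steps, warmup_frac=0.1):
    """Linear warmup over the first warmup_frac of steps, then cosine decay to 0."""
    warmup_steps = int(total_steps * warmup_frac)

    def lr_lambda(step):
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))

    return LambdaLR(optimizer, lr_lambda)


class EarlyStopping:
    """Signal a stop after `patience` consecutive evaluations without improvement."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("-inf")
        self.bad_evals = 0

    def step(self, score):
        if score > self.best:
            self.best, self.bad_evals = score, 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience  # True means stop training
```

In a training loop this would be wired up roughly as `optimizer = AdamW(model.parameters(), lr=2e-5)` and `scheduler = cosine_with_warmup(optimizer, total_steps=steps_per_epoch * 15)` (where `steps_per_epoch` is a hypothetical name), calling `scheduler.step()` after each `optimizer.step()` and `stopper.step(val_score)` after each epoch's evaluation. With `gamma=0` and no weights the focal loss reduces to plain binary cross-entropy. `transformers.get_cosine_schedule_with_warmup` provides the same warmup-then-cosine shape if you prefer the library version.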

Limitations

The Arbitration class has lower recall (~38%) due to its limited number of training samples
Optimized for English legal text
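To check per-class behaviour such as the Arbitration recall above on your own labeled data, recall can be computed separately for each label column. A minimal sketch over 0/1 label matrices (one row per sample, one column per category in ID order); the function name and input format are assumptions.

```python
from typing import List


def per_label_recall(y_true: List[List[int]], y_pred: List[List[int]]) -> List[float]:
    """Recall for each label column, TP / (TP + FN); NaN if the label never occurs."""
    n_labels = len(y_true[0])
    recalls = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] and p[j])
        positives = sum(1 for t in y_true if t[j])
        recalls.append(tp / positives if positives else float("nan"))
    return recalls
```

With the category order used on this card, the Arbitration recall is the last element of the returned list.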

Citation

@misc{agreemind-unfair-tos,
  author = {Agreemind},
  title = {deberta-unfair-tos-augmented},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Agreemind/deberta-unfair-tos-augmented}
}

Model size: 0.1B params
Tensor type: F32
Weights format: Safetensors
