Legal BERT (Large) - Question Answering

This repository hosts legal-bert, an extractive question-answering model fine-tuned specifically for the legal domain.

The model is based on the BERT-large architecture, starting from bert-large-uncased-whole-word-masking-finetuned-squad, and fine-tuned further on legal-domain text and question-answering pairs to adapt it to legal terminology, contract analysis, and legal query answering.

Model Description

  • Developed by: ketannnn
  • Model Type: BERT (Bidirectional Encoder Representations from Transformers)
  • Language: English
  • License: Apache-2.0
  • Finetuned from model: bert-large-uncased-whole-word-masking-finetuned-squad
  • Application: Question Answering / Contract Analysis / Legal Document Parsing

Intended Uses & Limitations

This model is intended to be used by legal tech professionals, researchers, and developers looking to automate question-answering tasks on legal texts, such as contracts, agreements, statutes, and case filings.

Limitations

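  • The model is based on BERT, so it is limited to a maximum sequence length of 512 tokens, and that budget covers the question and the context together. Longer contracts or legal documents must be split into overlapping chunks (the question-answering pipeline does this automatically through its doc_stride and max_seq_len arguments) or narrowed to the relevant passage with a retrieval step beforehand.
  • It is uncased: the tokenizer lowercases all input, so the model does not differentiate between "court" and "Court".
  • The model is extractive. It returns a span copied from the supplied context, so output quality depends heavily on passing it a relevant passage. It should be used as an assistive tool rather than a final legal authority.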
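A minimal sketch of the chunking step, assuming the contract has already been tokenized into a list of token ids without special tokens (for example with `tokenizer(text, add_special_tokens=False)["input_ids"]`). The helper `sliding_windows` and its default `stride` of 128 are hypothetical, chosen for illustration. Each window leaves room for the question plus the three special tokens BERT adds to a question/context pair ([CLS], [SEP], [SEP]), and consecutive windows share `stride` tokens, so an answer no longer than the overlap that straddles a boundary still appears whole in at least one chunk. The question-answering pipeline performs an equivalent split internally.

```python
from typing import List

MAX_SEQ_LEN = 512       # BERT's maximum sequence length
NUM_SPECIAL_TOKENS = 3  # [CLS] question [SEP] context [SEP]


def sliding_windows(context_ids: List[int], question_len: int,
                    stride: int = 128) -> List[List[int]]:
    """Split context token ids into overlapping windows that fit next to the question.

    `stride` is the number of tokens shared by consecutive windows
    (hypothetical default, for illustration only).
    """
    budget = MAX_SEQ_LEN - question_len - NUM_SPECIAL_TOKENS  # context tokens per window
    if budget <= stride:
        raise ValueError("question too long for the chosen stride")
    step = budget - stride  # how far each new window advances
    windows = []
    start = 0
    while True:
        windows.append(context_ids[start:start + budget])
        if start + budget >= len(context_ids):
            break  # this window reaches the end of the document
        start += step
    return windows


# 1,200 dummy token ids and a 20-token question give 489 context tokens per window
chunks = sliding_windows(list(range(1200)), question_len=20)
print(len(chunks), [len(c) for c in chunks])  # -> 3 [489, 489, 478]
```

Each chunk is then paired with the question and run through the model separately, and the highest-scoring answer across chunks is kept.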

How to Use

You can run local inference on CPU or GPU using the Hugging Face transformers library.

1. Using Hugging Face Pipelines (High-Level Helper)

from transformers import pipeline

# Initialize the pipeline
qa_pipeline = pipeline(
    "question-answering", 
    model="ketannnn/legal-bert"
)

# Define your legal context and question
context = (
    "This Agreement shall be governed by and construed in accordance with the laws "
    "of the State of Delaware, without giving effect to any choice of law or conflict "
    "of law provision. Any legal action arising hereunder must be filed in the courts "
    "of Wilmington, Delaware."
)
question = "Which state's laws govern the agreement?"

# Get the answer
result = qa_pipeline(question=question, context=context)
print(result)
# The result is a dict with 'answer', 'score', and 'start'/'end' character offsets into the context.
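Because `start` and `end` in the pipeline result are character offsets into `context`, they can be used to locate the answer in the original document, for example to highlight the governing clause. A minimal sketch; the function `highlight_answer` and its marker strings are hypothetical, and the result dictionary below is built by hand with `str.find`, not produced by the model.

```python
def highlight_answer(context: str, result: dict, left: str = "[[", right: str = "]]") -> str:
    """Wrap the answer span of a question-answering result in marker strings."""
    start, end = result["start"], result["end"]  # character offsets into `context`
    return context[:start] + left + context[start:end] + right + context[end:]


# Hand-built stand-in for a pipeline result (not real model output)
context = "This Agreement shall be governed by the laws of the State of Delaware."
span = "State of Delaware"
start = context.find(span)
mock_result = {"answer": span, "start": start, "end": start + len(span)}
print(highlight_answer(context, mock_result))
# -> This Agreement shall be governed by the laws of the [[State of Delaware]].
```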

2. Loading Tokenizer and Model Directly

import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("ketannnn/legal-bert")
model = AutoModelForQuestionAnswering.from_pretrained("ketannnn/legal-bert")

# Prepare inputs
context = (
    "Under Section 8, the Executive shall receive a base salary of $250,000 per annum, "
    "payable in regular installments in accordance with the Company's standard payroll practices."
)
question = "What is the executive's annual base salary?"

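# Encode question and context as a pair; if they exceed 512 tokens, truncate only the context
inputs = tokenizer(question, context, return_tensors="pt", truncation="only_second", max_length=512)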

# Perform inference
with torch.no_grad():
    outputs = model(**inputs)

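# Extract answer span (most likely start and end token positions)
answer_start_index = int(outputs.start_logits.argmax())
answer_end_index = int(outputs.end_logits.argmax())

# The two argmaxes are independent, so the end can precede the start; treat that as no answer
if answer_end_index < answer_start_index:
    answer = ""
else:
    predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
    # Drop [CLS]/[SEP] in case the predicted span includes them
    answer = tokenizer.decode(predict_answer_tokens, skip_special_tokens=True)

print(f"Answer: {answer}")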
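Picking the start and end tokens with two independent argmaxes can produce a reversed (end before start) or implausibly long span. A common refinement, used in SQuAD-style post-processing, is to score every pair by `start_logit + end_logit` and keep the best pair with `start <= end` and a bounded length. The pure-Python sketch below assumes the logits have been converted to lists (e.g. `outputs.start_logits[0].tolist()`); the name `best_span` and the `max_answer_len` default of 30 tokens are assumptions for illustration. The optional `context_mask` lets the caller exclude question and special-token positions; with a fast tokenizer, `[sid == 1 for sid in inputs.sequence_ids(0)]` marks the context tokens.

```python
from typing import List, Optional, Tuple


def best_span(start_logits: List[float], end_logits: List[float],
              max_answer_len: int = 30,
              context_mask: Optional[List[bool]] = None) -> Optional[Tuple[int, int]]:
    """Return the (start, end) token pair maximizing start_logit + end_logit.

    Only pairs with start <= end < start + max_answer_len are considered; if
    `context_mask` is given, both positions must be marked True.
    Returns None when no valid pair exists.
    """
    n = len(start_logits)
    mask = context_mask if context_mask is not None else [True] * n
    best, best_score = None, float("-inf")
    for s in range(n):
        if not mask[s]:
            continue
        # Only consider ends at or after the start, within the length limit
        for e in range(s, min(n, s + max_answer_len)):
            if not mask[e]:
                continue
            score = start_logits[s] + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


# Toy logits: independent argmaxes give start=3, end=1 (a reversed, invalid span)
start_logits = [0.0, 3.0, 1.0, 4.0]
end_logits = [0.0, 5.0, 2.0, 0.0]
print(best_span(start_logits, end_logits))  # -> (1, 1)
```

The double loop is quadratic only up to `max_answer_len`, which is cheap for a single 512-token window.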