Legal BERT (Large) - Question Answering

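This repository hosts legal-bert, an extractive question-answering model fine-tuned specifically for the legal domain. Given a question and a passage of legal text, it returns the span of the passage most likely to contain the answer.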

The model is based on the BERT-large architecture, starting from bert-large-uncased-whole-word-masking-finetuned-squad, and fine-tuned further on legal-domain text and question-answering pairs to adapt it to legal terminology, contract analysis, and legal query answering.

Model Description

Developed by: ketannnn
Model Type: BERT (Bidirectional Encoder Representations from Transformers)
Language: English
License: Apache-2.0
Finetuned from model: bert-large-uncased-whole-word-masking-finetuned-squad
Application: Question Answering / Contract Analysis / Legal Document Parsing

Intended Uses & Limitations

This model is intended to be used by legal tech professionals, researchers, and developers looking to automate question-answering tasks on legal texts, such as contracts, agreements, statutes, and case filings.

Limitations

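The model is based on BERT, so it is limited to a maximum sequence length of 512 tokens. That budget covers the question, the context, and the special tokens together, so long contracts or legal documents must be split into chunks or narrowed down with a search step before they are passed as context.
It is uncased: input text is lowercased before tokenization, so the model does not differentiate between "court" and "Court".
Output quality depends heavily on context relevance. The model can only return a span taken from the context it is given, so a passage that does not contain the answer will still produce a (low-quality) answer. It should be used as an assistive tool rather than a final legal authority.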
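
The chunking mentioned above can be sketched as follows. This is a minimal illustration, not part of this repository: the helper names chunk_words and best_answer are hypothetical, and the window sizes are counted in whitespace-separated words as a rough stand-in for tokens (BERT's WordPiece tokenizer usually produces more tokens than words, so the defaults are deliberately conservative).

```python
def chunk_words(text, max_words=200, overlap=50):
    """Split text into overlapping windows of whitespace-separated words.

    Word counts only approximate token counts, so max_words should stay
    well below the 512-token limit of the model.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def best_answer(qa, question, chunks):
    """Run a QA callable on every chunk and keep the highest-scoring result.

    `qa` is assumed to behave like a transformers question-answering
    pipeline: called with question= and context=, it returns a dict
    that has a "score" key. Returns None when there are no chunks.
    """
    results = [qa(question=question, context=chunk) for chunk in chunks]
    return max(results, key=lambda r: r["score"], default=None)
```

With the pipeline object from the How to Use section, a call might look like best_answer(qa_pipeline, question, chunk_words(long_contract_text)). Scores from different chunks are not strictly comparable, so treat the selected answer as a candidate to review. The transformers question-answering pipeline also accepts max_seq_len and doc_stride arguments for windowing long contexts; check the library documentation for their current behavior and defaults.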

How to Use

You can run local inference on CPU or GPU using the Hugging Face transformers library.

1. Using Hugging Face Pipelines (High-Level Helper)

from transformers import pipeline

# Initialize the pipeline
qa_pipeline = pipeline(
    "question-answering", 
    model="ketannnn/legal-bert"
)

# Define your legal context and question
context = (
    "This Agreement shall be governed by and construed in accordance with the laws "
    "of the State of Delaware, without giving effect to any choice of law or conflict "
    "of law provision. Any legal action arising hereunder must be filed in the courts "
    "of Wilmington, Delaware."
)
question = "Which state's laws govern the agreement?"

# Get the answer
result = qa_pipeline(question=question, context=context)
print(result)
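# The result is a dict with the answer text ("answer"), a confidence score ("score"),
# and the start and end character offsets of the answer within the context ("start", "end").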
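
Because the pipeline reports where the answer sits in the context, the offsets can be used to show the answer together with its surrounding clause. The sketch below is illustrative only: the result dict is written by hand to mirror the pipeline's keys (its score is a placeholder, not real model output), and surrounding_text is a hypothetical helper.

```python
# Hand-written stand-in for a pipeline result. The keys mirror the
# question-answering pipeline output; the score is a placeholder value.
context = "This Agreement shall be governed by the laws of the State of Delaware."
answer_text = "Delaware"
start = context.index(answer_text)
result = {
    "answer": answer_text,
    "score": 0.0,
    "start": start,
    "end": start + len(answer_text),
}


def surrounding_text(context, result, margin=20):
    """Return the answer plus up to `margin` characters of context on each side."""
    left = max(0, result["start"] - margin)
    right = min(len(context), result["end"] + margin)
    return context[left:right]


# The offsets index into the original context string.
assert context[result["start"]:result["end"]] == result["answer"]
print(surrounding_text(context, result))
```

Showing the surrounding clause makes it easier for a reviewer to confirm that the extracted span means what the question intended.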

2. Loading Tokenizer and Model Directly

import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("ketannnn/legal-bert")
model = AutoModelForQuestionAnswering.from_pretrained("ketannnn/legal-bert")

# Prepare inputs
context = (
    "Under Section 8, the Executive shall receive a base salary of $250,000 per annum, "
    "payable in regular installments in accordance with the Company's standard payroll practices."
)
question = "What is the executive's annual base salary?"

inputs = tokenizer(question, context, return_tensors="pt")

# Perform inference
with torch.no_grad():
    outputs = model(**inputs)

# Extract answer span
answer_start_index = outputs.start_logits.argmax()
answer_end_index = outputs.end_logits.argmax()

predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
answer = tokenizer.decode(predict_answer_tokens)

print(f"Answer: {answer}")
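
Taking the argmax of the start and end logits independently, as above, can occasionally yield an end index that precedes the start index, which decodes to an empty answer. One common remedy is to score start/end pairs jointly and keep the best valid pair. The following is a minimal sketch under that assumption; best_span and its max_answer_len parameter are hypothetical names, and the function works on plain Python lists so it can be read without the model.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return the (start, end) pair with the highest combined logit.

    Only pairs with start <= end < start + max_answer_len are considered,
    so the result is always a non-empty, bounded span. Returns None if
    the inputs are empty.
    """
    best, best_score = None, float("-inf")
    for s, s_logit in enumerate(start_logits):
        last = min(s + max_answer_len, len(end_logits))
        for e in range(s, last):
            score = s_logit + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


# Toy logits where independent argmax would pick start=1, end=0 (invalid).
start = [0.1, 5.0, 0.2, 0.3]
end = [6.0, 0.1, 0.2, 4.0]
print(best_span(start, end))
```

With the model outputs from the example above, the lists could be obtained with outputs.start_logits[0].tolist() and outputs.end_logits[0].tolist(). This sketch does not exclude the question tokens or special tokens from the search, which a fuller implementation would usually do.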
