Legal BERT (Large) - Question Answering
This repository hosts legal-bert, an extractive question-answering model fine-tuned specifically for the legal domain.
The model is based on the BERT-large architecture, starting from bert-large-uncased-whole-word-masking-finetuned-squad, and fine-tuned further on legal-domain text and question-answering pairs to adapt it to legal terminology, contract analysis, and legal query answering.
Model Description
- Developed by: ketannnn
- Model Type: BERT (Bidirectional Encoder Representations from Transformers)
- Language: English
- License: Apache-2.0
- Finetuned from model: bert-large-uncased-whole-word-masking-finetuned-squad
- Application: Question Answering / Contract Analysis / Legal Document Parsing
Intended Uses & Limitations
This model is intended to be used by legal tech professionals, researchers, and developers looking to automate question-answering tasks on legal texts, such as contracts, agreements, statutes, and case filings.
Limitations
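- The model is based on BERT, so each input is limited to 512 tokens, counting the question, the context, and the special tokens together. Longer contracts or legal documents must be split into chunks, or narrowed down to the relevant passages by a search step, before they are passed to the model.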
- It is uncased, meaning it does not differentiate between "court" and "Court".
- Output quality depends heavily on context relevance. It should be used as an assistive tool rather than a final legal authority.
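The chunking mentioned in the first limitation can be sketched as follows. This is a minimal illustration, not part of the model or of transformers: the helper names `split_into_windows` and `pick_best` are hypothetical, and the window is measured in whitespace-separated words as a rough proxy for tokens. Because one word can split into several WordPiece tokens, the window size should stay well below 512.

```python
def split_into_windows(text, window_size=200, overlap=50):
    """Split text into overlapping windows of whitespace-separated words.

    Assumption: word count is only a rough proxy for the token count.
    """
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must be non-negative and smaller than window_size")
    words = text.split()
    step = window_size - overlap
    windows = []
    for start in range(0, len(words), step):
        windows.append(" ".join(words[start:start + window_size]))
        if start + window_size >= len(words):
            break  # the last window already reaches the end of the text
    return windows


def pick_best(results):
    """Return the result dict with the highest "score", or None if empty."""
    return max(results, key=lambda r: r["score"], default=None)
```

Each window can then be passed as the context of a separate question-answering call, with the per-window results collected in a list and handed to `pick_best`. Scores from different windows are not guaranteed to be directly comparable, so treat the final choice as a heuristic.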
How to Use
You can run local inference on CPU or GPU using the Hugging Face transformers library.
1. Using Hugging Face Pipelines (High-Level Helper)
from transformers import pipeline
# Initialize the pipeline
qa_pipeline = pipeline(
    "question-answering",
    model="ketannnn/legal-bert"
)
# Define your legal context and question
context = (
    "This Agreement shall be governed by and construed in accordance with the laws "
    "of the State of Delaware, without giving effect to any choice of law or conflict "
    "of law provision. Any legal action arising hereunder must be filed in the courts "
    "of Wilmington, Delaware."
)
question = "Which state's laws govern the agreement?"
# Get the answer
result = qa_pipeline(question=question, context=context)
print(result)
# The result is a dict with the answer text, a confidence score, and the
# start and end character offsets of the answer within the context.
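The start and end offsets in the result can be used to show the answer in place, which helps when a reviewer needs to check the surrounding clause. The sketch below assumes the offsets are character positions into the context string. The helper `highlight_answer` is hypothetical, and the example result is built by hand with a made-up score rather than taken from a real model run.

```python
def highlight_answer(context, result, open_mark="[[", close_mark="]]"):
    """Wrap the answer span of a QA result in visible markers."""
    start, end = result["start"], result["end"]
    return context[:start] + open_mark + context[start:end] + close_mark + context[end:]


context = "This Agreement shall be governed by the laws of the State of Delaware."
answer = "Delaware"
start = context.index(answer)

# Hand-built stand-in for a pipeline result; the score is a placeholder.
example_result = {
    "answer": answer,
    "start": start,
    "end": start + len(answer),
    "score": 0.5,
}

print(highlight_answer(context, example_result))
```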
2. Loading Tokenizer and Model Directly
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("ketannnn/legal-bert")
model = AutoModelForQuestionAnswering.from_pretrained("ketannnn/legal-bert")
# Prepare inputs
context = (
    "Under Section 8, the Executive shall receive a base salary of $250,000 per annum, "
    "payable in regular installments in accordance with the Company's standard payroll practices."
)
question = "What is the executive's annual base salary?"
inputs = tokenizer(question, context, return_tensors="pt")
# Perform inference
with torch.no_grad():
    outputs = model(**inputs)
# Extract answer span
answer_start_index = outputs.start_logits.argmax()
answer_end_index = outputs.end_logits.argmax()
predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
answer = tokenizer.decode(predict_answer_tokens)
print(f"Answer: {answer}")
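Taking the two argmax values independently, as above, can produce an end index that comes before the start index, which decodes to an empty answer. A common alternative is to search for the best valid pair jointly. The sketch below is one way to do that on plain Python lists of logits. The function name `best_span` and the `max_answer_len` default are assumptions for illustration, not the method used by the model author or by transformers.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end, score) for the highest-scoring valid span.

    A span is valid when start <= end and it covers at most
    max_answer_len tokens. The score is the sum of the two logits.
    """
    best_start, best_end = 0, 0
    best_score = float("-inf")
    for i, start_score in enumerate(start_logits):
        last = min(i + max_answer_len, len(end_logits))
        for j in range(i, last):
            score = start_score + end_logits[j]
            if score > best_score:
                best_start, best_end, best_score = i, j, score
    return best_start, best_end, best_score


# Toy logits where the independent argmax values would be start=2, end=0.
toy_start = [0.1, 0.2, 5.0, 0.3]
toy_end = [4.0, 0.1, 0.2, 3.0]
start_index, end_index, _ = best_span(toy_start, toy_end)
```

With the model outputs from the example above, the inputs would be `outputs.start_logits[0].tolist()` and `outputs.end_logits[0].tolist()`. A fuller version would also restrict the search to context tokens so that the answer cannot fall inside the question.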