Seeking Advice on Fine-Tuning a Legal Language Model for Nepalese Law (LLM + RAG)

Hi everyone, :waving_hand:

I’m working on building an AI-powered legal assistant focused on Nepalese law. My goal is to create a model that can provide legal advice by understanding and interpreting laws, acts, and judicial decisions in both Nepali and English.

Currently, I’m planning to use a combination of:

  • Fine-tuned language models (like Legal-BERT, mBERT, or GPT-2) for legal reasoning.
  • Retrieval-Augmented Generation (RAG) to pull up-to-date legal information (Constitution, Civil/Criminal codes, etc.) without needing constant retraining.

What I’ve done so far:

  • Collected legal texts: Constitution of Nepal (2072), Muluki Ain (2017), and other acts.
  • Started preparing a question-answer dataset for fine-tuning.
  • Exploring FAISS and LangChain for RAG implementation.
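
Here is a rough sketch of the FAISS + LangChain retrieval step I have in mind; the embedding model, chunk sizes, and the example document are placeholders rather than final choices.

# deps: pip install langchain-community langchain-huggingface langchain-text-splitters faiss-cpu sentence-transformers

from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Each source document keeps metadata so the final answer can cite the exact act/article.
raw_docs = [
    {"text": "... full text of an article from the Constitution of Nepal ...",
     "source": "Constitution of Nepal 2072, Article 17"},
]

splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)
texts, metadatas = [], []
for d in raw_docs:
    for chunk in splitter.split_text(d["text"]):
        texts.append(chunk)
        metadatas.append({"source": d["source"]})

# Multilingual embeddings so Nepali and English queries share one vector space (assumed choice).
emb = HuggingFaceEmbeddings(model_name="sentence-transformers/LaBSE")
db = FAISS.from_texts(texts, emb, metadatas=metadatas)

hits = db.similarity_search("What are the fundamental rights under the Constitution?", k=4)
for h in hits:
    print(h.metadata["source"], "->", h.page_content[:80])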

What I need help with:

  1. Model selection:
  • Would Legal-BERT be a good choice for fine-tuning legal Q&A, or should I use mBERT since my data involves both Nepali and English?
  • Is GPT-2 suitable for generating long-form legal explanations?
  2. RAG setup:
  • For a legal AI, would you recommend FAISS or ChromaDB for storing and retrieving legal document embeddings?
  • How can I balance retrieval accuracy with generation quality?
  3. Handling bilingual capabilities:
  • Should I fine-tune the model in Nepali directly, or train in English and use a translation layer for outputs?
  • Any suggestions for models like BLOOM or mBERT that support Nepali?
  4. Fine-tuning strategy:
  • For fine-tuning, should I use a SQuAD-style Q&A format or focus on situation-based legal questions? (Toy examples of both formats follow this list.)
  • Any best practices for avoiding hallucinations in legal answers?
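
To make point 4 concrete, here are toy examples of the two record formats I am deciding between; the legal content is made up, not taken from my dataset.

# Toy examples of the two fine-tuning record formats (illustrative content only).

# SQuAD-style extractive QA: the answer is a span inside a provided legal context.
squad_style = {
    "context": "Article 17 of the Constitution of Nepal guarantees the right to freedom ...",
    "question": "Which article of the Constitution guarantees the right to freedom?",
    "answers": {"text": ["Article 17"], "answer_start": [0]},
}

# Situation-based instruction record: a scenario plus a generated, citation-bearing answer.
situation_style = {
    "instruction": "A tenant is evicted without the notice period required by law. "
                   "Explain the tenant's remedies and cite the relevant provisions.",
    "output": "Under the relevant sections of the civil code ... (cite the specific act and section).",
}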

I want to build a model that doesn’t just generate answers but cites the correct articles or acts — ensuring transparency and trust.

Would really appreciate your expert insights on how to refine this system, avoid pitfalls, and structure the pipeline efficiently. :folded_hands:

Thanks in advance — excited to hear your thoughts!

Hi! This sounds interesting. I am starting a project along the same lines with a smaller set of laws; is there any way I can learn from your progress so far? I am a governance professional rather than a technical person, so I am not aiming to copy your work but to understand how such a project is structured. Are you following any guides to design your project?

I came across a research paper on NepKanun, where they proposed a model fine-tuned on Nepali legal texts. Do you know anything about the model or where I could access it?

They are not the authors’ official model weights, but there is a model you can use as-is.


The NepKanun work is real, but I cannot find official model weights from the authors. A community model that matches the paper’s setup is public on Hugging Face and works out of the box.

Background and context

  • Paper: “NepKanun: A RAG-Based Nepali Legal Assistant.” It fine-tunes Llama-3.2-3B with PEFT (LoRA/QLoRA via Unsloth) on ~10k curated Nepali legal Q/A pairs, then serves it inside a RAG pipeline; a rough sketch of that fine-tuning setup follows this list. Reported BERTScore F1: 0.82 (simple), 0.77 (moderate), 0.71 (complex).
  • Venue page lists “publicly available software and/or pre-trained models,” but the page and PDF do not link any weights or code. I could not locate an official NepKanun repo or model card. (OpenReview)
  • Forum context you shared is active but also has no release links. (Hugging Face Forums)
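
If you would rather reproduce the paper’s setup than reuse the community weights, the sketch below shows the rough shape of that fine-tuning run using plain PEFT + TRL (the authors used Unsloth, which wraps a very similar LoRA/QLoRA flow). The dataset file name, target modules, and hyperparameters are my assumptions, not values from the paper.

# deps: pip install transformers datasets peft trl bitsandbytes accelerate

from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

base = "unsloth/llama-3.2-3b-instruct-bnb-4bit"  # same 4-bit base the vhab10 model starts from

# Expects a JSONL file where each row has a "text" field containing the full prompt + answer.
data = load_dataset("json", data_files="nepali_legal_qa.jsonl", split="train")

lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

trainer = SFTTrainer(
    model=base,              # loaded in 4-bit via the repo's quantization config (needs a GPU)
    train_dataset=data,
    peft_config=lora,
    args=SFTConfig(
        output_dir="nepali-legal-lora",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=2,
        learning_rate=2e-4,
    ),
)
trainer.train()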

What you can use today

  • vhab10/Llama-3.2-3B-Nepali-legal-QA-merged-16bit. Apache-2.0. Includes safetensors and tokenizer. Finetuned from unsloth/llama-3.2-3b-instruct-bnb-4bit. The card mentions Unsloth + TRL, matching the paper’s stack. I see no statement that it is the official NepKanun model, so treat it as a close community equivalent. (Hugging Face)
  • NyayaLM v0.5 (Gemma-3n-4B). Another Nepali legal model with weights and a dataset link. Useful as a baseline or second opinion. (Hugging Face)
  • Base family reference: Llama-3.2 overview and model card. (Hugging Face)

Quick start (Transformers)

# model card: vhab10/Llama-3.2-3B-Nepali-legal-QA-merged-16bit
# base family: meta-llama/Llama-3.2-3B
# deps: pip install transformers torch accelerate

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

name = "vhab10/Llama-3.2-3B-Nepali-legal-QA-merged-16bit"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype="auto", device_map="auto")

gen = pipeline("text-generation", model=model, tokenizer=tok)
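# The prompt asks, in Nepali: "According to the Constitution of Nepal 2072, what are the fundamental rights?"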
prompt = "### प्रश्न: नेपालको संविधान २०७२ अनुसार मौलिक हक कुन-कुन हुन्?\n### उत्तर:"
print(gen(prompt, max_new_tokens=256, do_sample=False)[0]["generated_text"])

What to verify before adoption

  • Prompt template and chat format match your stack. Llama-3.2 instruct formatting is typical; see the snippet after this list for a quick way to inspect what the tokenizer produces. (Hugging Face)
  • License fit (Apache-2.0 is permissive). (Hugging Face)
  • Basic eval on your own Nepali legal set; the NepKanun paper used ~10k Q/A pairs and reported the metrics above, but those are measured on their data.
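
For the first point, one quick check is to render whatever chat template the tokenizer ships and compare generations against the “### प्रश्न / ### उत्तर” style used in the quick start above. A small sketch, assuming the merged model kept the stock Llama-3.2 instruct template:

# Inspect the prompt format the tokenizer actually produces before wiring it into your RAG pipeline.

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("vhab10/Llama-3.2-3B-Nepali-legal-QA-merged-16bit")

messages = [
    {"role": "system", "content": "You are a Nepali legal assistant. Cite the relevant act and article."},
    {"role": "user", "content": "What are the fundamental rights under the Constitution of Nepal 2072?"},
]

# If the tokenizer ships a chat template, this shows the exact special tokens it expects.
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)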

If you want the “official” NepKanun weights and can’t find them, next steps are to comment on the OpenReview page asking for a release or contact the listed authors; until then, the vhab10 model and NyayaLM are the most practical options. (OpenReview)

Sources