AI & ML interests

We build legal AI models that help legal tech firms ship smarter products, faster.

Recent Activity

umarbutler updated a collection 2 days ago
Open Legal Data
umarbutler published a dataset 2 days ago
isaacus/legal-rag-qa
umarbutler updated a dataset 2 days ago
isaacus/legal-rag-qa

Articles

abdurrahmanbutler posted an update 6 days ago
Isaacus just shipped a new state-of-the-art model, this time focused on reranking for legal RAG.

Although Kanon 2 Embedder already represents the frontier of legal-domain retrieval, we knew that not everyone is ready to re-embed their entire corpus, and that there was still accuracy left on the table for teams handling highly sensitive legal work.

Enter Kanon 2 Reranker: the world's best legal reranking model.

We tested it across both production RAG pipelines and standalone retrieval tasks, and the results were remarkable.

Not only does it outperform the competition in a category where there are still very few serious alternatives, it also delivers major retrieval accuracy gains over our standalone embedder. Those improvements translated into exceptional downstream performance.

In our final test, we compared Voyage AI by MongoDB 2.5 Rerank with Kanon 2 Reranker on Legal RAG Bench, using identical embedding models, generative models, and pipeline hyperparameters. The only difference was the reranker.
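For context on what "only the reranker differed" means, the two-stage shape of such a pipeline (embed-and-retrieve a shortlist, then rerank it) can be sketched generically. Everything below is a toy stand-in for illustration, not the Isaacus or Voyage APIs; `embed` is a deliberately crude character-frequency embedder.

```python
# Generic two-stage retrieval sketch: a cheap embedding search produces a
# shortlist, then a reranker rescores that shortlist jointly with the query.
# `embed` is a toy character-frequency embedder, NOT a real embedding model.

def embed(text: str) -> list[float]:
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    # Stage 1: rank the whole corpus by embedding similarity.
    qv = embed(query)
    return sorted(corpus, key=lambda d: cosine(qv, embed(d)), reverse=True)[:k]

def rerank(query: str, candidates: list[str], score_fn) -> list[str]:
    # Stage 2: a more expensive, more accurate scorer reorders only the
    # shortlist, so the corpus never needs re-embedding.
    return sorted(candidates, key=lambda d: score_fn(query, d), reverse=True)
```

Swapping one reranker for another, as in the benchmark described here, amounts to changing only `score_fn` while the retrieval stage stays fixed.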

The result: Kanon 2 Reranker decisively outperformed Voyage 2.5 Rerank.

On holdout questions, the head-to-head margin was one of the most extreme we have seen: for every 1 question Voyage got right and we got wrong, there were 6 questions we got right and Voyage got wrong.



We share an example in the blog post where Voyage Rerank actually underperforms Kanon 2 Embedder on its own, delivering the wrong context to the LLM. In that case, not using a reranker at all would have led to the correct answer.

All in all, I'm immensely proud of the performance gains we've achieved.
But as we always say, the best benchmark is your own data.

So redeem your free credits, give Kanon 2 Reranker a try, and see firsthand the difference our models can make:
https://huggingface.co/blog/isaacus/kanon-2-reranker
umarbutler published an article 6 days ago

Kanon 2 Reranker: the most powerful reranker for legal RAG

umarbutler posted an update 10 days ago
This awesome visualization by @abdurrahmanbutler tracks how reliant the High Court of Australia has been on UK precedents over time.

Back in the early 1900s, up to 70% of citations in High Court decisions were from the UK. Today, that number sits around 20%.

This change seems to have happened gradually as Australia gained more and more independence from the UK, culminating in the Australia Acts of 1986, where we see a nice bump in the proportion of Australian cases cited.

These insights would not be possible without our latest legal AI model, Kanon 2 Enricher, which we used to extract dates and citations from High Court decisions in isaacus/open-australian-legal-corpus and categorize citations by jurisdiction. You can learn about Kanon 2 Enricher here: https://isaacus.com/blog/kanon-2-enricher.
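The trend described above boils down to grouping extracted citations by decade and taking the UK share. A minimal sketch with made-up records (the tuples below are invented for illustration, not Kanon 2 Enricher's output schema or real corpus statistics):

```python
from collections import defaultdict

# Hypothetical (decision year, cited jurisdiction) pairs; invented for
# illustration, not actual High Court data.
citations = [
    (1905, "UK"), (1905, "UK"), (1907, "AU"),
    (1988, "AU"), (1988, "AU"), (1989, "UK"),
    (1994, "AU"),
]

def uk_share_by_decade(records: list[tuple[int, str]]) -> dict[int, float]:
    totals: dict[int, int] = defaultdict(int)
    uk: dict[int, int] = defaultdict(int)
    for year, jurisdiction in records:
        decade = year - year % 10
        totals[decade] += 1
        if jurisdiction == "UK":
            uk[decade] += 1
    return {d: uk[d] / totals[d] for d in sorted(totals)}

print(uk_share_by_decade(citations))  # 1900s: 2/3 UK, 1980s: 1/3, 1990s: 0
```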
abdurrahmanbutler posted an update 13 days ago
🚀 Introducing Kanon 2 Enricher: the world's first hierarchical graphitization model

Today we're publicly releasing Kanon 2 Enricher, and with it, an entirely new class of AI model that we're calling a hierarchical graphitization model. This is fundamentally different from both universal extraction models and generative models.

As a hierarchical graphitization model, Kanon 2 Enricher natively outputs a knowledge graph rather than tokens, which makes it architecturally incapable of hallucinating or inventing text that wasn't present in the input.

What that enables in practice is unlike any other model or ML architecture on the market:

โ€ข ๐—ก๐—ผ ๐—ต๐—ฎ๐—น๐—น๐˜‚๐—ฐ๐—ถ๐—ป๐—ฎ๐˜๐—ถ๐—ผ๐—ป๐˜€ ๐Ÿค–
It cannot hallucinate. All references and links are stored as spans, meaning exact character offsets anchored to the original text.

โ€ข ๐—›๐—ถ๐—ฒ๐—ฟ๐—ฎ๐—ฟ๐—ฐ๐—ต๐—ถ๐—ฐ๐—ฎ๐—น ๐˜€๐—ฒ๐—ด๐—บ๐—ฒ๐—ป๐˜๐—ฎ๐˜๐—ถ๐—ผ๐—ป, ๐—ป๐—ผ๐˜ ๐—ท๐˜‚๐˜€๐˜ ๐—ฒ๐˜…๐˜๐—ฟ๐—ฎ๐—ฐ๐˜๐—ถ๐—ผ๐—ป ๐Ÿ“‘
It deconstructs a documentโ€™s full nested hierarchy, down to chapters, sections, clauses, schedules, signatures, and even singular sentences, and classifies each span with dozens of contextual features.

โ€ข ๐—˜๐—ป๐˜๐—ถ๐˜๐˜† ๐—ฒ๐˜…๐˜๐—ฟ๐—ฎ๐—ฐ๐˜๐—ถ๐—ผ๐—ป, ๐—ฑ๐—ถ๐˜€๐—ฎ๐—บ๐—ฏ๐—ถ๐—ด๐˜‚๐—ฎ๐˜๐—ถ๐—ผ๐—ป, ๐—ฎ๐—ป๐—ฑ ๐—น๐—ถ๐—ป๐—ธ๐—ถ๐—ป๐—ด ๐Ÿ”—
It resolves what references actually point to, then links entities, citations, and cross-references into a single coherent graph.

โ€ข ๐—š๐—ฟ๐—ฎ๐—ฝ๐—ต-๐—ณ๐—ถ๐—ฟ๐˜€๐˜ ๐—ฒ๐—ณ๐—ณ๐—ถ๐—ฐ๐—ถ๐—ฒ๐—ป๐—ฐ๐˜† ๐Ÿƒโ€โžก๏ธ
Small enough to run locally on a consumer PC with sub-second latency, and it stays reliable on long documents where front
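The "stored as spans" idea from the first bullet can be illustrated generically: each extracted node carries character offsets into the source text, so its surface form is always recoverable verbatim rather than generated. This is a hand-rolled sketch, not Kanon 2 Enricher's actual output schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    start: int   # inclusive character offset into the source text
    end: int     # exclusive character offset
    label: str   # illustrative node type, e.g. "section" or "citation"

def anchor_text(source: str, span: Span) -> str:
    # The node's surface form is a slice of the original document, so a
    # span-based model cannot "invent" text that was never in the input.
    return source[span.start:span.end]

doc = "1. Definitions. In this Act, 'court' means the High Court."
heading = Span(start=0, end=15, label="section")
assert anchor_text(doc, heading) == "1. Definitions."
```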

To read more about our new model, check out our latest Hugging Face article:
https://huggingface.co/blog/isaacus/introducing-kanon-2-enricher
abdurrahmanbutler published an article 13 days ago

Introducing Kanon 2 Enricher — the world's first hierarchical graphitization model
