AI & ML interests

We build legal AI models that help legal tech firms ship smarter products, faster.

Recent Activity

umarbutler updated a collection 2 days ago
Open Legal Data
umarbutler published a dataset 2 days ago
isaacus/legal-rag-qa
umarbutler updated a dataset 2 days ago
isaacus/legal-rag-qa

Articles

abdurrahmanbutler posted an update 6 days ago
Isaacus just shipped a new state-of-the-art model, this time focused on reranking for legal RAG.

Although Kanon 2 Embedder already represents the frontier of legal-domain retrieval, we knew that not everyone is ready to re-embed their entire corpus, and that there was still accuracy left on the table for teams handling highly sensitive legal work.

Enter Kanon 2 Reranker: the world's best legal reranking model.

We tested it across both production RAG pipelines and standalone retrieval tasks, and the results were remarkable.

Not only does it outperform the competition in a category where there are still very few serious alternatives, it also delivers major retrieval accuracy gains over our standalone embedder. Those improvements translated into exceptional downstream performance.

In our final test, we compared Voyage AI by MongoDB 2.5 Rerank with Kanon 2 Reranker on Legal RAG Bench, using identical embedding models, generative models, and pipeline hyperparameters. The only difference was the reranker.
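For context on what "only the reranker differed" means, the two-stage shape of such a pipeline (embed-and-retrieve a shortlist, then rerank it) can be sketched generically. Everything below is a toy stand-in for illustration, not the Isaacus or Voyage APIs; `embed` is a deliberately crude character-frequency embedder.

```python
# Generic two-stage retrieval sketch: a cheap embedding search produces a
# shortlist, then a reranker rescores that shortlist jointly with the query.
# `embed` is a toy character-frequency embedder, NOT a real embedding model.

def embed(text: str) -> list[float]:
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    # Stage 1: rank the whole corpus by embedding similarity.
    qv = embed(query)
    return sorted(corpus, key=lambda d: cosine(qv, embed(d)), reverse=True)[:k]

def rerank(query: str, candidates: list[str], score_fn) -> list[str]:
    # Stage 2: a more expensive, more accurate scorer reorders only the
    # shortlist, so the corpus never needs re-embedding.
    return sorted(candidates, key=lambda d: score_fn(query, d), reverse=True)
```

Swapping one reranker for another, as in the benchmark described here, amounts to changing only `score_fn` while the retrieval stage stays fixed.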

The result: Kanon 2 Reranker decisively outperformed Voyage 2.5 Rerank.

On holdout questions, the head-to-head margin was one of the most extreme we have seen: for every 1 question Voyage got right and we got wrong, there were 6 questions we got right and Voyage got wrong.



We share an example in the blog post where Voyage Rerank actually underperforms Kanon 2 Embedder on its own, delivering the wrong context to the LLM. In that case, not using a reranker at all would have led to the correct answer.

All in all, I'm immensely proud of the performance gains we've achieved.
But as we always say, the best benchmark is your own data.

So redeem your free credits, give Kanon 2 Reranker a try, and see firsthand the difference our models can make:
https://huggingface.co/blog/isaacus/kanon-2-reranker
umarbutler published an article 6 days ago

Kanon 2 Reranker: the most powerful reranker for legal RAG

umarbutler posted an update 10 days ago
This awesome visualization by @abdurrahmanbutler tracks how reliant the High Court of Australia has been on UK precedents over time.

Back in the early 1900s, up to 70% of citations in High Court decisions were from the UK. Today, that number sits around 20%.

This change seems to have happened gradually as Australia gained more and more independence from the UK, culminating in the Australia Acts of 1986, where we see a nice bump in the proportion of Australian cases cited.

These insights would not be possible without our latest legal AI model, Kanon 2 Enricher, which we used to extract dates and citations from High Court decisions in isaacus/open-australian-legal-corpus and categorize citations by jurisdiction. You can learn about Kanon 2 Enricher here: https://isaacus.com/blog/kanon-2-enricher.
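The trend described above boils down to grouping extracted citations by decade and taking the UK share. A minimal sketch with made-up records (the tuples below are invented for illustration, not Kanon 2 Enricher's output schema or real corpus statistics):

```python
from collections import defaultdict

# Hypothetical (decision year, cited jurisdiction) pairs; invented for
# illustration, not actual High Court data.
citations = [
    (1905, "UK"), (1905, "UK"), (1907, "AU"),
    (1988, "AU"), (1988, "AU"), (1989, "UK"),
    (1994, "AU"),
]

def uk_share_by_decade(records: list[tuple[int, str]]) -> dict[int, float]:
    totals: dict[int, int] = defaultdict(int)
    uk: dict[int, int] = defaultdict(int)
    for year, jurisdiction in records:
        decade = year - year % 10
        totals[decade] += 1
        if jurisdiction == "UK":
            uk[decade] += 1
    return {d: uk[d] / totals[d] for d in sorted(totals)}

print(uk_share_by_decade(citations))  # 1900s: 2/3 UK, 1980s: 1/3, 1990s: 0
```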
abdurrahmanbutler posted an update 13 days ago
🚀 Introducing Kanon 2 Enricher: the world's first hierarchical graphitization model

Today we're publicly releasing Kanon 2 Enricher, and with it, an entirely new class of AI model that we're calling a hierarchical graphitization model. This is fundamentally different from both universal extraction models and generative models.

As a hierarchical graphitization model, Kanon 2 Enricher natively outputs a knowledge graph rather than tokens, which makes it architecturally incapable of hallucinating or inventing text that wasn't present in the input.

What that enables in practice is unlike any other model or ML architecture on the market:

โ€ข ๐—ก๐—ผ ๐—ต๐—ฎ๐—น๐—น๐˜‚๐—ฐ๐—ถ๐—ป๐—ฎ๐˜๐—ถ๐—ผ๐—ป๐˜€ ๐Ÿค–
It cannot hallucinate. All references and links are stored as spans, meaning exact character offsets anchored to the original text.

โ€ข ๐—›๐—ถ๐—ฒ๐—ฟ๐—ฎ๐—ฟ๐—ฐ๐—ต๐—ถ๐—ฐ๐—ฎ๐—น ๐˜€๐—ฒ๐—ด๐—บ๐—ฒ๐—ป๐˜๐—ฎ๐˜๐—ถ๐—ผ๐—ป, ๐—ป๐—ผ๐˜ ๐—ท๐˜‚๐˜€๐˜ ๐—ฒ๐˜…๐˜๐—ฟ๐—ฎ๐—ฐ๐˜๐—ถ๐—ผ๐—ป ๐Ÿ“‘
It deconstructs a documentโ€™s full nested hierarchy, down to chapters, sections, clauses, schedules, signatures, and even singular sentences, and classifies each span with dozens of contextual features.

โ€ข ๐—˜๐—ป๐˜๐—ถ๐˜๐˜† ๐—ฒ๐˜…๐˜๐—ฟ๐—ฎ๐—ฐ๐˜๐—ถ๐—ผ๐—ป, ๐—ฑ๐—ถ๐˜€๐—ฎ๐—บ๐—ฏ๐—ถ๐—ด๐˜‚๐—ฎ๐˜๐—ถ๐—ผ๐—ป, ๐—ฎ๐—ป๐—ฑ ๐—น๐—ถ๐—ป๐—ธ๐—ถ๐—ป๐—ด ๐Ÿ”—
It resolves what references actually point to, then links entities, citations, and cross-references into a single coherent graph.

โ€ข ๐—š๐—ฟ๐—ฎ๐—ฝ๐—ต-๐—ณ๐—ถ๐—ฟ๐˜€๐˜ ๐—ฒ๐—ณ๐—ณ๐—ถ๐—ฐ๐—ถ๐—ฒ๐—ป๐—ฐ๐˜† ๐Ÿƒโ€โžก๏ธ
Small enough to run locally on a consumer PC with sub-second latency, and it stays reliable on long documents where front
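The "stored as spans" idea from the first bullet can be illustrated generically: each extracted node carries character offsets into the source text, so its surface form is always recoverable verbatim rather than generated. This is a hand-rolled sketch, not Kanon 2 Enricher's actual output schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    start: int   # inclusive character offset into the source text
    end: int     # exclusive character offset
    label: str   # illustrative node type, e.g. "section" or "citation"

def anchor_text(source: str, span: Span) -> str:
    # The node's surface form is a slice of the original document, so a
    # span-based model cannot "invent" text that was never in the input.
    return source[span.start:span.end]

doc = "1. Definitions. In this Act, 'court' means the High Court."
heading = Span(start=0, end=15, label="section")
assert anchor_text(doc, heading) == "1. Definitions."
```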

To read more about our new model, check out our latest Hugging Face article:
https://huggingface.co/blog/isaacus/introducing-kanon-2-enricher
abdurrahmanbutler published an article 13 days ago

Introducing Kanon 2 Enricher — the world's first hierarchical graphitization model
