All Case Studies
Legal Services · RAG + Fine-Tuned Retrieval

18 Years of Case Files.
Every Answer in
Under 4 Minutes.

A 45-attorney Chicago firm had 2.3M documents across iManage and a network file share — 60% scanned PDFs with no text layer. We built a fully on-premise RAG system that cut search time by 95%.

95%
Search time reduction
45 min → 2–4 min
89%
Precision@10 — up from 15% keyword baseline
2.3M
Documents indexed across all systems
45 min
Avg. daily time saved per attorney
Industry
Legal Services
Firm Size
45 attorneys, 80+ staff
Location
Chicago, Illinois
Engagement
5 weeks
The Problem

Three Pain Points.
Eighteen Years of Debt.

🔍

Keyword Search Was Useless

Searching "indemnification" across a 2.3M-document corpus returned hundreds of results; attorneys spent 45+ minutes per matter just finding relevant precedent.

🧠

Knowledge Lived in Heads

Senior paralegals held informal knowledge maps. When they left, that institutional knowledge walked out the door with them.

🚫

Commercial Tools Blocked

Every commercial legal AI tool required cloud uploads — a clear ethics violation under their professional responsibility obligations.

DOCUMENT CORPUS · 2.3M FILES · ~14TB · 18 YEARS
Each category → different extraction strategy
60%
Scanned PDFs
No text layer — images of printed pages from the early 2000s. Deskewing + contrast normalization required.
Tesseract OCR pipeline
25%
Native PDFs + Word
Extractable text but inconsistent formatting — headers bleeding into body, tracked changes, duplicate versions.
pymupdf + python-docx
15%
Mixed Documents
Partially scanned: some pages carry an extractable text layer, others are pure images. Required page-level analysis and independent routing.
Page-level classifier
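The page-level routing above reduces to a simple heuristic: a scanned page has no text layer, so native extraction returns an empty (or near-empty) string, and that page goes to OCR instead. A minimal sketch of the idea; the 20-character threshold and function names are illustrative, not the production values:

```python
def route_page(page_text: str, min_chars: int = 20) -> str:
    """Decide per page whether native extraction is usable.

    A scanned page has no text layer, so a pymupdf-style extractor
    yields an empty string; such pages are sent to the OCR pipeline.
    """
    return "native" if len(page_text.strip()) >= min_chars else "ocr"


def route_document(page_texts: list[str]) -> str:
    """Classify a whole document from its per-page routes."""
    routes = {route_page(t) for t in page_texts}
    if routes == {"native"}:
        return "native"   # pymupdf / python-docx extraction
    if routes == {"ocr"}:
        return "scanned"  # Tesseract with deskew + contrast normalization
    return "mixed"        # page-level routing, page by page
```

In production the per-page text would come from the actual extractor; the routing logic itself stays this small.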
⚠️
The firm evaluated and rejected all commercial legal AI tools
No tool could connect to on-premise iManage without cloud sync, and sending client documents to third-party AI services raised clear ethics concerns. The only viable path: a fully on-premise, custom-built system.
System Architecture

Two Pipelines.
Built for How Lawyers Work.

Attorney Interface
⚖️ Attorney Query Interface
Conversational search · Citation-grounded answers · Ethical wall enforcement · Matter-scoped access
FastAPI + OAuth2 + RBAC
↓  query authenticated against matter-level access matrix
Retrieval Pipeline — 5-Stage Hybrid Search
🔢
Semantic Search
Top 40 via Qdrant embedding similarity
🔤
BM25 Keyword
Top 40 via Elasticsearch term frequency
🔀
RRF Merge
Reciprocal Rank Fusion on both result sets
🎯
Cross-Encoder
Fine-tuned bge-reranker-v2-m3 re-ranks
⚖️
Metadata Boost
Practice area, client, attorney weighting
↓  top results → Claude with citation-enforcement prompt
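The RRF merge in stage 3 is only a few lines: each document scores 1/(k + rank) in every list that returns it, so documents ranked well by both semantic and keyword search float to the top. A sketch using the conventional k = 60 constant (the production weighting and candidate counts may differ):

```python
def rrf_merge(semantic: list[str], keyword: list[str], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in (semantic, keyword):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; these candidates then go to the cross-encoder.
    return sorted(scores, key=scores.get, reverse=True)
```

A document appearing in both lists outranks one appearing high in only one, which is exactly the behavior a hybrid pipeline wants before re-ranking.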
Ingestion Pipeline — 3 Stages
STAGE 01
Classify & Extract
Native PDF → pymupdf, Word → python-docx, Scanned → Tesseract with deskew. Mixed docs get page-level routing.
STAGE 02
Legal-Aware Chunking
Contracts by clause, briefs by argument, discovery by exhibit — each with full section hierarchy and metadata.
STAGE 03
Embed & Index
Fine-tuned bge-large-en-v1.5 (15K legal pairs). Stored in Qdrant (3-node cluster) with payload filters.
↓  2.3M documents ingested · incremental iManage sync daily
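Stage 2's clause-level chunking can be illustrated with a heading-based splitter: numbered clause headings become chunk boundaries, and each chunk carries its clause number, title, and document metadata into the index. The regex here is a deliberate simplification; real contracts need a richer heading grammar:

```python
import re

# Matches headings like "1. Indemnification" or "4.2. Limitation of Liability"
CLAUSE_RE = re.compile(r"^(\d+(?:\.\d+)*)\.\s+([A-Z][^\n]*)$", re.MULTILINE)

def chunk_by_clause(text: str, doc_meta: dict) -> list[dict]:
    """Split a contract into clause-level chunks with section metadata."""
    matches = list(CLAUSE_RE.finditer(text))
    chunks = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chunks.append({
            "clause": m.group(1),
            "title": m.group(2).strip(),
            "text": text[m.end():end].strip(),
            **doc_meta,  # matter no., client, practice area, ...
        })
    return chunks
```

Briefs and discovery documents use the same pattern with different boundary rules (argument headings, exhibit markers).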
Data Layer — Fully On-Premise
🗄️ iManage + Network File Share → Unified Index
Qdrant (vector) · Elasticsearch (BM25) · iManage permissions API (ethical walls)
On-premise · No cloud sync
Measured Outcomes

Before vs After.
Validated on 200 Queries.

Metric
⚠ Before
✓ After
Precedent search time
45+ min / matter
Manual keyword search
2–4 min
~95% reduction
Search precision@10
~15%
Hundreds of irrelevant results
~89%
6× improvement
Institutional knowledge
Paralegal-dependent
Lost with staff turnover
Searchable corpus
2.3M docs unified
Citation quality
None
Files, not answers
Grounded answers
Matter no. + page ref
Ethical wall compliance
Manual / process-based
Relied on staff awareness
System-enforced
iManage sync · query filter
45 min
Daily time saved per attorney
2.3M
Documents searchable, unified
5 wks
Ingestion + tuning + deployment
0
Client docs sent to any cloud
Technical Proof

Citation-Grounded Answers.
Not Just Search Results.

legal-rag · commercial-lit · abstrabit
What indemnification language have we used in SaaS vendor agreements for financial services clients?
▸ hybrid search · 80 candidates · re-ranked · 3 docs retrieved · 1.8s
Two distinct indemnification structures found. In M-2019-0847, p.12 (Northern Trust), mutual indemnification with willful misconduct carve-out, capped at 12 months' fees. In M-2021-1203, p.8 (First Midwest Bank), unilateral vendor indemnification for data breaches tied to GLBA — preferred approach post-2021.

A tiered cap structure appears in M-2022-0391, p.15: general liability capped at fees, data liability at $5M, no cap on third-party IP claims.
📄 M-2019-0847 · SaaS Agreement · Northern Trust · 2019-03-14 · p.12
📄 M-2021-1203 · Vendor Contract · First Midwest Bank · 2021-07-22 · p.8
📄 M-2022-0391 · Tech Services Agreement · Heartland Financial · 2022-11-09 · p.15
⚠ 2 additional documents matched with score < 0.60 — flagged for manual review.
📊 Retrieval Performance (200 Query Eval Set)
Keyword only
15%
Semantic alone
64%
Hybrid (semantic+BM25)
72%
+ Fine-tuned re-ranker
89%
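Precision@10 is the fraction of the top 10 results judged relevant, averaged over the 200-query evaluation set (the relevance judgments came from attorney review). A sketch of the metric as it would be computed:

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int = 10) -> float:
    """Standard precision@k: relevant documents in the top k, divided by k."""
    return sum(1 for doc in retrieved[:k] if doc in relevant) / k


def mean_precision_at_k(eval_set: list[tuple[list[str], set[str]]], k: int = 10) -> float:
    """Average precision@k over (retrieved, relevant) pairs in an eval set."""
    return sum(precision_at_k(r, rel, k) for r, rel in eval_set) / len(eval_set)
```

Running the same eval set against each pipeline variant (keyword only, semantic only, hybrid, hybrid + re-ranker) produces the ladder shown above.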
🔒 Ethical Wall Enforcement
Access controls sync from iManage — attorneys see only what iManage grants
Restricted docs filtered at vector store query level — never returned
Every query logged: user, timestamp, matter context, docs retrieved
Audit log exportable for ethics review — full traceability
Zero client documents sent to any external API
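Conceptually, the enforcement is a filter applied before results ever leave the retrieval layer, plus an append-only audit record per query. A minimal sketch with an in-memory access matrix; in the deployed system the matrix is synced from the iManage permissions API and the filter is pushed down into the Qdrant and Elasticsearch queries themselves:

```python
from datetime import datetime, timezone

def enforce_ethical_wall(user: str, hits: list[dict],
                         access_matrix: dict[str, set[str]],
                         audit_log: list[dict]) -> list[dict]:
    """Drop hits from matters the user cannot see, and log the query."""
    allowed = access_matrix.get(user, set())
    visible = [h for h in hits if h["matter_id"] in allowed]
    audit_log.append({
        "user": user,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "docs_retrieved": [h["doc_id"] for h in visible],
    })
    return visible
```

Because restricted documents are filtered before ranking output is assembled, they can never appear in an answer, and the audit trail records exactly what each user retrieved.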
Full Stack
bge-large-en-v1.5 bge-reranker-v2-m3 Qdrant Elasticsearch Tesseract OCR pymupdf python-docx FastAPI Claude API iManage API Docker Nginx
Why Generic RAG Fails in Legal
"Consideration" means completely different things in contract law vs. administrative law. Generic semantic search conflates them. Our domain-specific fine-tuning on 15K legal query-document pairs solves this.