LexShield AI 🏛️

Agentic Indian Legal Intelligence Platform

The Problem

India has 50+ million pending court cases. Most citizens cannot afford a lawyer. Legal documents are written in language ordinary people do not understand. When someone's landlord illegally withholds their deposit, their employer steals their wages, or they face a false case — they have nowhere to turn.

Existing tools fail in every way that matters:

Search engines return PDFs no one can read
Generic AI chatbots hallucinate section numbers and punishments
No tool understands scanned Indian legal documents
No tool is aware of jurisdiction-specific state laws
No multilingual support for regional languages

Overview

LexShield AI is an end-to-end legal empowerment platform for ordinary citizens who face regional language barriers and have little access to legal help, in a country with more than 50 million pending court cases. It tackles legal illiteracy by providing immediate, reliable, and accessible legal guidance.

The system can turn a photograph of a scanned document into a cited legal explanation in Malayalam, or any other supported language, in under 60 seconds. The platform combines a specialized multi-agent workflow (LangGraph), advanced RAG with hybrid search and CRAG self-correction, and document intelligence (OCR, NLP, NER) to demystify Indian law.
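Hybrid search merges a sparse BM25 ranking with a dense embedding ranking from ChromaDB. This README does not say how LexShield combines the two lists, so the sketch below assumes Reciprocal Rank Fusion (RRF), a common rank-based method that needs no score calibration between retrievers. The document IDs are hypothetical placeholders for real retriever output.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with 1-based ranks. k = 60 is the constant used in the original RRF paper.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical retriever output for a query about a withheld rental deposit.
bm25_hits = ["rent_act_s21", "ipc_406", "consumer_act_s2"]  # sparse ranking
dense_hits = ["rent_act_s21", "contract_act_s73", "ipc_406"]  # dense ranking
print(reciprocal_rank_fusion([bm25_hits, dense_hits]))
# ['rent_act_s21', 'ipc_406', 'contract_act_s73', 'consumer_act_s2']
```

Because RRF uses only ranks, BM25 scores and cosine similarities never have to be put on the same scale; a passage that both retrievers rank highly rises to the top.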

LexShield AI is optimized for consumer-grade hardware: local models run with CPU-only inference, and hosted LLMs are reached through free-tier APIs, keeping operating costs at zero for public-interest use.

Features

Core Capabilities

Advanced RAG Pipeline: Multi-hop decomposition, CRAG self-correction, era-aware synthesizers (IPC vs BNS), and knowledge graph enrichment.
Document Intelligence: Deep analysis of uploaded documents via Tesseract OCR, PyMuPDF, and custom NER/classifiers to assess risk and detect rights violations.
Stateful Drafting Agent: Human-in-the-loop multi-turn guided flow that drafts legal complaints across 8 distinct categories.
Case Law Search: Live integration with Indian Kanoon to retrieve real precedent and summaries.
Rights Module: Static and dynamic lookups to educate users on Tenant, Employee, Consumer, Women, and Bail rights.
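CRAG (Corrective RAG) grades retrieved passages before generation and takes a corrective action when retrieval looks weak. The README does not describe LexShield's grader, thresholds, or fallback source, so the sketch below makes assumptions throughout: a toy keyword-overlap score stands in for an LLM or cross-encoder grader, the thresholds are arbitrary, and "search elsewhere" could mean a case-law or web search. The three outcomes follow the correct / incorrect / ambiguous scheme of the CRAG paper; all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    source: str
    text: str


def relevance(query: str, passage: Passage) -> float:
    """Toy grader: fraction of longer query terms that appear in the passage."""
    terms = {t for t in query.lower().split() if len(t) > 3}
    if not terms:
        return 0.0
    return len(terms & set(passage.text.lower().split())) / len(terms)


def corrective_action(query: str, passages: list[Passage],
                      upper: float = 0.6, lower: float = 0.2) -> tuple[str, list[Passage]]:
    """Decide what to do with retrieved passages, CRAG-style.

    - "correct": some passage scores >= upper; keep passages scoring >= lower.
    - "incorrect": every passage scores < lower; discard all, search elsewhere.
    - "ambiguous": otherwise; keep passages >= lower AND search elsewhere.
    """
    scored = [(relevance(query, p), p) for p in passages]
    best = max((s for s, _ in scored), default=0.0)
    kept = [p for s, p in scored if s >= lower]
    if best >= upper:
        return "correct", kept
    if best < lower:
        return "incorrect", []
    return "ambiguous", kept


query = "landlord withholding security deposit"
docs = [Passage("rent_act", "the landlord shall refund the security deposit"),
        Passage("ipc", "punishment for theft of movable property")]
action, kept = corrective_action(query, docs)
print(action, [p.source for p in kept])  # correct ['rent_act']
```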

Multilingual Support

5 Supported Languages: English, Malayalam, Hindi, Tamil, and Telugu.
Seamless Translation: Queries are translated into English for retrieval, and responses are translated back into the user's language, while legal entity names are strictly preserved in English.
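One way to keep legal entity names in English is to mask them before translation and restore them afterwards. The README does not say how LexShield does this, so the sketch below assumes a hypothetical regex for section, article, and code references (a real system would more likely reuse NER spans) and a caller-supplied `translate` function; `str.upper` stands in for a translator in the usage line.

```python
import re
from typing import Callable

# Hypothetical patterns for references that must stay in English.
# BNSS is listed before BNS so the longer name wins inside the alternation.
LEGAL_ENTITY = re.compile(
    r"\b(?:Section|Article)\s+\d+[A-Z]?(?:\(\d+\))?"
    r"(?:\s+(?:of\s+the\s+)?(?:IPC|BNSS|BNS|CrPC))?"
    r"|\b(?:IPC|BNSS|BNS|CrPC)\b"
)


def translate_preserving_entities(text: str, translate: Callable[[str], str]) -> str:
    """Translate text while keeping legal references untouched.

    Each match becomes a numbered placeholder such as [[0]], which the
    translator is expected to pass through, and is restored afterwards.
    """
    entities: list[str] = []

    def mask(match: re.Match) -> str:
        entities.append(match.group(0))
        return f"[[{len(entities) - 1}]]"

    masked = LEGAL_ENTITY.sub(mask, text)
    translated = translate(masked)
    return re.sub(r"\[\[(\d+)\]\]", lambda m: entities[int(m.group(1))], translated)


print(translate_preserving_entities("You may file a complaint under Section 406 IPC.", str.upper))
# YOU MAY FILE A COMPLAINT UNDER Section 406 IPC.
```

The approach only works if the translation model leaves the placeholders intact, which is worth checking for each target language.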

Architecture

LexShield uses a Central Orchestrator and Specialized Agents pattern. The Master Orchestrator intercepts each request and uses a LangGraph StateGraph to route execution conditionally to the specialized node that matches the request's intent.

```mermaid
graph TD
    User([User Request]) --> API[FastAPI Backend]
    API --> MasterOrchestrator[Master Orchestrator]
    MasterOrchestrator --> IntentClassifier{Intent Classifier}
    
    IntentClassifier -->|legal_query| RAGNode[Advanced RAG Agent]
    IntentClassifier -->|document_analysis| DocNode[Document Intelligence Agent]
    IntentClassifier -->|draft_request| DraftNode[Drafting Agent]
    IntentClassifier -->|translation_request| MultiNode[Multilingual Agent]
    IntentClassifier -->|case_law_search| CaseNode[Case Law Agent]
    IntentClassifier -->|rights_check| RightsNode[Rights Agent]
    
    RAGNode --> VectorStore[(ChromaDB + BM25)]
    RAGNode --> KG[(Knowledge Graph)]
    RAGNode --> GroqLLM[Groq LLaMA 3.3 70B]
    
    DocNode --> OCR[Tesseract / PyMuPDF]
    DocNode --> NER[InLegalNER / InLegalBERT]
    
    DraftNode --> SQLite[(Session Memory)]
    DraftNode --> GroqLLM
    
    CaseNode --> IndianKanoon[Indian Kanoon API]
```
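The diagram routes each request on one of six intent labels. The sketch below mirrors that classify-then-dispatch pattern in plain Python rather than LangGraph; in LangGraph, a routing function and a label-to-node mapping like `ROUTES` would typically be passed to `StateGraph.add_conditional_edges`. The keyword classifier is a hypothetical stand-in, since the README does not describe how the Intent Classifier works, and every agent is a stub.

```python
from typing import Callable, TypedDict


class ChatState(TypedDict, total=False):
    query: str
    intent: str
    answer: str


# Hypothetical keyword rules, checked in order; the first match wins.
INTENT_KEYWORDS: list[tuple[str, tuple[str, ...]]] = [
    ("draft_request", ("draft", "write a complaint", "prepare a notice")),
    ("case_law_search", ("judgment", "precedent", "case law")),
    ("rights_check", ("my rights", "am i entitled", "bail")),
    ("translation_request", ("translate", "in malayalam", "in hindi")),
    ("document_analysis", ("this document", "uploaded", "agreement")),
]


def classify_intent(state: ChatState) -> ChatState:
    q = state["query"].lower()
    for intent, keywords in INTENT_KEYWORDS:
        if any(k in q for k in keywords):
            return {**state, "intent": intent}
    # General legal questions fall through to the RAG agent.
    return {**state, "intent": "legal_query"}


def stub_agent(name: str) -> Callable[[ChatState], ChatState]:
    # Stand-in for a specialised node; a real node would call RAG, OCR, etc.
    return lambda state: {**state, "answer": f"handled by {name}"}


# One entry per labelled edge in the diagram.
ROUTES: dict[str, Callable[[ChatState], ChatState]] = {
    "legal_query": stub_agent("RAGNode"),
    "document_analysis": stub_agent("DocNode"),
    "draft_request": stub_agent("DraftNode"),
    "translation_request": stub_agent("MultiNode"),
    "case_law_search": stub_agent("CaseNode"),
    "rights_check": stub_agent("RightsNode"),
}


def orchestrate(query: str) -> ChatState:
    state = classify_intent({"query": query})
    return ROUTES[state["intent"]](state)


print(orchestrate("Please draft a complaint against my employer")["answer"])
# handled by DraftNode
```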

Tech Stack

| Layer | Technology | Purpose |
|---|---|---|
| Backend | FastAPI + Uvicorn | High-performance async REST API server |
| Agent Framework | LangGraph | Stateful multi-agent workflow orchestration |
| LLM Primary | Groq LLaMA 3.3 70B | Fast inference for general reasoning and RAG |
| LLM Fallback | Gemini 2.0 Flash | Redundant LLM for fault tolerance |
| Embeddings | sentence-transformers | `all-MiniLM-L6-v2` for CPU-optimized semantic queries |
| Legal NLP | InLegalBERT / InLegalNER | Legal embeddings, document classification, and NER |
| Vector Database | ChromaDB / BM25 | Hybrid search (sparse and dense retrieval) |
| Reranker | NVIDIA NIM | Precision ranking of retrieved legal context |
| OCR & Vision | Tesseract / OpenCV / PyMuPDF | Extraction from scanned documents and PDFs |
| Session Memory | SQLite | Persistent multi-turn chat and graph state checkpointer |
| Observability | LangSmith | Execution tracing, latency, and token monitoring |
| Frontend | React 18 + Vite | Fast, responsive single-page user interface |
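The table lists SQLite as the store for multi-turn chat and graph state. The README does not say whether this is LangGraph's own SQLite checkpointer or a custom schema; the sketch below shows the underlying idea with the standard-library `sqlite3` module, using a hypothetical `session_state` table that keeps one JSON checkpoint per turn so a guided drafting flow can resume where it left off. The session ID and state fields are made up for illustration.

```python
import json
import sqlite3
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # Hypothetical schema: one row per (session, turn) holding JSON state.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS session_state (
               session_id TEXT NOT NULL,
               turn INTEGER NOT NULL,
               state TEXT NOT NULL,
               PRIMARY KEY (session_id, turn)
           )"""
    )
    return conn


def save_turn(conn: sqlite3.Connection, session_id: str, state: dict) -> int:
    """Append a checkpoint for the session and return its turn number."""
    with conn:  # commits on success, rolls back on error
        (turn,) = conn.execute(
            "SELECT COALESCE(MAX(turn), -1) + 1 FROM session_state WHERE session_id = ?",
            (session_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO session_state (session_id, turn, state) VALUES (?, ?, ?)",
            (session_id, turn, json.dumps(state)),
        )
    return turn


def load_latest(conn: sqlite3.Connection, session_id: str) -> Optional[dict]:
    row = conn.execute(
        "SELECT state FROM session_state WHERE session_id = ? ORDER BY turn DESC LIMIT 1",
        (session_id,),
    ).fetchone()
    return json.loads(row[0]) if row else None


conn = open_store()
save_turn(conn, "user-42", {"category": "wage_theft", "step": "ask_employer_name"})
save_turn(conn, "user-42", {"category": "wage_theft", "step": "ask_amount", "employer": "Acme"})
print(load_latest(conn, "user-42")["step"])  # ask_amount
```

Keeping every turn rather than overwriting one row makes it possible to step back a turn when the user corrects an earlier answer.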

Prerequisites

Python: ≥ 3.10
Node.js: ≥ 18.x
Tesseract OCR: Installed on your system with the English (eng), Malayalam (mal), and Hindi (hin) language packs.
Poppler: Required by pdf2image.
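Before running document analysis, it can help to confirm that these OCR prerequisites are in place. A minimal sketch, assuming the `tesseract` and Poppler command-line tools are expected on `PATH` (pdf2image calls Poppler utilities such as `pdftoppm`); the function names are hypothetical and not part of the repository.

```python
import shutil
import subprocess

# Language packs named in the prerequisites.
REQUIRED_LANGS = frozenset({"eng", "mal", "hin"})


def parse_langs(output: str) -> set[str]:
    """Extract language codes from `tesseract --list-langs` output.

    The header line ("List of available languages ...") contains spaces,
    while language codes such as "eng" or "chi_sim" do not.
    """
    lines = (line.strip() for line in output.splitlines())
    return {line for line in lines if line and not any(ch.isspace() for ch in line)}


def missing_langs(output: str, required: frozenset[str] = REQUIRED_LANGS) -> set[str]:
    return set(required) - parse_langs(output)


def check_ocr_prerequisites() -> None:
    if shutil.which("tesseract") is None:
        raise SystemExit("Tesseract is not on PATH")
    if shutil.which("pdftoppm") is None:
        raise SystemExit("Poppler (pdftoppm) is not on PATH")
    result = subprocess.run(["tesseract", "--list-langs"],
                            capture_output=True, text=True, check=True)
    # Read both streams in case a Tesseract build prints the list to stderr.
    missing = missing_langs(result.stdout + "\n" + result.stderr)
    if missing:
        raise SystemExit(f"Missing Tesseract language packs: {sorted(missing)}")
    print("OCR prerequisites look OK")
```

Call `check_ocr_prerequisites()` from the activated virtual environment after installing the system packages.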

Installation

```bash
# 1. Clone the repository
git clone https://github.com/anantha037/lexshield-ai.git
cd lexshield-ai

# 2. Set up the Python backend environment
python -m venv venv
# Windows
venv\Scripts\activate
# Linux/Mac
source venv/bin/activate

# 3. Install Python dependencies
pip install --no-cache-dir -r requirements.txt

# 4. Set up the frontend environment
cd frontend
npm install
cd ..
```

Configuration

LexShield AI relies heavily on external APIs. Create a .env file in the root directory by copying the example:

```bash
cp .env.example .env
```

Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| `GROQ_API_KEY` | Yes | - | Primary LLM (LLaMA 3.3 70B via Groq) |
| `GEMINI_API_KEY` | Yes | - | Primary LLM for the DraftingAgent; fallback for RAG |
| `NVIDIA_API_KEY` | No | - | Enables reranking via NVIDIA NIM |
| `INDIANKANOON_API_KEY` | No | - | Needed only for real-time case law searches |
| `ENABLE_CASE_LAW_ENRICHMENT` | No | `true` | Set to `false` to skip case law calls entirely |
| `LANGCHAIN_TRACING_V2` | No | `false` | Set to `true` to enable LangSmith tracing |
| `LANGCHAIN_API_KEY` | No | - | LangSmith API key for tracing |
| `LANGCHAIN_PROJECT` | No | `lexshield-ai` | LangSmith project name |
| `JWT_SECRET_KEY` | Yes | - | Secure 32-character string used to sign auth tokens |
| `ALLOWED_ORIGINS` | No | `http://localhost:3000,...` | Comma-separated CORS origins |
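A minimal sketch of reading these variables with validation, assuming the values are already in the process environment (for example, loaded from `.env` by a tool such as python-dotenv). `Settings` and `load_settings` are hypothetical names, not the project's actual config module, and two choices are assumptions: the "32-char" guidance is enforced as a minimum length, and only the first origin of the elided `ALLOWED_ORIGINS` default is used as a fallback.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED_VARS = ("GROQ_API_KEY", "GEMINI_API_KEY", "JWT_SECRET_KEY")


@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    gemini_api_key: str
    jwt_secret_key: str
    nvidia_api_key: Optional[str]
    indiankanoon_api_key: Optional[str]
    enable_case_law_enrichment: bool
    langsmith_tracing: bool
    langchain_project: str
    allowed_origins: tuple[str, ...]


def _as_bool(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required variables: {', '.join(missing)}")
    # Assumption: treat the "32-char" guidance as a minimum length.
    if len(env["JWT_SECRET_KEY"]) < 32:
        raise RuntimeError("JWT_SECRET_KEY must be at least 32 characters")
    # Assumption: the documented default lists more origins ("..."); only the
    # first one is used as a fallback here.
    origins = env.get("ALLOWED_ORIGINS", "http://localhost:3000")
    return Settings(
        groq_api_key=env["GROQ_API_KEY"],
        gemini_api_key=env["GEMINI_API_KEY"],
        jwt_secret_key=env["JWT_SECRET_KEY"],
        nvidia_api_key=env.get("NVIDIA_API_KEY") or None,
        indiankanoon_api_key=env.get("INDIANKANOON_API_KEY") or None,
        enable_case_law_enrichment=_as_bool(env.get("ENABLE_CASE_LAW_ENRICHMENT", "true")),
        langsmith_tracing=_as_bool(env.get("LANGCHAIN_TRACING_V2", "false")),
        langchain_project=env.get("LANGCHAIN_PROJECT", "lexshield-ai"),
        allowed_origins=tuple(o.strip() for o in origins.split(",") if o.strip()),
    )
```

Failing fast at startup on a missing key is usually easier to debug than an authentication error on the first LLM call.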

Usage

Running Locally

To run the application locally, you'll need two terminal windows:

Terminal 1: Start Backend (FastAPI)

```bash
# Ensure the virtual environment is active
uvicorn api.main:app --reload --port 8000
```

Terminal 2: Start Frontend (React/Vite)

```bash
cd frontend
npm run dev
```

Navigate your browser to http://localhost:5173.

API Reference

The backend exposes more than 20 endpoints. While the server is running, FastAPI serves interactive Swagger documentation at http://localhost:8000/docs.

Key Endpoints

GET /health — Check system health, DB counts, LLM pings, and tracing status.
POST /auth/login — Generate JWT tokens.
POST /orchestrator/chat — Main agentic entry point. Expects JSON { "query": "..." }.
POST /document/analyze — Upload PDF/Images for OCR and NLP analysis (multipart/form-data).
POST /legal/query — Direct access to the RAG pipeline.

Protected routes require a JWT sent as a Bearer token in the `Authorization` header.
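A minimal client sketch for the main chat endpoint, using only the standard library. It assumes a token has already been obtained from `POST /auth/login` (whose request and response shapes are not documented here) and that the response body is JSON; only the `query` field comes from this README. `build_chat_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_chat_request(query: str, token: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST to /orchestrator/chat with a JSON body and a Bearer token."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/orchestrator/chat",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def send(request: urllib.request.Request, timeout: float = 60.0):
    """Send the request and decode the JSON reply.

    Requires the backend to be running; raises urllib.error.HTTPError on 4xx/5xx.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


req = build_chat_request("My landlord has not returned my deposit", token="<your JWT>")
# reply = send(req)  # uncomment once the server is running
```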

Project Structure

```text
lexshield-ai/
├── agents/            # LangGraph workflows, nodes, intent routers, and agents
├── api/               # FastAPI entry points, CORS, routers (auth, document, etc.)
├── cv/                # Computer Vision pipelines (OCR, PDF layout analysis)
├── data/              # Vector stores, SQLite memory DBs, Graph JSONs, raw text
├── evals/             # Custom RAGAS pipeline and evaluation tools
├── frontend/          # React SPA (Vite)
├── logs/              # System and execution logs
├── models/            # Custom NLP classifiers and Risk Scorers
├── nlp/               # NER pipelines and entity extractors
├── rag/               # CRAG, Embedders, Vector DB handlers, Synthesizers
├── tests/             # Pytest suite and automated evaluations
├── docker-compose.yml # Container orchestration config
├── Dockerfile         # Python application build script
└── requirements.txt   # Python pip dependencies
```

Running Tests

LexShield AI includes a custom test suite covering the vector store, the session memory database, graph execution, and agent relevance logic.

```bash
# Run all tests using pytest
pytest

# Run a specific test file
pytest tests/test_relevance.py

# Run the RAG evaluation framework on its own
python tests/run_evals.py
```

Deployment

LexShield AI is containerized for easy deployment.

```bash
# Build and run using Docker Compose
docker-compose up --build -d
```

Docker Services

api: Builds the FastAPI application and exposes it on port 8000.
chromadb: Runs a persistent ChromaDB instance on port 8001, with its data volume-mounted at ./data/chroma.

Contributing

Fork the repository
Clone your fork: git clone https://github.com/your-username/lexshield-ai.git
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Security

Authentication: Endpoints are secured via JSON Web Tokens (JWT) using bcrypt for password hashing. JWT secrets must be rotated in production.
Reporting Vulnerabilities: Please report vulnerabilities privately, either through GitHub's private vulnerability reporting (if enabled for this repository) or by emailing the maintainer directly, rather than in a public issue.
Caveat: The system heavily depends on LLM generations which can occasionally hallucinate. Generated outputs should be treated as guidance and not strict legal counsel.
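To show what signing and verifying a token with `JWT_SECRET_KEY` involves, here is a standard-library sketch of an HS256 JWT. The README does not name the JWT library or algorithm LexShield uses, so HS256 is an assumption, and in practice a maintained library such as PyJWT is the safer choice. `create_token` and `verify_token` are hypothetical names, and bcrypt password hashing is not shown.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def create_token(subject: str, secret: str, ttl_seconds: int = 3600,
                 now: Optional[float] = None) -> str:
    issued = int(now if now is not None else time.time())
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "iat": issued, "exp": issued + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode()) for part in (header, payload)
    )
    signature = hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(token: str, secret: str, now: Optional[float] = None) -> dict:
    """Return the payload if signature, algorithm, and expiry check out; else raise ValueError."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload["exp"] <= (now if now is not None else time.time()):
        raise ValueError("token expired")
    return payload
```

Because the signature covers the header and payload, changing any claim (or signing with a different secret) makes verification fail, which is also why rotating the secret invalidates all outstanding tokens.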

Changelog / Roadmap

Current Version (1.0.0)

Implemented MasterOrchestrator and IntentClassifier
Advanced RAG pipeline with CRAG, NVIDIA Reranking, and Era-Aware synthesis (IPC -> BNS)
Stateful drafting agent for 8 local complaint types
Local execution optimizations (CPU-only sentence-transformers)
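Era-aware synthesis matters because the Bharatiya Nyaya Sanhita (BNS) replaced the IPC from 1 July 2024, so the code that applies depends on when an offence occurred. The README does not describe how the synthesizer decides, so the sketch below assumes a simplified rule based on the offence date and a small correspondence table. The two entries are illustrative only and should be checked against the official IPC to BNS correspondence; a real table would come from the project's data rather than being hard-coded. The function names are hypothetical.

```python
from datetime import date

BNS_COMMENCEMENT = date(2024, 7, 1)  # date the Bharatiya Nyaya Sanhita came into force

# Illustrative IPC -> BNS correspondences only; verify before relying on them.
IPC_TO_BNS = {
    "302": "103",     # murder
    "420": "318(4)",  # cheating and dishonestly inducing delivery of property
}


def applicable_code(offence_date: date) -> str:
    """Simplified rule: offences committed before commencement stay under the IPC."""
    return "IPC" if offence_date < BNS_COMMENCEMENT else "BNS"


def cite(ipc_section: str, offence_date: date) -> str:
    """Cite the section under the code in force on the offence date,
    noting the other code's equivalent when the mapping is known."""
    bns_section = IPC_TO_BNS.get(ipc_section)
    if applicable_code(offence_date) == "IPC":
        note = f" (now Section {bns_section} BNS)" if bns_section else ""
        return f"Section {ipc_section} IPC{note}"
    if bns_section is None:
        # Refuse rather than guess a section number.
        raise KeyError(f"No BNS mapping recorded for IPC {ipc_section}")
    return f"Section {bns_section} BNS (formerly Section {ipc_section} IPC)"


print(cite("420", date(2024, 6, 30)))  # Section 420 IPC (now Section 318(4) BNS)
print(cite("420", date(2024, 7, 1)))   # Section 318(4) BNS (formerly Section 420 IPC)
```

Raising on an unmapped section, instead of letting an LLM fill the gap, is one way to avoid the hallucinated section numbers described in The Problem.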

Roadmap

Integrate Whisper for Voice input/output.
Implement GPU-powered LayoutLM for advanced document layout parsing.
Transition from local SQLite session memory to Redis for scale.
Expand the relationships mapped in the knowledge graph.

License

No specific license file was found in the repository. Please contact the repository owner for permissions regarding commercial use or redistribution.

Acknowledgements / Credits

Designed and built by Anantha Krishnan K, CS Graduate, Hansraj College, University of Delhi.
law-ai: For providing the incredible InLegalBERT and InLegalNER models.
Indian Kanoon: For providing the open legal API for case law precedent.
LangChain & LangGraph: For providing the core agentic framework and tracing (LangSmith).
Groq: For fast, free-tier LLaMA inference.
