Skip to content

roberteisenberg/langgraph

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

LangGraph Fraud Investigation Tutorial

This tutorial walks you through detecting fraud starting from a simple function-to-LLM call, progressively adding LangGraph scaffolding, tools, investigation loops, infrastructure, human-in-the-loop workflows, and multi-agent coordination — all using the same 6 fraud cases so you can see exactly what each addition contributes. By the end, you'll be able to determine whether LangGraph — and which parts of LangGraph — are necessary for your own scenarios.

This is what you'll build:

Multi-Agent Architecture

A supervisor agent coordinates three specialist agents. Each investigates a different domain — customer history, shipping addresses, payment patterns. Evidence accumulates in shared state. A deterministic scoring function produces auditable risk scores. Ambiguous cases pause for human review.

Why Fraud Investigation

Fraud investigation demands high auditability — every decision needs a trail. This plays to LangGraph's strengths: checkpointed state, human-in-the-loop interrupts, structured evidence accumulation, and inspectable graph execution. This is a LangGraph tutorial wearing a fraud costume — not the other way around. Every feature teaches a LangGraph concept. The fraud logic is believable but minimal.

The 6 Test Cases

The same 6 cases run through every phase. The cases are the constant; the architecture is the variable.

Case Scenario Challenge
1: Obviously Legit 5-year customer, 50 orders, $45 purchase Baseline — should approve quickly
2: Mildly Suspicious New customer, no history, $380 order New but not fraudulent — shouldn't overreact
3: High Risk Disposable email, freight forwarder, $2,800 Multiple red flags — should reject
4: Conflicting Signals Loyal customer, warehouse address, IP change The hard one — conflicting signals need investigation
5: Historical Fraud Prior fraud flag (resolved), normal $60 order Should recognize the flag was a false positive
6: Tool Error Normal order, payment service unavailable Should handle gracefully, not crash

Case 4 is the real test. A loyal 2-year customer with 30 successful orders, but the shipping address is a warehouse shared by 3 accounts, there's a recent IP change, and a slight name mismatch. Phase 1 scores it 5/100 (approve) because the LLM can't see the warehouse. Phase 3 scores it 61/100 (review) because tools surface the conflicting signals and deterministic scoring balances them.

The Phases

Each phase builds on the previous one, adding one or two concepts at a time.

A plain Python function calls the LLM. No graph, no tools. Establishes what a single API call can and can't do. Case 4 fails badly — the LLM guesses from surface data.

Same logic, now in a LangGraph state machine. Teaches StateGraph, nodes, edges, and conditional edges. The graph doesn't improve results — it sets up the scaffolding Phase 2 needs.

Three investigation tools and the first cycle in the graph. The LLM can now call tools, observe results, and decide whether to investigate further or stop. This is the foundational shift from deterministic to dynamic execution — the LLM controls the flow.

The hinge of the tutorial. The LLM investigates, Python scores, LangGraph manages the loop. Structured evidence accumulates through the investigation. A deterministic scoring formula produces auditable risk scores. Every subsequent phase must match these scores — they're the baseline.

Score accuracy across the first three phases: Phase 1 gets 3/6 in range. Phase 2 adds tools and gets... 3/6. Different misses, same hit rate. More data didn't help because the LLM was still interpreting and scoring in prose. Phase 3 hits 6/6 by narrowing what the LLM is responsible for — not by giving it more to work with.

Same graph, new capabilities inside it. Graph visualization, streaming, recursion limits, error handling, dead-end prevention, and token budget circuit breakers. The infrastructure you need once the LLM controls the flow.

Checkpointing enables pause, resume, and fork. Ambiguous cases (score 50-79) route to human review. The graph pauses, persists state, and waits. Hours later, an analyst resumes with a decision. State forking enables "what-if" analysis.

The crown jewel. A Sonnet supervisor routes to three Haiku specialists. Even at 5 tools, decomposing complex reasoning from simple execution cuts costs by 67% — same decisions, same scores. At 200+ tools, you also get better tool selection. The duplicate evidence bug — the most instructive failure in the tutorial — emerges from how agents compose together.

This phase covers making a LangGraph app ready for production: schema hardening (TypedDict → Pydantic), context engineering (managing growing conversations), and an evaluation harness for non-deterministic systems. No new graph topology — this phase hardens everything built in Phases 1-6.

The Story Arc

Phases Message
0-1 You don't need LangGraph yet
2-3 Now you do — the loop appears, the investigator emerges
4 Here's what you get — infrastructure for LLM-driven loops
5 Real workflow patterns — pause, resume, fork
6 Scale and specialize — multi-agent with cost optimization
7 Ship it — the other 70%

What Is LangGraph

LangGraph is a workflow system for LLMs — it coordinates their actions and provides plumbing to them, much like traditional workflow engines provide plumbing to applications.

LLMs bring reasoning to applications the same way they bring it to people — the application sends a prompt and receives an answer. But LLMs don't track state and don't remember what they did two calls ago. They can't execute code and can't enforce their own limits. LangGraph is the loop that keeps calling them back: accumulating state between calls, re-feeding context each time, executing the tools the LLM asks for, enforcing guardrails (recursion limits, token budgets), and handling pause/resume for human-in-the-loop.

LangGraph makes non-deterministic, meaning-based branching manageable — visualization, state accumulation, checkpointing, guardrails for decisions you can't predict at design time, and multi-agent coordination where specialized agents collaborate on complex tasks. What it does NOT optimize: token management, LLM call caching, or cost per call. The savings is developer time and plumbing, not runtime efficiency.

When to Use LangGraph

You need it when:

  • The LLM drives a loop — reason, act, observe, repeat
  • You need to pause for human review and resume hours later
  • You're coordinating multiple agents with different roles
  • Workflows branch based on what the LLM discovers at runtime

You don't when:

  • Single LLM call with a prompt
  • Linear pipeline (prompt → LLM → parse → return)
  • Basic RAG without agentic behavior

LangGraph is the right tool when you need durable, interruptible, multi-step agent workflows. For everything else, it is overhead.

How to Run

Each phase runs independently:

python3 phases/phase0-baseline/graph.py
python3 phases/phase1-first-graph/graph.py
python3 phases/phase2-tools/graph.py
python3 phases/phase3-investigator/graph.py
python3 phases/phase4-infrastructure/graph.py
python3 phases/phase5-workflows/graph.py
python3 phases/phase6-multi-agent/graph.py
python3 phases/phase7-ship-it/graph.py

Requires an Anthropic API key in .env:

ANTHROPIC_API_KEY=your-key-here

Related

Related repos:

  • Dapr Microservices Demo — Dapr microservices from local development to cloud-ready Kubernetes, including AI-powered fraud detection via Dapr Workflow

Currently planned repos:

  • LangGraph agent patterns (reflection, parallel execution)
  • LangSmith observability and evaluation workflows
  • Dapr Workflow + LangGraph integration

About

Fraud investigation tutorial across 9 phases — same 6 cases, progressively adding LangGraph, tools, HITL, multi-agent coordination, and LangSmith observability

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages