ETL pipeline for US Treasury CDFI Fund public datasets.
Download, clean, and analyze Transaction Level Report (TLR), Consumer Loan Report (CLR), and Awards data from the US Department of the Treasury's CDFI Fund in a few lines of Python.
The CDFI Fund releases massive public datasets covering millions of loans and investments in low-income communities. But the raw files are messy, inconsistently formatted, and require significant cleaning before analysis. cdfi-data standardizes the entire pipeline.
pip install cdfidata
from cdfidata import TLRLoader, CLRLoader, AwardsLoader
# Load a single TLR fiscal year (downloads & caches automatically)
tlr = TLRLoader()
df = tlr.load(year=2022)
# Load the full cumulative TLR (FY2020–FY2022), stacked with provenance
cum = tlr.load_cumulative()
# ...or an explicit range:
cum = tlr.load_range(2020, 2022)
# Filter to Illinois
il = tlr.filter_state("IL")
# Filter by loan type and amount
small_biz = tlr.filter_loan_type("Business")
large = tlr.filter_amount(min_amount=500_000)
# Summary stats
tlr.summary()
# Export
tlr.to_csv("cdfi_transactions.csv")
tlr.to_sqlite("cdfi.db", table="tlr")
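Once the export has run, the resulting SQLite file can be queried with Python's built-in sqlite3 module. The sketch below is self-contained: rather than opening cdfi.db, it builds a toy in-memory table named tlr (matching the table= argument above). The state and amount column names are assumptions for illustration, not the library's documented schema; check docs/CANONICAL_SCHEMA.md for the real ones.

```python
import sqlite3

# Stand-in for sqlite3.connect("cdfi.db"); an in-memory DB keeps the sketch runnable.
conn = sqlite3.connect(":memory:")

# Hypothetical columns: the real exported table may use different names.
conn.execute("CREATE TABLE tlr (state TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO tlr VALUES (?, ?)",
    [("IL", 600000.0), ("IL", 25000.0), ("OH", 900000.0)],
)

# Count and total the Illinois loans of at least $500,000.
query = "SELECT COUNT(*), SUM(amount) FROM tlr WHERE state = ? AND amount >= ?"
loan_count, loan_total = conn.execute(query, ("IL", 500000)).fetchone()
conn.close()

print(loan_count, loan_total)
```

Parameterized queries (the ? placeholders) keep filter values out of the SQL string, which matters if the state or threshold comes from user input.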
Caveat: cumulative frames stack overlapping releases. load_cumulative() and
load_range() concatenate releases without deduplication. Each row carries a source_release
column (FY2020, FY2021, or FY2022), and releases overlap on fiscal_year because FY2022 restates
and expands prior-year data. Filter by source_release and prefer the latest release for a
given fiscal year; aggregating the full frame naively double-counts restated rows.
Field completeness (rate, term, NAICS) is also era-dependent. See docs/CANONICAL_SCHEMA.md.
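The "prefer the latest release for a given fiscal year" rule can be sketched in plain Python. This is a minimal illustration, not part of cdfidata: keep_latest_release and release_year are hypothetical helpers, the rows are toy data, the amount field is an assumed column name, and fiscal_year is assumed to be an integer. If the loader returns a pandas DataFrame (an assumption), the rows could come from cum.to_dict("records").

```python
def release_year(label):
    """Turn a release label such as 'FY2022' into the integer 2022."""
    return int(label[2:])


def keep_latest_release(rows):
    """Keep, for each fiscal_year, only the rows from its newest source_release."""
    # First pass: find the newest release that reports each fiscal year.
    latest = {}
    for row in rows:
        fy = row["fiscal_year"]
        rel = release_year(row["source_release"])
        if rel > latest.get(fy, -1):
            latest[fy] = rel
    # Second pass: drop rows that a later release has restated.
    return [
        row for row in rows
        if release_year(row["source_release"]) == latest[row["fiscal_year"]]
    ]


# Toy data: the FY2022 release restates and expands fiscal year 2020.
rows = [
    {"source_release": "FY2020", "fiscal_year": 2020, "amount": 100_000},
    {"source_release": "FY2021", "fiscal_year": 2021, "amount": 250_000},
    {"source_release": "FY2022", "fiscal_year": 2020, "amount": 100_000},
    {"source_release": "FY2022", "fiscal_year": 2020, "amount": 40_000},
    {"source_release": "FY2022", "fiscal_year": 2022, "amount": 75_000},
]

deduped = keep_latest_release(rows)
naive_total = sum(r["amount"] for r in rows)       # counts the restated loan twice
deduped_total = sum(r["amount"] for r in deduped)  # counts each fiscal year once
print(naive_total, deduped_total)
```

The rule works at the level of whole releases per fiscal year, so it needs no loan-level key. It does assume that a later release fully supersedes an earlier one for the years it covers.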
from cdfidata import TLRLoader, CLRLoader, AwardsLoader
tlr = TLRLoader()
df = tlr.load_sample(n=1000)
clr = CLRLoader()
df = clr.load_sample(n=1000)
awards = AwardsLoader()
df = awards.load_sample(n=500)
| Dataset | Source | Description |
|---|---|---|
| TLR (Transaction Level Report) | CDFI Fund | 1M+ individual CDFI loans, 61 variables |
| CLR (Consumer Loan Report) | CDFI Fund | 3.2M consumer loans aggregated to census tract |
| Awards Database | CDFI Fund | All CDFI Fund program awardees across all years |
CDFI Fund datasets (TLR, CLR, Awards) come from the US Department of the Treasury's CDFI Fund: https://www.cdfifund.gov/research-data
All data is released under open government data principles.
PYTHONPATH=. pytest tests/ -v
44 tests across all modules.
- Impact investors analyzing CDFI loan portfolios
- Academic researchers studying community development finance
- Policy analysts evaluating CDFI Fund program outcomes
- CDFIs benchmarking their own performance against peers
- Anyone who needs clean, analysis-ready CDFI Fund data
MIT 2026 Jaypatel1511