Add notebook with EIA-FERC utility match #5188

Draft
katie-lamb wants to merge 41 commits into main from devtools-eia-ferc-utility-match

@katie-lamb

@katie-lamb katie-lamb commented Apr 15, 2026

Contributor

Overview

Create a devtools notebook with the splink model to match FERC and EIA utilities.

What problem does this address?

What did you change?

Documentation

Make sure to update relevant aspects of the documentation:

  • Update the release notes: reference the PR and related issues.
  • Update relevant Data Source jinja templates (see docs/data_sources/templates).
  • Update relevant table or source description metadata (see src/metadata).
  • Review and update any other aspects of the documentation that might be affected by this PR.

Testing

How did you make sure this worked? How can a reviewer verify this?

To-do list

  • If updating analyses or data processing functions: make sure to update row count expectations in dbt tests.
  • Run pixi run prek-run to run linters and static code analysis checks.
  • Run pixi run pytest-ci locally to ensure that the merge queue will accept your PR.
  • Review the PR yourself and call out any questions or issues you have.
  • For PRs that change the PUDL outputs significantly, run the full ETL locally and then run the data validations using dbt. If you can't run the ETL locally then run the build-deploy-pudl GitHub Action manually and ensure that it succeeds.

e-belfer and others added 30 commits February 11, 2026 11:58
Base automatically changed from add-eia-ferc-utility-address-match to main April 16, 2026 19:57
@e-belfer

Member

In reviewing matches for training, I hand-verified all records but flagged one that is currently matched in PUDL but appears to be two different entities:

The flagged pair is FERC ID 454 (oleander power project, limited partnership) and EIA utility ID 22313 (Oleander Power Project LP), which are in different states and time periods. This is slightly out of scope, but we should consider removing this match in this PR as we're adding new ones.

@katie-lamb

katie-lamb commented Apr 22, 2026

Contributor Author

Some results from the latest notebook model changes:

  • Results look better after a second iteration (taking out city as a comparison level).
  • I used a match probability threshold of 0.9 and matched 67% of FERC utilities. A threshold of 0.8 matches 70% of utilities, so there seem to be diminishing returns below 0.9. We should still check whether a higher threshold (0.99 or 0.95, for example) would be better, trading lower recall for higher accuracy.
  • With a match threshold of 0.9, and choosing the best match per FERC utility, we match 60 pairs from the labeled set and miss 19 pairs (they aren't labeled as non-matches, but their match probability is below 0.9). The charts show that precision is always high (1, or 100% accurate), but it's tricky to get recall up because some of the matches in the labeled set are unexpected/weird.
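For reviewers who want to reason about the threshold trade-off, here is a minimal sketch of the evaluation described above: keep the best-scoring EIA candidate per FERC utility, drop anything under the threshold, and score the result against the labeled pairs. It is not the notebook's code. The `ScoredPair` record, the function names, and the toy IDs and probabilities are all hypothetical, and the model's real output schema may differ. For reference, 60 matched and 19 missed labeled pairs corresponds to a recall of 60/79, or about 76%.

```python
from collections import namedtuple

# Hypothetical record for one scored candidate pair (not the notebook's schema).
ScoredPair = namedtuple("ScoredPair", ["ferc_id", "eia_id", "match_probability"])


def best_matches(pairs, threshold):
    """Keep the highest-probability EIA candidate per FERC utility at or above the threshold."""
    best = {}
    for pair in pairs:
        if pair.match_probability < threshold:
            continue
        current = best.get(pair.ferc_id)
        if current is None or pair.match_probability > current.match_probability:
            best[pair.ferc_id] = pair
    return best


def evaluate(pairs, labeled_matches, threshold):
    """Return (precision, recall) of predicted links against labeled true matches."""
    predicted = {(p.ferc_id, p.eia_id) for p in best_matches(pairs, threshold).values()}
    # Only score predictions for FERC utilities that appear in the labeled set.
    labeled_ferc_ids = {ferc_id for ferc_id, _ in labeled_matches}
    scored = {pair for pair in predicted if pair[0] in labeled_ferc_ids}
    true_positives = len(scored & labeled_matches)
    # Convention assumed here: precision is 1.0 when nothing is predicted.
    precision = true_positives / len(scored) if scored else 1.0
    recall = true_positives / len(labeled_matches) if labeled_matches else 0.0
    return precision, recall


# Toy data, invented for illustration only.
pairs = [
    ScoredPair(1, 100, 0.97),
    ScoredPair(1, 101, 0.92),
    ScoredPair(2, 200, 0.85),
    ScoredPair(3, 300, 0.95),
    ScoredPair(4, 400, 0.40),
]
labeled_matches = {(1, 100), (2, 200), (3, 300), (4, 400)}

for threshold in (0.8, 0.9, 0.95):
    precision, recall = evaluate(pairs, labeled_matches, threshold)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

In this toy data, raising the threshold leaves precision at 1 while recall drops, which is the same pattern the charts show. Running the sweep on the real labeled set would show whether 0.95 or 0.99 costs many more of the 79 labeled pairs.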

@jdangerx jdangerx moved this from New to In progress in Catalyst Megaproject Apr 22, 2026
@cmgosnell cmgosnell added eia923 Anything having to do with EIA Form 923 ferc1 Anything having to do with FERC Form 1 eia860 Anything having to do with EIA Form 860 glue PUDL specific structures & metadata. Stuff that connects datasets together. labels Apr 22, 2026

5 participants