Slightly relax jurisdiction requirement #468
Pull request overview
This PR relaxes jurisdiction validation by only requiring county mentions when a subdivision name is not uniquely identifiable within a state, and it reduces LLM usage for URL validation by short-circuiting when a URL’s domain matches a jurisdiction’s canonical website.
Changes:
- Add a fast-path in URL jurisdiction validation to skip LLM checks when the URL domain matches the known/canonical jurisdiction website.
- Introduce subdivision-name lookup + normalization utilities to determine whether a subdivision name is unique within a state.
- Update validation decision-tree graphs and expand unit tests for URL-domain short-circuiting and new jurisdiction naming/lookup behavior.
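The URL fast path described above can be sketched roughly as follows. This is a hedged illustration, not the code in compass/validation/location.py. The helper names `domain_matches_canonical` and `validate_url`, the `llm_check` callback, and the exact-hostname rule (ignoring only a leading `www.`) are all assumptions.

```python
from typing import Callable, Optional
from urllib.parse import urlparse


def _normalized_host(url: str) -> str:
    """Return the lowercase hostname of ``url`` with any leading ``www.`` removed."""
    # urlparse only populates the network location when a scheme is present,
    # so prepend one for bare domains such as "example.gov".
    if "://" not in url:
        url = "https://" + url
    host = urlparse(url).hostname or ""  # .hostname is already lowercased
    return host[4:] if host.startswith("www.") else host


def domain_matches_canonical(url: str, canonical_website: Optional[str]) -> bool:
    """True only if ``url`` is hosted on exactly the jurisdiction's canonical domain."""
    if not canonical_website:
        return False
    url_host = _normalized_host(url)
    return bool(url_host) and url_host == _normalized_host(canonical_website)


def validate_url(
    url: str, canonical_website: Optional[str], llm_check: Callable[[str], bool]
) -> bool:
    """Skip the (expensive) LLM check when the URL's domain already matches."""
    if domain_matches_canonical(url, canonical_website):
        return True  # fast path: no LLM call
    return llm_check(url)  # fall back to the LLM-based validation
```

Requiring an exact host match keeps the shortcut conservative: a subdomain or a look-alike host such as `example.gov.evil.com` still falls through to the LLM check.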
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/python/unit/validation/test_validation_location.py | Adds async tests ensuring URL-domain match skips LLM and mismatches still invoke the decision tree. |
| tests/python/unit/utilities/test_utilities_jurisdictions.py | Adds tests for normalized subdivision-name matching and new short-name-with-state properties. |
| compass/validation/location.py | Adds canonical-domain matching helpers to bypass URL LLM validation when safe. |
| compass/validation/graphs.py | Updates jurisdiction decision-tree logic to relax county requirements based on subdivision uniqueness. |
| compass/utilities/jurisdictions.py | Adds normalized subdivision lookup helper, short-name properties, and caches jurisdiction metadata loading. |
| compass/scripts/download.py | Adjusts logging output formatting for remaining documents after jurisdiction filtering. |
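The normalized lookup and cached metadata loading listed for compass/utilities/jurisdictions.py might look something like the sketch below. Everything here is hypothetical: the function names, the affix lists used for normalization, and the in-memory `_RAW_JURISDICTIONS` rows (with made-up county and subdivision names) standing in for the metadata the package actually loads. `functools.lru_cache` ensures the index is built only once per process.

```python
import re
from collections import defaultdict
from functools import lru_cache
from typing import Dict, FrozenSet, Tuple

# Hypothetical affixes treated as noise when comparing subdivision names.
_PREFIXES = ("city of ", "town of ", "village of ", "township of ")
_SUFFIXES = (" township", " borough", " village", " town", " city")

# Hypothetical stand-in for jurisdiction metadata normally loaded from disk:
# (state, county, subdivision) rows with made-up county/subdivision names.
_RAW_JURISDICTIONS = (
    ("Ohio", "Alpha County", "Washington Township"),
    ("Ohio", "Beta County", "Washington Township"),
    ("Ohio", "Alpha County", "City of Exampleville"),
)


def normalize_subdivision_name(name: str) -> str:
    """Lowercase, strip punctuation, collapse spaces, and drop one common affix."""
    text = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    text = re.sub(r"\s+", " ", text).strip()
    for prefix in _PREFIXES:
        if text.startswith(prefix):
            text = text[len(prefix):]
            break
    for suffix in _SUFFIXES:
        if text.endswith(suffix):
            text = text[: -len(suffix)]
            break
    return text


@lru_cache(maxsize=1)
def load_subdivision_index() -> Dict[Tuple[str, str], FrozenSet[str]]:
    """Map (state, normalized subdivision name) -> counties; built once, then cached."""
    index = defaultdict(set)
    for state, county, subdivision in _RAW_JURISDICTIONS:
        index[(state.lower(), normalize_subdivision_name(subdivision))].add(county)
    return {key: frozenset(counties) for key, counties in index.items()}


def counties_for_subdivision(name: str, state: str) -> FrozenSet[str]:
    """Counties in ``state`` containing a subdivision whose normalized name matches."""
    key = (state.lower(), normalize_subdivision_name(name))
    return load_subdivision_index().get(key, frozenset())
```

Normalization here deliberately errs toward merging names: if stripping an affix makes two different subdivisions collide, the name simply stops looking unique, which keeps the county requirement in place.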
Codecov Report
❌ Patch coverage is

Coverage Diff

| | main | #468 | +/- |
|---|---|---|---|
| Coverage | 60.86% | 61.55% | +0.68% |
| Files | 77 | 77 | |
| Lines | 6843 | 6903 | +60 |
| Branches | 670 | 683 | +13 |
| Hits | 4165 | 4249 | +84 |
| Misses | 2561 | 2533 | -28 |
| Partials | 117 | 121 | +4 |
Flags with carried forward coverage won't be shown.
Previously, whenever a county name was given, we checked for it for every subdivision. This is the most rigorous approach, but it is too strict in practice because towns and cities often omit the county name from their documents and website URLs. Requiring the county name is unnecessary when a subdivision name is unique within a state, so we now relax the check when we can show that uniqueness: in that case, we only check for the subdivision + state name combination. If the subdivision name is not unique within the state, the county name is still required.
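A minimal sketch of the relaxed rule, assuming a precomputed mapping from (state, subdivision name) to the set of counties containing a subdivision with that name. The names `county_required` and `mentions_jurisdiction` are hypothetical, and plain substring checks stand in for the LLM-based decision tree in compass/validation/graphs.py. The key point is that only a match in exactly one county counts as proof of uniqueness; an unknown name keeps the strict check.

```python
from typing import FrozenSet, Mapping, Optional, Tuple

# Hypothetical index shape: (lowercase state, lowercase subdivision) -> counties.
CountyIndex = Mapping[Tuple[str, str], FrozenSet[str]]


def county_required(subdivision: str, state: str, index: CountyIndex) -> bool:
    """County must be checked unless the subdivision name is provably unique in the state."""
    counties = index.get((state.lower(), subdivision.lower()), frozenset())
    # Exactly one match proves uniqueness. Zero matches proves nothing, so stay strict.
    return len(counties) != 1


def mentions_jurisdiction(
    text: str,
    subdivision: str,
    county: Optional[str],
    state: str,
    index: CountyIndex,
) -> bool:
    """Toy stand-in for the LLM decision tree: plain case-insensitive substring checks."""
    lowered = text.lower()
    required = [subdivision, state]
    # Only demand the county when one was given and the name is not unique.
    if county and county_required(subdivision, state, index):
        required.append(county)
    return all(part.lower() in lowered for part in required)
```

For example, with "Washington Township" present in two counties of a state, a document naming only the township and state is rejected, while a uniquely named town passes with just its name and state.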