
Slightly relax jurisdiction requirement #468

Open: ppinchuk wants to merge 13 commits into main from pp/relax_jur_search

@ppinchuk (Collaborator) commented Jun 5, 2026

Previously, whenever a county name was given, we required a county mention for every subdivision. This is the most rigorous approach, but it is too strict in practice because towns and cities often omit the county name from their documents and website URLs. Requiring the county name is also unnecessary when a subdivision name is unique within its state. We therefore relax the check when we can show that the subdivision name is unique within the state. In that case, we only check for the subdivision + state name combination. If the name is not unique, the county name is still required.
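The relaxed rule can be sketched as a small function. This sketch assumes the number of same-named subdivisions in the state is already known. The names `required_mentions`, `mentions_jurisdiction`, and `same_name_count` are hypothetical. The case-insensitive substring test is only a stand-in for the LLM decision tree that the PR actually updates in `compass/validation/graphs.py`.

```python
from typing import Optional


def required_mentions(
    subdivision: str,
    state: str,
    county: Optional[str],
    same_name_count: int,
) -> tuple[str, ...]:
    """Return the names a document must mention to match a jurisdiction.

    `same_name_count` (hypothetical) is how many subdivisions in `state`
    share this subdivision's normalized name.
    """
    # No county given, or the name is provably unique in the state:
    # the subdivision + state combination is enough.
    if county is None or same_name_count == 1:
        return (subdivision, state)
    # Ambiguous (or unknown, count 0) name: keep the county requirement.
    return (subdivision, county, state)


def mentions_jurisdiction(
    text: str,
    subdivision: str,
    state: str,
    county: Optional[str],
    same_name_count: int,
) -> bool:
    """Keyword stand-in for the real LLM-based check (hypothetical)."""
    haystack = text.casefold()
    terms = required_mentions(subdivision, state, county, same_name_count)
    return all(term.casefold() in haystack for term in terms)


if __name__ == "__main__":
    page = "Town of Riverton, Alpha: Zoning Ordinance"
    # Unique name in the state: the missing county does not matter.
    print(mentions_jurisdiction(page, "Riverton", "Alpha", "North County", 1))  # True
    # Two Rivertons in the state: the county must be mentioned.
    print(mentions_jurisdiction(page, "Riverton", "Alpha", "North County", 2))  # False
```

The sketch treats a count of 0 (name not found) as ambiguous. That conservative choice is made here for illustration and is not stated in the PR.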

@ppinchuk ppinchuk self-assigned this Jun 5, 2026
Copilot AI review requested due to automatic review settings June 5, 2026 17:12
@ppinchuk ppinchuk requested a review from castelao as a code owner June 5, 2026 17:12
@ppinchuk ppinchuk added enhancement Update to logic or general code improvements topic-python-llm Issues/pull requests related to LLMs p-high Priority: high labels Jun 5, 2026
Copilot AI (Contributor) left a comment

Pull request overview

This PR relaxes jurisdiction validation by only requiring county mentions when a subdivision name is not uniquely identifiable within a state, and it reduces LLM usage for URL validation by short-circuiting when a URL’s domain matches a jurisdiction’s canonical website.

Changes:

  • Add a fast-path in URL jurisdiction validation to skip LLM checks when the URL domain matches the known/canonical jurisdiction website.
  • Introduce subdivision-name lookup + normalization utilities to determine whether a subdivision name is unique within a state.
  • Update validation decision-tree graphs and expand unit tests for URL-domain short-circuiting and new jurisdiction naming/lookup behavior.
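The URL fast path can be sketched as follows. This sketch assumes the canonical website is stored as a bare domain or URL, and that a "match" means the same host after dropping a leading `www.`. The helper names and the `llm_check` callback are hypothetical and are not the functions added in `compass/validation/location.py`.

```python
import asyncio
from typing import Awaitable, Callable, Optional
from urllib.parse import urlparse


def _host(url: str) -> str:
    """Lower-cased host name without a leading 'www.'."""
    # urlparse only populates the host when a scheme is present.
    if "://" not in url:
        url = "https://" + url
    host = urlparse(url).hostname or ""
    return host.lower().removeprefix("www.")


def url_matches_website(url: str, website: Optional[str]) -> bool:
    """True when `url` is hosted on the jurisdiction's canonical website."""
    if not website:
        return False
    return _host(url) == _host(website)


async def url_is_for_jurisdiction(
    url: str,
    website: Optional[str],
    llm_check: Callable[[str], Awaitable[bool]],
) -> bool:
    # Fast path: a domain match is accepted without calling the LLM.
    if url_matches_website(url, website):
        return True
    # Otherwise fall back to the (expensive) LLM-based validation.
    return await llm_check(url)


if __name__ == "__main__":
    async def fake_llm(url: str) -> bool:
        print(f"LLM called for {url}")
        return False

    site = "riverton.example"
    print(asyncio.run(url_is_for_jurisdiction(
        "https://www.riverton.example/zoning.pdf", site, fake_llm)))
    print(asyncio.run(url_is_for_jurisdiction(
        "https://news.example/riverton", site, fake_llm)))
```

Requiring an exact host match is deliberately conservative. Subdomains such as `planning.riverton.example` still go through the LLM check instead of being trusted automatically.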

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Summary per file:

  • tests/python/unit/validation/test_validation_location.py: Adds async tests ensuring URL-domain match skips LLM and mismatches still invoke the decision tree.
  • tests/python/unit/utilities/test_utilities_jurisdictions.py: Adds tests for normalized subdivision-name matching and new short-name-with-state properties.
  • compass/validation/location.py: Adds canonical-domain matching helpers to bypass URL LLM validation when safe.
  • compass/validation/graphs.py: Updates jurisdiction decision-tree logic to relax county requirements based on subdivision uniqueness.
  • compass/utilities/jurisdictions.py: Adds normalized subdivision lookup helper, short-name properties, and caches jurisdiction metadata loading.
  • compass/scripts/download.py: Adjusts logging output formatting for remaining documents after jurisdiction filtering.
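A normalized, cached uniqueness lookup in the spirit of the `compass/utilities/jurisdictions.py` changes might look like the sketch below. The CSV layout, the sample rows, and every function name are hypothetical. The normalization rules (accent folding, case folding, dropping punctuation, collapsing whitespace) are assumptions, not the project's exact rules.

```python
import csv
import io
import re
import unicodedata
from collections import Counter
from functools import lru_cache

# Hypothetical stand-in for the project's jurisdiction metadata file.
_SAMPLE_CSV = """State,County,Subdivision
Alpha,North County,Riverton
Alpha,South County,Riverton
Alpha,South County,St. Mary's
Beta,East County,Riverton
"""


def normalize_name(name: str) -> str:
    """Fold accents and case, drop punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    no_punct = re.sub(r"[^\w\s]", "", no_accents.casefold())
    return " ".join(no_punct.split())


@lru_cache(maxsize=1)
def subdivision_name_counts() -> Counter:
    """Parse the metadata once; later calls reuse the cached Counter."""
    rows = csv.DictReader(io.StringIO(_SAMPLE_CSV))
    return Counter(
        (normalize_name(row["State"]), normalize_name(row["Subdivision"]))
        for row in rows
    )


def count_same_named(subdivision: str, state: str) -> int:
    """Number of subdivisions in `state` sharing this normalized name."""
    key = (normalize_name(state), normalize_name(subdivision))
    return subdivision_name_counts()[key]


def is_unique_in_state(subdivision: str, state: str) -> bool:
    return count_same_named(subdivision, state) == 1
```

A missing key in a `Counter` returns 0, so callers can distinguish "not found" (0) from "unique" (1) and "ambiguous" (2 or more).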

@ppinchuk ppinchuk linked an issue Jun 5, 2026 that may be closed by this pull request
@codecov-commenter

Codecov Report

❌ Patch coverage is 87.27273% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.55%. Comparing base (75cc04c) to head (aef2c9c).
⚠️ Report is 2 commits behind head on main.

Files with missing lines:

  • compass/validation/location.py: 82.60% patch coverage, 2 lines missing and 2 partials ⚠️
  • compass/validation/graphs.py: 80.00% patch coverage, 1 line missing and 1 partial ⚠️
  • compass/scripts/download.py: 0.00% patch coverage, 1 line missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #468      +/-   ##
==========================================
+ Coverage   60.86%   61.55%   +0.68%     
==========================================
  Files          77       77              
  Lines        6843     6903      +60     
  Branches      670      683      +13     
==========================================
+ Hits         4165     4249      +84     
+ Misses       2561     2533      -28     
- Partials      117      121       +4     
Flag coverage Δ, unittests: 61.55% <87.27%> (+0.68%) ⬆️
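The Codecov figures above can be reproduced from the raw counts, assuming Codecov's defaults: coverage is hits / (hits + misses + partials), so partially covered lines count as uncovered, and two-decimal percentages are rounded down. The Hits, Misses, and Partials rows sum to the Lines row (6843 and 6903). The patch counts are inferred rather than reported: 7 missing lines at 87.27273% imply 55 patch lines, 48 of them covered.

```python
import math
from fractions import Fraction


def coverage(hits: int, misses: int, partials: int) -> Fraction:
    """Exact covered fraction; partial lines count as not covered."""
    return Fraction(hits, hits + misses + partials)


def pct_round_down(value: Fraction, digits: int = 2) -> str:
    """Format a non-negative fraction as a percentage, rounding down."""
    scale = 10**digits
    truncated = math.floor(value * 100 * scale)  # exact: no float error
    return f"{truncated // scale}.{truncated % scale:0{digits}d}"


base = coverage(hits=4165, misses=2561, partials=117)
head = coverage(hits=4249, misses=2533, partials=121)
patch = Fraction(48, 55)  # inferred: 55 patch lines, 7 not fully covered

print(pct_round_down(base))          # 60.86 (exact value is 60.8651...%)
print(pct_round_down(head))          # 61.55
print(pct_round_down(head - base))   # 0.68
print(pct_round_down(patch))         # 87.27
```

Subtracting the already-rounded values (61.55 - 60.86) would give 0.69. The reported +0.68% is consistent with rounding down the difference of the exact fractions instead.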

Flags with carried forward coverage won't be shown.
