Skip to content

refactor(catchall): rewrite version normalization in catchall to hand…#3133

Open
YishaiGlasner wants to merge 3 commits intomasterfrom
feature/sc-42368/live-bug-on-pre-v3-api-urls
Open

refactor(catchall): rewrite version normalization in catchall to hand…#3133
YishaiGlasner wants to merge 3 commits intomasterfrom
feature/sc-42368/live-bug-on-pre-v3-api-urls

Conversation

@YishaiGlasner
Copy link
Contributor

Description

Rewrite version normalization in catchall to handle non-existent versions and support multi-panel URLs.
Replace _reader_redirect_add_languages with a more robust version normalization flow that supports per-panel version params (ven, ven1, etc.). The redirect now fires when current params differ from normalized ones, rather than whenever a '|' separator is missing. Also, when the required version doesn't exist it redirects to default (conserving the redirect for legacy urls without version language).

Functions description

_get_current_and_normalized_versions - returns current_versions a dict of all version params (ven, vhe, ven1...) in the request, and normalized_versions all the normalized params.
_get_normalized_versions - returns the normalized versions for specific panel (with tref and optional ven and vhe).
_reader_redirect_versions - returns a redirect with the normalized params.

…le non-existent versions and support multi-panel URLs

Replace _reader_redirect_add_languages with a more robust version
normalization flow that
supports per-panel version params (ven, ven1, etc.). The redirect
now fires when current params differ from normalized ones,
rather than whenever a '|' separator is missing. Also, when the required version doesn't exist it redirects to default (conserving the redirect for legacy urls without version language).
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the version normalization logic in the catchall view to handle non-existent versions (by redirecting to a version-less URL) and to support multi-panel URLs with per-panel version parameters (ven1, vhe1, etc.). The old _reader_redirect_add_languages function, which only triggered redirects when a | separator was missing, is replaced by a more general flow that compares current version params against their normalized forms and redirects whenever they differ.

Changes:

  • Replace _reader_redirect_add_languages with three new functions: _reader_redirect_versions, _get_normalized_versions, and _get_current_and_normalized_versions
  • Add support for multi-panel version normalization by processing p1/p2 panel refs and their corresponding ven1/vhe1 version params
  • Redirect to a URL with non-existent version params removed (rather than only normalizing format)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

return redirect(f'/{tref}/?{query_params.urlencode()}')


def _get_normalized_versions(tref, ven, vhe):
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_get_normalized_versions calls Ref(tref).version_list() (a DB query) on line 393 before checking whether ven or vhe are None. Since _get_current_and_normalized_versions is called on every catchall request (including those with no version params), this introduces an unnecessary DB call on every page load. Consider adding an early return at the top of _get_normalized_versions when both ven and vhe are falsy, e.g. if not ven and not vhe: return [None, None].

Suggested change
def _get_normalized_versions(tref, ven, vhe):
def _get_normalized_versions(tref, ven, vhe):
if not ven and not vhe:
return [None, None]

Copilot uses AI. Check for mistakes.


def _get_normalized_versions(tref, ven, vhe):
versions = Ref(tref).version_list()
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_get_normalized_versions calls Ref(tref) without any exception handling. For the main panel's tref, this was also the case in the old code, but now the function is also called with user-controlled tref values from query params (p1, p2, etc.) which could be arbitrary strings. If a p1 query param contains an invalid ref, Ref(tref) will raise InputError, resulting in an unhandled 500 error. Consider wrapping the Ref(tref).version_list() call in a try/except for InputError (returning [None, None] on failure), or validating the tref before processing.

Suggested change
versions = Ref(tref).version_list()
try:
versions = Ref(tref).version_list()
except InputError:
# Invalid tref; cannot resolve versions. Return placeholders for both directions.
return [None, None]

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@YishaiGlasner ?
(If we had tests, then we could know what the expected behavior is 😇 )

Comment on lines +392 to +425
def _get_normalized_versions(tref, ven, vhe):
versions = Ref(tref).version_list()
normalized = []
for version_param, direction in [(ven, 'ltr'), (vhe, 'rtl')]:
if not version_param:
normalized.append(None)
continue
if '|' in version_param:
lang, vtitle = version_param.split('|', 1)
else:
lang, vtitle = None, version_param # Legacy url with only version title
vtitle = vtitle.replace('_', ' ')
candidates = [v for v in versions if v['direction'] == direction]
version = (next((v for v in candidates if v['versionTitle'] == vtitle and v['languageFamilyName'] == lang), None)
or next((v for v in candidates if v['languageFamilyName'] == lang), None)
or next((v for v in candidates if v['versionTitle'] == vtitle), None))
if version:
normalized.append(f'{version["languageFamilyName"]}|{version["versionTitle"].replace(" ", "_")}')
else:
normalized.append(None)
return normalized

def _get_current_and_normalized_versions(request, tref):
current_versions, normalized_versions = {}, {}
tref_mappings = {k[1:]: v for k, v in request.GET.items() if re.match(r'^p\d+$', k)}
tref_mappings[''] = tref
tref_mappings = dict(sorted(tref_mappings.items()))
for panel_num, tref in tref_mappings.items():
ven = request.GET.get(f'ven{panel_num}')
vhe = request.GET.get(f'vhe{panel_num}')
norm_ven, norm_vhe = _get_normalized_versions(tref, ven, vhe)
current_versions.update({k: v for k, v in [(f'ven{panel_num}', ven), (f'vhe{panel_num}', vhe)] if v})
normalized_versions.update({k: v for k, v in [(f'ven{panel_num}', norm_ven), (f'vhe{panel_num}', norm_vhe)] if v})
return current_versions, normalized_versions
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new functions _get_normalized_versions, _get_current_and_normalized_versions, and _reader_redirect_versions have no test coverage. Given that reader/tests/test_catchall_redirects.py already exists with tests for the catchall redirect behavior, tests should be added for the new version normalization flow. Key scenarios to cover include: legacy URLs without |, already-normalized URLs, non-existent version titles, multi-panel URLs with p1/p2 params, and invalid panel trefs.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we only have tests regarding modules, not general tests for the function. also the tests are on modularization.testing! so i'm not sure what they are doing.
i think that testing this should be another task but i'm open to other opinions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would feel much more confident if you could add some tests here. I'm not against you doing it in a separate sub-task, if this is a live bug-fix that needs to be deployed asap.
But I feel strongly in favor of tests :-)

@mergify
Copy link

mergify bot commented Mar 5, 2026

🧪 CI Insights

Here's what we observed from your CI run for e6b194f.

❌ Job Failures

Pipeline Job Health on master Retries 🔍 CI Insights 📄 Logs
Continuous Continuous Testing: PyTest Unknown 0 View View

Copy link
Collaborator

@yodem yodem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Rewrite version normalization in catchall

Scope: Single file (reader/views.py), +47/-15 lines. Backend-only change.

CI Status: PyTest is failing. This needs investigation before merging.


issue: Fallback logic may silently serve wrong version (lines 407-408)

The cascading match in _get_normalized_versions:

version = (next((v for v in candidates if v['versionTitle'] == vtitle and v['languageFamilyName'] == lang), None)
           or next((v for v in candidates if v['languageFamilyName'] == lang), None)  # ← this
           or next((v for v in candidates if v['versionTitle'] == vtitle), None))

The second fallback matches any version with the same language family, ignoring the requested version title entirely. If a user requests English|Nonexistent_Title, they'd silently get an arbitrary English version. Is this the intended behavior? This could mask broken URLs rather than surfacing 404s, and the user would see different content than expected without any indication.

suggestion: Variable shadowing of tref (line 421)

def _get_current_and_normalized_versions(request, tref):
    ...
    for panel_num, tref in tref_mappings.items():  # shadows parameter

The loop variable tref shadows the function parameter. While functionally fine (the parameter value is preserved in tref_mappings['']), it's a readability trap. Consider renaming to panel_tref.

question: Handling of p params that aren't valid refs

Line 418 extracts p1, p2, etc. from query params and passes their values to Ref(tref).version_list() inside _get_normalized_versions. If a p1 value is not a valid ref, this will raise an exception. Is there any upstream validation ensuring these are always valid, or should there be a try/except here?

nitpick: Missing blank line (line 415-416)

PEP 8 recommends two blank lines between top-level function definitions. There's only one between _get_normalized_versions and _get_current_and_normalized_versions.

praise: Good optimization to skip DB query

Line 393-394 avoids hitting the database when no version params are present, which is likely the common case.

praise: Cleaner redirect logic

The new current_versions != normalized_versions check in catchall is much cleaner than the old approach of checking for | absence. It correctly handles all cases including already-normalized URLs (no unnecessary redirect).

praise: Multi-panel support

The old code only handled ven/vhe for the primary panel. The new code correctly handles ven1/vhe1, ven2/vhe2, etc. via tref_mappings. This is a meaningful improvement.


Summary

Aspect Assessment
Correctness Generally good, but the fallback logic (serving any version of matching language) may be too permissive
CI Failing - PyTest needs investigation
Security No new concerns - follows existing patterns
Performance Improved (early return avoids unnecessary DB queries)
Readability Good structure. Minor variable shadowing issue

Recommendation: Investigate CI failure and clarify the language-only fallback behavior before merging.

@YishaiGlasner
Copy link
Contributor Author

PR Review: Rewrite version normalization in catchall

Scope: Single file (reader/views.py), +47/-15 lines. Backend-only change.

CI Status: PyTest is failing. This needs investigation before merging.

issue: Fallback logic may silently serve wrong version (lines 407-408)

The cascading match in _get_normalized_versions:

version = (next((v for v in candidates if v['versionTitle'] == vtitle and v['languageFamilyName'] == lang), None)
           or next((v for v in candidates if v['languageFamilyName'] == lang), None)  # ← this
           or next((v for v in candidates if v['versionTitle'] == vtitle), None))

The second fallback matches any version with the same language family, ignoring the requested version title entirely. If a user requests English|Nonexistent_Title, they'd silently get an arbitrary English version. Is this the intended behavior? This could mask broken URLs rather than surfacing 404s, and the user would see different content than expected without any indication.

suggestion: Variable shadowing of tref (line 421)

def _get_current_and_normalized_versions(request, tref):
    ...
    for panel_num, tref in tref_mappings.items():  # shadows parameter

The loop variable tref shadows the function parameter. While functionally fine (the parameter value is preserved in tref_mappings['']), it's a readability trap. Consider renaming to panel_tref.

question: Handling of p params that aren't valid refs

Line 418 extracts p1, p2, etc. from query params and passes their values to Ref(tref).version_list() inside _get_normalized_versions. If a p1 value is not a valid ref, this will raise an exception. Is there any upstream validation ensuring these are always valid, or should there be a try/except here?

nitpick: Missing blank line (line 415-416)

PEP 8 recommends two blank lines between top-level function definitions. There's only one between _get_normalized_versions and _get_current_and_normalized_versions.

praise: Good optimization to skip DB query

Line 393-394 avoids hitting the database when no version params are present, which is likely the common case.

praise: Cleaner redirect logic

The new current_versions != normalized_versions check in catchall is much cleaner than the old approach of checking for | absence. It correctly handles all cases including already-normalized URLs (no unnecessary redirect).

praise: Multi-panel support

The old code only handled ven/vhe for the primary panel. The new code correctly handles ven1/vhe1, ven2/vhe2, etc. via tref_mappings. This is a meaningful improvement.

Summary

Aspect Assessment
Correctness Generally good, but the fallback logic (serving any version of matching language) may be too permissive
CI Failing - PyTest needs investigation
Security No new concerns - follows existing patterns
Performance Improved (early return avoids unnecessary DB queries)
Readability Good structure. Minor variable shadowing issue
Recommendation: Investigate CI failure and clarify the language-only fallback behavior before merging.

@yodem added newline.
yes, the issue here is default rather than 404.
when the second panel has an invalid ref, the durrent behaviour is removing the panel, and it insists (as per my checking).
tests will probably pass now (it's arbitrary and Noah handles it).

Comment on lines +392 to +425
def _get_normalized_versions(tref, ven, vhe):
versions = Ref(tref).version_list()
normalized = []
for version_param, direction in [(ven, 'ltr'), (vhe, 'rtl')]:
if not version_param:
normalized.append(None)
continue
if '|' in version_param:
lang, vtitle = version_param.split('|', 1)
else:
lang, vtitle = None, version_param # Legacy url with only version title
vtitle = vtitle.replace('_', ' ')
candidates = [v for v in versions if v['direction'] == direction]
version = (next((v for v in candidates if v['versionTitle'] == vtitle and v['languageFamilyName'] == lang), None)
or next((v for v in candidates if v['languageFamilyName'] == lang), None)
or next((v for v in candidates if v['versionTitle'] == vtitle), None))
if version:
normalized.append(f'{version["languageFamilyName"]}|{version["versionTitle"].replace(" ", "_")}')
else:
normalized.append(None)
return normalized

def _get_current_and_normalized_versions(request, tref):
current_versions, normalized_versions = {}, {}
tref_mappings = {k[1:]: v for k, v in request.GET.items() if re.match(r'^p\d+$', k)}
tref_mappings[''] = tref
tref_mappings = dict(sorted(tref_mappings.items()))
for panel_num, tref in tref_mappings.items():
ven = request.GET.get(f'ven{panel_num}')
vhe = request.GET.get(f'vhe{panel_num}')
norm_ven, norm_vhe = _get_normalized_versions(tref, ven, vhe)
current_versions.update({k: v for k, v in [(f'ven{panel_num}', ven), (f'vhe{panel_num}', vhe)] if v})
normalized_versions.update({k: v for k, v in [(f'ven{panel_num}', norm_ven), (f'vhe{panel_num}', norm_vhe)] if v})
return current_versions, normalized_versions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would feel much more confident if you could add some tests here. I'm not against you doing it in a separate sub-task, if this is a live bug-fix that needs to be deployed asap.
But I feel strongly in favor of tests :-)



def _get_normalized_versions(tref, ven, vhe):
versions = Ref(tref).version_list()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@YishaiGlasner ?
(If we had tests, then we could know what the expected behavior is 😇 )


def _reader_redirect_add_languages(request, tref):
versions = Ref(tref).version_list()
def _reader_redirect_versions(request, tref, current_versions, normalized_versions):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a docstring for this function

normalized.append(None)
return normalized

def _get_current_and_normalized_versions(request, tref):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a docstring, so one can understand in one line what this function is doing and returning 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants