refactor(catchall): rewrite version normalization in catchall to hand…#3133
refactor(catchall): rewrite version normalization in catchall to hand…#3133YishaiGlasner wants to merge 3 commits intomasterfrom
Conversation
…le non-existent versions and support multi-panel URLs Replace _reader_redirect_add_languages with a more robust version normalization flow that supports per-panel version params (ven, ven1, etc.). The redirect now fires when current params differ from normalized ones, rather than whenever a '|' separator is missing. Also, when the required version doesn't exist it redirects to default (conserving the redirect for legacy urls without version language).
There was a problem hiding this comment.
Pull request overview
This PR refactors the version normalization logic in the catchall view to handle non-existent versions (by redirecting to a version-less URL) and to support multi-panel URLs with per-panel version parameters (ven1, vhe1, etc.). The old _reader_redirect_add_languages function, which only triggered redirects when a | separator was missing, is replaced by a more general flow that compares current version params against their normalized forms and redirects whenever they differ.
Changes:
- Replace
_reader_redirect_add_languageswith three new functions:_reader_redirect_versions,_get_normalized_versions, and_get_current_and_normalized_versions - Add support for multi-panel version normalization by processing
p1/p2panel refs and their correspondingven1/vhe1version params - Redirect to a URL with non-existent version params removed (rather than only normalizing format)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| return redirect(f'/{tref}/?{query_params.urlencode()}') | ||
|
|
||
|
|
||
| def _get_normalized_versions(tref, ven, vhe): |
There was a problem hiding this comment.
_get_normalized_versions calls Ref(tref).version_list() (a DB query) on line 393 before checking whether ven or vhe are None. Since _get_current_and_normalized_versions is called on every catchall request (including those with no version params), this introduces an unnecessary DB call on every page load. Consider adding an early return at the top of _get_normalized_versions when both ven and vhe are falsy, e.g. if not ven and not vhe: return [None, None].
| def _get_normalized_versions(tref, ven, vhe): | |
| def _get_normalized_versions(tref, ven, vhe): | |
| if not ven and not vhe: | |
| return [None, None] |
|
|
||
|
|
||
| def _get_normalized_versions(tref, ven, vhe): | ||
| versions = Ref(tref).version_list() |
There was a problem hiding this comment.
_get_normalized_versions calls Ref(tref) without any exception handling. For the main panel's tref, this was also the case in the old code, but now the function is also called with user-controlled tref values from query params (p1, p2, etc.) which could be arbitrary strings. If a p1 query param contains an invalid ref, Ref(tref) will raise InputError, resulting in an unhandled 500 error. Consider wrapping the Ref(tref).version_list() call in a try/except for InputError (returning [None, None] on failure), or validating the tref before processing.
| versions = Ref(tref).version_list() | |
| try: | |
| versions = Ref(tref).version_list() | |
| except InputError: | |
| # Invalid tref; cannot resolve versions. Return placeholders for both directions. | |
| return [None, None] |
There was a problem hiding this comment.
@YishaiGlasner ?
(If we had tests, then we could know what the expected behavior is 😇 )
| def _get_normalized_versions(tref, ven, vhe): | ||
| versions = Ref(tref).version_list() | ||
| normalized = [] | ||
| for version_param, direction in [(ven, 'ltr'), (vhe, 'rtl')]: | ||
| if not version_param: | ||
| normalized.append(None) | ||
| continue | ||
| if '|' in version_param: | ||
| lang, vtitle = version_param.split('|', 1) | ||
| else: | ||
| lang, vtitle = None, version_param # Legacy url with only version title | ||
| vtitle = vtitle.replace('_', ' ') | ||
| candidates = [v for v in versions if v['direction'] == direction] | ||
| version = (next((v for v in candidates if v['versionTitle'] == vtitle and v['languageFamilyName'] == lang), None) | ||
| or next((v for v in candidates if v['languageFamilyName'] == lang), None) | ||
| or next((v for v in candidates if v['versionTitle'] == vtitle), None)) | ||
| if version: | ||
| normalized.append(f'{version["languageFamilyName"]}|{version["versionTitle"].replace(" ", "_")}') | ||
| else: | ||
| normalized.append(None) | ||
| return normalized | ||
|
|
||
| def _get_current_and_normalized_versions(request, tref): | ||
| current_versions, normalized_versions = {}, {} | ||
| tref_mappings = {k[1:]: v for k, v in request.GET.items() if re.match(r'^p\d+$', k)} | ||
| tref_mappings[''] = tref | ||
| tref_mappings = dict(sorted(tref_mappings.items())) | ||
| for panel_num, tref in tref_mappings.items(): | ||
| ven = request.GET.get(f'ven{panel_num}') | ||
| vhe = request.GET.get(f'vhe{panel_num}') | ||
| norm_ven, norm_vhe = _get_normalized_versions(tref, ven, vhe) | ||
| current_versions.update({k: v for k, v in [(f'ven{panel_num}', ven), (f'vhe{panel_num}', vhe)] if v}) | ||
| normalized_versions.update({k: v for k, v in [(f'ven{panel_num}', norm_ven), (f'vhe{panel_num}', norm_vhe)] if v}) | ||
| return current_versions, normalized_versions |
There was a problem hiding this comment.
The new functions _get_normalized_versions, _get_current_and_normalized_versions, and _reader_redirect_versions have no test coverage. Given that reader/tests/test_catchall_redirects.py already exists with tests for the catchall redirect behavior, tests should be added for the new version normalization flow. Key scenarios to cover include: legacy URLs without |, already-normalized URLs, non-existent version titles, multi-panel URLs with p1/p2 params, and invalid panel trefs.
There was a problem hiding this comment.
we only have tests regarding modules, not general tests for the function. also the tests are on modularization.testing! so i'm not sure what they are doing.
i think that testing this should be another task but i'm open to other opinions.
There was a problem hiding this comment.
I would feel much more confident if you could add some tests here. I'm not against you doing it in a separate sub-task, if this is a live bug-fix that needs to be deployed asap.
But I feel strongly in favor of tests :-)
yodem
left a comment
There was a problem hiding this comment.
PR Review: Rewrite version normalization in catchall
Scope: Single file (reader/views.py), +47/-15 lines. Backend-only change.
CI Status: PyTest is failing. This needs investigation before merging.
issue: Fallback logic may silently serve wrong version (lines 407-408)
The cascading match in _get_normalized_versions:
version = (next((v for v in candidates if v['versionTitle'] == vtitle and v['languageFamilyName'] == lang), None)
or next((v for v in candidates if v['languageFamilyName'] == lang), None) # ← this
or next((v for v in candidates if v['versionTitle'] == vtitle), None))The second fallback matches any version with the same language family, ignoring the requested version title entirely. If a user requests English|Nonexistent_Title, they'd silently get an arbitrary English version. Is this the intended behavior? This could mask broken URLs rather than surfacing 404s, and the user would see different content than expected without any indication.
suggestion: Variable shadowing of tref (line 421)
def _get_current_and_normalized_versions(request, tref):
...
for panel_num, tref in tref_mappings.items(): # shadows parameterThe loop variable tref shadows the function parameter. While functionally fine (the parameter value is preserved in tref_mappings['']), it's a readability trap. Consider renaming to panel_tref.
question: Handling of p params that aren't valid refs
Line 418 extracts p1, p2, etc. from query params and passes their values to Ref(tref).version_list() inside _get_normalized_versions. If a p1 value is not a valid ref, this will raise an exception. Is there any upstream validation ensuring these are always valid, or should there be a try/except here?
nitpick: Missing blank line (line 415-416)
PEP 8 recommends two blank lines between top-level function definitions. There's only one between _get_normalized_versions and _get_current_and_normalized_versions.
praise: Good optimization to skip DB query
Line 393-394 avoids hitting the database when no version params are present, which is likely the common case.
praise: Cleaner redirect logic
The new current_versions != normalized_versions check in catchall is much cleaner than the old approach of checking for | absence. It correctly handles all cases including already-normalized URLs (no unnecessary redirect).
praise: Multi-panel support
The old code only handled ven/vhe for the primary panel. The new code correctly handles ven1/vhe1, ven2/vhe2, etc. via tref_mappings. This is a meaningful improvement.
Summary
| Aspect | Assessment |
|---|---|
| Correctness | Generally good, but the fallback logic (serving any version of matching language) may be too permissive |
| CI | Failing - PyTest needs investigation |
| Security | No new concerns - follows existing patterns |
| Performance | Improved (early return avoids unnecessary DB queries) |
| Readability | Good structure. Minor variable shadowing issue |
Recommendation: Investigate CI failure and clarify the language-only fallback behavior before merging.
@yodem added newline. |
| def _get_normalized_versions(tref, ven, vhe): | ||
| versions = Ref(tref).version_list() | ||
| normalized = [] | ||
| for version_param, direction in [(ven, 'ltr'), (vhe, 'rtl')]: | ||
| if not version_param: | ||
| normalized.append(None) | ||
| continue | ||
| if '|' in version_param: | ||
| lang, vtitle = version_param.split('|', 1) | ||
| else: | ||
| lang, vtitle = None, version_param # Legacy url with only version title | ||
| vtitle = vtitle.replace('_', ' ') | ||
| candidates = [v for v in versions if v['direction'] == direction] | ||
| version = (next((v for v in candidates if v['versionTitle'] == vtitle and v['languageFamilyName'] == lang), None) | ||
| or next((v for v in candidates if v['languageFamilyName'] == lang), None) | ||
| or next((v for v in candidates if v['versionTitle'] == vtitle), None)) | ||
| if version: | ||
| normalized.append(f'{version["languageFamilyName"]}|{version["versionTitle"].replace(" ", "_")}') | ||
| else: | ||
| normalized.append(None) | ||
| return normalized | ||
|
|
||
| def _get_current_and_normalized_versions(request, tref): | ||
| current_versions, normalized_versions = {}, {} | ||
| tref_mappings = {k[1:]: v for k, v in request.GET.items() if re.match(r'^p\d+$', k)} | ||
| tref_mappings[''] = tref | ||
| tref_mappings = dict(sorted(tref_mappings.items())) | ||
| for panel_num, tref in tref_mappings.items(): | ||
| ven = request.GET.get(f'ven{panel_num}') | ||
| vhe = request.GET.get(f'vhe{panel_num}') | ||
| norm_ven, norm_vhe = _get_normalized_versions(tref, ven, vhe) | ||
| current_versions.update({k: v for k, v in [(f'ven{panel_num}', ven), (f'vhe{panel_num}', vhe)] if v}) | ||
| normalized_versions.update({k: v for k, v in [(f'ven{panel_num}', norm_ven), (f'vhe{panel_num}', norm_vhe)] if v}) | ||
| return current_versions, normalized_versions |
There was a problem hiding this comment.
I would feel much more confident if you could add some tests here. I'm not against you doing it in a separate sub-task, if this is a live bug-fix that needs to be deployed asap.
But I feel strongly in favor of tests :-)
|
|
||
|
|
||
| def _get_normalized_versions(tref, ven, vhe): | ||
| versions = Ref(tref).version_list() |
There was a problem hiding this comment.
@YishaiGlasner ?
(If we had tests, then we could know what the expected behavior is 😇 )
|
|
||
| def _reader_redirect_add_languages(request, tref): | ||
| versions = Ref(tref).version_list() | ||
| def _reader_redirect_versions(request, tref, current_versions, normalized_versions): |
There was a problem hiding this comment.
Please add a docstring for this function
| normalized.append(None) | ||
| return normalized | ||
|
|
||
| def _get_current_and_normalized_versions(request, tref): |
There was a problem hiding this comment.
Please add a docstring, so one can understand in one line what this function is doing and returning 😄
Description
Rewrite version normalization in catchall to handle non-existent versions and support multi-panel URLs.
Replace
_reader_redirect_add_languageswith a more robust version normalization flow that supports per-panel version params (ven, ven1, etc.). The redirect now fires when current params differ from normalized ones, rather than whenever a '|' separator is missing. Also, when the required version doesn't exist it redirects to default (conserving the redirect for legacy urls without version language).Functions description
_get_current_and_normalized_versions- returnscurrent_versionsa dict of all version params (ven, vhe, ven1...) in the request, andnormalized_versionsall the normalized params._get_normalized_versions- returns the normalized versions for specific panel (with tref and optional ven and vhe)._reader_redirect_versions- returns a redirect with the normalized params.