Bug/sc 40665/ voices search results display user id instead by yodem · Pull Request #3063 · Sefaria/Sefaria-Project

yodem · 2026-01-29T10:13:04Z

Description

Add Django database health check to Elasticsearch reindexing cronjob and clean up excessive debug logging.

Code Changes

scripts/scheduled/reindex_elasticsearch_cronjob.py:

Added check_django_database_connection() preflight check to prevent sheet indexing with broken user lookups (would produce "User {id}" instead of real names)
Removed truncation limits on failure/skip reporting (was capping at 50/20 items)
Removed excessive debug logging throughout the script

sefaria/search.py:

Removed excessive debug logging (countdown messages, index state checks, progress logs)
Cleaned up redundant log statements in clear_index(), index_all_of_type(), and index_all_of_type_by_index_name()

Notes

The Django database check is critical for Kubernetes deployments - if local-settings-secret is misconfigured, sheet indexing silently produces garbage data. The cronjob now fails fast with a clear error message instead.

When the Django database is unreachable, UserProfile silently falls back to "User {id}" names, which corrupts Elasticsearch indexes during reindexing. Changes: - Add logging to UserProfile.__init__ exception handler - Add _django_user_found flag and has_django_user property - Update public_user_data() to return None when Django User not found - Fix: is_staff is a property, not a method (user.is_staff not user.is_staff()) This enables detection of broken database connections and prevents indexing sheets with fallback names like "User 93082". Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add a pre-flight check that verifies Django database connectivity before starting Elasticsearch reindexing. This prevents indexing sheets with "User {id}" names when the database is unreachable. The check: - Verifies DATABASES_USER, DATABASES_PASSWORD, DATABASES_HOST, DATABASES_PORT environment variables are set - Tests actual database connectivity with a query - Provides clear error messages pointing to local-settings-secret K8s secret - Aborts the cronjob early if the check fails This addresses the root cause of search results showing "User 93082" instead of actual author names when the indexing cronjob runs with broken DB credentials. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Log owner_id, owner_name, and title when indexing sheets in debug mode. This helps verify the correct data is being indexed to Elasticsearch. Run with --debug flag to see the indexed data: python scripts/scheduled/reindex_elasticsearch_cronjob.py --debug Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

… reindexing scripts - Removed excessive debug logging statements to clean up the code. - Enhanced error handling in various functions to ensure smoother execution. - Adjusted the way failed and skipped text versions are reported for better clarity.

…job configuration

- Eliminated logging for failed Django User lookups to streamline the code. - Removed the _django_user_found flag and has_django_user property, simplifying user profile handling. - Updated public_user_data function to focus on returning user data without signaling Django User presence. - Adjusted error handling to avoid unnecessary complexity during user profile initialization.

…nto bug/sc-40665/-voices-search-results-display-user-id-instead

mergify · 2026-01-29T10:50:36Z

🧪 CI Insights

Here's what we observed from your CI run for 7ebdfd8.

❌ Job Failures

Pipeline	Job	Health on `master`	Retries	🔍 CI Insights	📄 Logs
`Continuous`	`Continuous Testing: PyTest`		`0`	View	View

…nto bug/sc-40665/-voices-search-results-display-user-id-instead

stevekaplan123 · 2026-02-01T09:13:31Z

scripts/scheduled/reindex_elasticsearch_cronjob.py

+                        f"name: {first_user.first_name} {first_user.last_name}")
+
+        return True
+    except Exception as e:


I wonder if instead of catching a general Exception, we should try...catch User.objects.count separately from User.objects.first.

stevekaplan123 · 2026-02-01T09:15:02Z

scripts/scheduled/reindex_elasticsearch_cronjob.py

            lines.append(f"\nFailed Text Versions: {len(self.failed_text_versions)}")
            lines.append("-" * 40)
-            for i, failure in enumerate(self.failed_text_versions[:50], 1):
+            for i, failure in enumerate(self.failed_text_versions, 1):


I don't understand this change or what the original code intended. Why were we stopping at 50 failed_text_versions before and why are we no longer doing this? Same question about only looking at 20 skipped_text_versions below.

stevekaplan123 · 2026-02-01T09:16:15Z

sefaria/search.py

    except NotFoundError:
        # Index doesn't exist - handle race condition where index is deleted between exists() check and delete() call
-        logger.debug(f"Index not found when attempting to delete - index_name: {index_name}")
+        pass


Curious why we are removing so many logger statements. Couldn't they be useful? Especially, here... it seems bad to have a "pass" in an exception

dcschreiber and others added 11 commits January 19, 2026 14:19

chore: trigger CI

26e9825

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

fix(ci): trigger build

6953e13

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

fix(reindex): add --debug flag to elasticsearch reindex cronjob

a653379

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

fix(reindex): update Elasticsearch host in cronjob configuration

9252dea

fix(reindex): update Elasticsearch host and remove debug flag in cron…

111d122

…job configuration

Merge branch 'master' of https://github.com/Sefaria/Sefaria-Project i…

38a802c

…nto bug/sc-40665/-voices-search-results-display-user-id-instead

yodem requested a review from stevekaplan123 January 29, 2026 10:13

yodem requested a review from YishaiGlasner February 1, 2026 08:40

Merge branch 'master' of https://github.com/Sefaria/Sefaria-Project i…

7ebdfd8

…nto bug/sc-40665/-voices-search-results-display-user-id-instead

stevekaplan123 reviewed Feb 1, 2026

View reviewed changes

YishaiGlasner approved these changes Feb 1, 2026

View reviewed changes

stevekaplan123 reviewed Feb 1, 2026

View reviewed changes

yodem enabled auto-merge February 1, 2026 09:16

stevekaplan123 reviewed Feb 1, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Bug/sc 40665/ voices search results display user id instead#3063

Bug/sc 40665/ voices search results display user id instead#3063
yodem wants to merge 12 commits intomasterfrom
bug/sc-40665/-voices-search-results-display-user-id-instead

yodem commented Jan 29, 2026 •

edited

Loading

Uh oh!

mergify bot commented Jan 29, 2026 •

edited

Loading

Uh oh!

stevekaplan123 Feb 1, 2026

Uh oh!

stevekaplan123 Feb 1, 2026

Uh oh!

stevekaplan123 Feb 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Conversation

yodem commented Jan 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Code Changes

Notes

Uh oh!

mergify bot commented Jan 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🧪 CI Insights

❌ Job Failures

Uh oh!

stevekaplan123 Feb 1, 2026

Choose a reason for hiding this comment

Uh oh!

stevekaplan123 Feb 1, 2026

Choose a reason for hiding this comment

Uh oh!

stevekaplan123 Feb 1, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

yodem commented Jan 29, 2026 •

edited

Loading

mergify bot commented Jan 29, 2026 •

edited

Loading