Bug/sc 40665/ voices search results display user id instead#3063
Open
Bug/sc 40665/ voices search results display user id instead#3063
Conversation
When the Django database is unreachable, UserProfile silently falls back
to "User {id}" names, which corrupts Elasticsearch indexes during reindexing.
Changes:
- Add logging to UserProfile.__init__ exception handler
- Add _django_user_found flag and has_django_user property
- Update public_user_data() to return None when Django User not found
- Fix: is_staff is a property, not a method (user.is_staff not user.is_staff())
This enables detection of broken database connections and prevents indexing
sheets with fallback names like "User 93082".
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add a pre-flight check that verifies Django database connectivity before
starting Elasticsearch reindexing. This prevents indexing sheets with
"User {id}" names when the database is unreachable.
The check:
- Verifies DATABASES_USER, DATABASES_PASSWORD, DATABASES_HOST, DATABASES_PORT
environment variables are set
- Tests actual database connectivity with a query
- Provides clear error messages pointing to local-settings-secret K8s secret
- Aborts the cronjob early if the check fails
This addresses the root cause of search results showing "User 93082" instead
of actual author names when the indexing cronjob runs with broken DB credentials.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Log owner_id, owner_name, and title when indexing sheets in debug mode. This helps verify the correct data is being indexed to Elasticsearch. Run with --debug flag to see the indexed data: python scripts/scheduled/reindex_elasticsearch_cronjob.py --debug Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
… reindexing scripts - Removed excessive debug logging statements to clean up the code. - Enhanced error handling in various functions to ensure smoother execution. - Adjusted the way failed and skipped text versions are reported for better clarity.
…job configuration
- Eliminated logging for failed Django User lookups to streamline the code. - Removed the _django_user_found flag and has_django_user property, simplifying user profile handling. - Updated public_user_data function to focus on returning user data without signaling Django User presence. - Adjusted error handling to avoid unnecessary complexity during user profile initialization.
…nto bug/sc-40665/-voices-search-results-display-user-id-instead
…nto bug/sc-40665/-voices-search-results-display-user-id-instead
| f"name: {first_user.first_name} {first_user.last_name}") | ||
|
|
||
| return True | ||
| except Exception as e: |
Contributor
There was a problem hiding this comment.
I wonder if instead of catching a general Exception, we should try...catch User.objects.count separately from User.objects.first.
YishaiGlasner
approved these changes
Feb 1, 2026
| lines.append(f"\nFailed Text Versions: {len(self.failed_text_versions)}") | ||
| lines.append("-" * 40) | ||
| for i, failure in enumerate(self.failed_text_versions[:50], 1): | ||
| for i, failure in enumerate(self.failed_text_versions, 1): |
Contributor
There was a problem hiding this comment.
I don't understand this change or what the original code intended. Why were we stopping at 50 failed_text_versions before and why are we no longer doing this? Same question about only looking at 20 skipped_text_versions below.
| except NotFoundError: | ||
| # Index doesn't exist - handle race condition where index is deleted between exists() check and delete() call | ||
| logger.debug(f"Index not found when attempting to delete - index_name: {index_name}") | ||
| pass |
Contributor
There was a problem hiding this comment.
Curious why we are removing so many logger statements. Couldn't they be useful? Especially, here... it seems bad to have a "pass" in an exception
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Add Django database health check to Elasticsearch reindexing cronjob and clean up excessive debug logging.
Code Changes
scripts/scheduled/reindex_elasticsearch_cronjob.py:check_django_database_connection()preflight check to prevent sheet indexing with broken user lookups (would produce "User {id}" instead of real names)sefaria/search.py:clear_index(),index_all_of_type(), andindex_all_of_type_by_index_name()Notes
The Django database check is critical for Kubernetes deployments - if
local-settings-secretis misconfigured, sheet indexing silently produces garbage data. The cronjob now fails fast with a clear error message instead.