Skip to content

feat(linker): optimize text modification by processing only changed segments#3104

Open
nsantacruz wants to merge 1 commit intomasterfrom
post-modify-text-only-on-changed
Open

feat(linker): optimize text modification by processing only changed segments#3104
nsantacruz wants to merge 1 commit intomasterfrom
post-modify-text-only-on-changed

Conversation

@nsantacruz
Copy link
Contributor

@nsantacruz nsantacruz commented Feb 18, 2026

This pull request refactors how text modifications are tracked and processed in sefaria/tracker.py. The main change is to only process and log segments of text that have actually changed, rather than re-processing the entire text every time. This should improve efficiency and accuracy in change tracking and downstream processing.

Key changes:

Segment-level change tracking:

  • Added a new helper function _post_modify_changed_segments that recursively compares the old and new text, and only calls post_modify_text for segments that have changed. This ensures that only modified segments are processed and logged, rather than the entire text.

  • Updated modify_text to use _post_modify_changed_segments instead of calling post_modify_text directly, and adjusted how the count_after parameter is handled to avoid redundant counting and indexing.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request refactors text modification tracking in sefaria/tracker.py to optimize performance by processing only changed segments rather than the entire text on each modification. The implementation introduces a new recursive helper function _post_modify_changed_segments that compares old and new text structures and selectively calls post_modify_text for segments that have changed.

Changes:

  • Added _post_modify_changed_segments function to recursively compare and process only changed segments
  • Modified modify_text to use the new segment-level tracking instead of processing the entire text
  • Adjusted count_after parameter handling to prevent redundant counting during segment iteration

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@mergify
Copy link

mergify bot commented Feb 18, 2026

🧪 CI Insights

Here's what we observed from your CI run for 55e96b5.

❌ Job Failures

Pipeline Job Health on master Retries 🔍 CI Insights 📄 Logs
Continuous Continuous Testing: PyTest Healthy 0 View View

@Sefaria Sefaria deleted a comment from Copilot AI Feb 25, 2026
@Sefaria Sefaria deleted a comment from Copilot AI Feb 25, 2026
@Sefaria Sefaria deleted a comment from Copilot AI Feb 25, 2026
@Sefaria Sefaria deleted a comment from Copilot AI Feb 25, 2026
@Sefaria Sefaria deleted a comment from Copilot AI Feb 25, 2026
orig_count_after = kwargs.get("count_after", 1)
kwargs['count_after'] = False

_post_modify_changed_segments(user, action, oref, lang, vtitle, old_text, text, version_id, **kwargs)
Copy link
Contributor

@yonadavGit yonadavGit Feb 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the "kwargs all the way down" style here is a bit confusing, especially now that the counting logic has been modified. Maybe worthwhile, while were at it, taking out the the counting-related vars out of kwargs land so it's a bit clearer what's going on.. (concretely, trying to make these vars explicit as much as possible)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants