feat(linker): optimize text modification by processing only changed segments #3104
nsantacruz wants to merge 1 commit into master from
Pull request overview
This pull request refactors text modification tracking in sefaria/tracker.py to optimize performance by processing only changed segments rather than the entire text on each modification. The implementation introduces a new recursive helper function _post_modify_changed_segments that compares old and new text structures and selectively calls post_modify_text for segments that have changed.
Changes:
- Added `_post_modify_changed_segments` function to recursively compare and process only changed segments
- Modified `modify_text` to use the new segment-level tracking instead of processing the entire text
- Adjusted `count_after` parameter handling to prevent redundant counting during segment iteration
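A minimal sketch of the recursive comparison, assuming texts are stored as jagged arrays (nested lists whose leaves are segment strings). `changed_segment_paths` is a hypothetical name and is not the PR's `_post_modify_changed_segments`; instead of calling `post_modify_text`, it yields the 1-based index path of each changed segment so the selection logic can be seen on its own.

```python
from itertools import zip_longest
from typing import Iterator, Union

# Assumed shape: nested lists with segment strings as leaves (None marks a missing segment).
JaggedText = Union[str, list, None]


def changed_segment_paths(old: JaggedText, new: JaggedText, path: tuple = ()) -> Iterator[tuple]:
    """Hypothetical helper: yield the 1-based index path of every leaf segment that differs."""
    if old == new:
        return  # identical subtrees: nothing below here changed, so skip them entirely
    if isinstance(old, list) or isinstance(new, list):
        # A list compared against a non-list is treated as a list compared against an empty list.
        old_children = old if isinstance(old, list) else []
        new_children = new if isinstance(new, list) else []
        # zip_longest pads the shorter side with None, so added or removed segments count as changes.
        for i, (o, n) in enumerate(zip_longest(old_children, new_children), start=1):
            yield from changed_segment_paths(o, n, path + (i,))
    else:
        yield path  # a leaf segment that was added, removed, or edited
```

For example, comparing `[["a", "b"], ["c"]]` with `[["a", "B"], ["c", "d"]]` yields `(1, 2)` for the edited segment and `(2, 2)` for the added one. Because whole subtrees are skipped after a single equality check, the per-segment work scales with the number of changed segments rather than the size of the text.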
    orig_count_after = kwargs.get("count_after", 1)
    kwargs['count_after'] = False

    _post_modify_changed_segments(user, action, oref, lang, vtitle, old_text, text, version_id, **kwargs)
The "kwargs all the way down" style here is a bit confusing, especially now that the counting logic has been modified. While we're at it, it may be worthwhile to take the counting-related variables out of kwargs so it's clearer what's going on (concretely, making these variables explicit wherever possible).
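One way the suggestion could look, sketched under assumptions: `process_changed_segments`, `post_modify_segment`, and `recount` are hypothetical names, not Sefaria's API. The idea is that `count_after` becomes an explicit keyword-only parameter of the outer function, the per-segment callback never sees it, and counting happens at most once after the loop.

```python
from typing import Callable, Iterable, List


def process_changed_segments(
    changed_refs: Iterable[str],
    post_modify_segment: Callable[[str], None],  # hypothetical per-segment hook
    recount: Callable[[], None],                 # hypothetical one-shot count/index step
    *,
    count_after: bool = True,                    # explicit, instead of riding along in **kwargs
) -> List[str]:
    """Run per-segment post-processing, then recount at most once for the whole modification."""
    processed = []
    for ref in changed_refs:
        post_modify_segment(ref)  # never triggers counting itself
        processed.append(ref)
    if count_after and processed:
        recount()  # a single count after all changed segments are handled
    return processed
```

With this shape there is no need to save `orig_count_after` and overwrite `kwargs['count_after']`; the flag's meaning is visible in the signature. Skipping the recount when nothing changed is a design choice of this sketch, not something the PR states.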
This pull request refactors how text modifications are tracked and processed in `sefaria/tracker.py`. The main change is to only process and log segments of text that have actually changed, rather than re-processing the entire text every time. This should improve efficiency and accuracy in change tracking and downstream processing.

Key changes:
Segment-level change tracking:
- Added a new helper function `_post_modify_changed_segments` that recursively compares the old and new text and only calls `post_modify_text` for segments that have changed. This ensures that only modified segments are processed and logged, rather than the entire text.
- Updated `modify_text` to use `_post_modify_changed_segments` instead of calling `post_modify_text` directly, and adjusted how the `count_after` parameter is handled to avoid redundant counting and indexing.
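Processing only changed segments implies turning each changed position back into a segment-level reference. A minimal sketch of that mapping, assuming refs can be written as a title plus colon-separated 1-based section numbers (as in "Genesis 1:2"); `segment_refs` is a hypothetical helper, not the PR's code or Sefaria's `Ref` API.

```python
from typing import Iterable, List, Sequence, Tuple


def segment_refs(title: str, base_sections: Sequence[int], paths: Iterable[Tuple[int, ...]]) -> List[str]:
    """Hypothetical helper: join a base address with relative 1-based index paths."""
    refs = []
    for path in paths:
        sections = tuple(base_sections) + tuple(path)  # e.g. (1,) + (2,) -> (1, 2)
        if sections:
            refs.append(f"{title} " + ":".join(str(s) for s in sections))
        else:
            refs.append(title)  # the edited ref is itself the segment
    return refs
```

If the edited ref was `Genesis 1` and segments 2 and 5 changed, only `Genesis 1:2` and `Genesis 1:5` would be handed to downstream processing, instead of the whole chapter.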