Summary
When a Human checkpoint computes its diff using build_token_aligned_diffs, the LCS algorithm can match a token in the middle of a line, placing the human attribution boundary 42+ chars into the line rather than at the line start. This mid-line split causes:
- Line N (split line): has both AI chars (prefix) and human chars (suffix) → overrode is set → kept in line_attributions (visible in git note as "human overrode AI")
- Lines N+1..M (subsequent new lines): covered only by human attribution → overrode = None → stripped by attributions_to_line_attributions_for_checkpoint
The git note ends up with a gap at lines N+1..M: those AI-written lines are absent from the note and therefore read as human-written. Line N itself appears as human (with overrode), but the lines after it are silently dropped.
Reproduced from session 3c663e49-9c9f-4225-aea3-8efb93ab4471, commit 89cdae17, file tests/integration/rebase_realworld.rs lines 603–614.
Root Cause
In src/authorship/attribution_tracker.rs, attributions_to_line_attributions_for_checkpoint strips pure-human lines:
// src/authorship/attribution_tracker.rs ~line 2073
merged_line_authors.retain(|line_attr| {
line_attr.author_id != CheckpointKind::Human.to_str() || line_attr.overrode.is_some()
});
overrode is only set when a line has both AI and human char-level attributions overlapping it (find_dominant_author_for_line_candidates). If the Human checkpoint's attribution boundary starts mid-line, the split line gets overrode but subsequent lines do not — they're pure human and get stripped.
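The effect of that filter can be sketched in isolation. The following is a minimal illustration, not the project's code: the LineAttr struct, its fields, and the sample rows are hypothetical stand-ins, and the literal "human" is assumed to be what CheckpointKind::Human.to_str() returns. Only the retain predicate mirrors the snippet quoted above.

```rust
// Hypothetical, simplified stand-in for the real line attribution type.
#[derive(Debug, Clone, PartialEq)]
struct LineAttr {
    line: u32,
    author_id: String,
    // Assumed here to hold the id of the AI author that was overridden.
    overrode: Option<String>,
}

// Same predicate as the retain() call quoted above, with the literal "human"
// standing in for CheckpointKind::Human.to_str() (an assumption).
fn strip_pure_human(mut lines: Vec<LineAttr>) -> Vec<LineAttr> {
    lines.retain(|l| l.author_id != "human" || l.overrode.is_some());
    lines
}

// Illustrative rows shaped like the reported case: an AI line, the split
// line (human with overrode), and a following pure-human line.
fn sample_lines() -> Vec<LineAttr> {
    vec![
        LineAttr { line: 602, author_id: "36ee87f956a9e26f".into(), overrode: None },
        LineAttr { line: 603, author_id: "human".into(), overrode: Some("36ee87f956a9e26f".into()) },
        LineAttr { line: 604, author_id: "human".into(), overrode: None },
    ]
}
```

Passing sample_lines() through strip_pure_human keeps lines 602 and 603 and drops line 604, which is the shape of the gap described in this issue.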
Why the boundary ends up mid-line
The Human checkpoint uses build_token_aligned_diffs (not force_split). The LCS token matching finds a common token between old content and new content. In the original session:
- Old content (line 603 before Subagent A's edit):
// sha4 = C5': all 5 files
- New content (line 603 after Subagent A's edit):
assert_blame_sample_at_commit(&repo, &chain[3], "users.py", ...
Both lines contain the token chain[3]. The LCS match anchors at chain[3], so the human attribution starts at char 23933 — which is 42 chars into line 603, not at the line boundary (char 23891).
Line 603 chars [23891, 23983):
[23891, 23933) = " assert_blame_sample_at_commit(&repo, &" ← AI attributed
[23933, 23983) = "chain[3], \"users.py\", ..." ← Human attributed
^--- split here (LCS token match at `chain[3]`)
Line 603: AI + Human overlap → overrode set → KEPT in note
Lines 604-614: only Human attribution → stripped → ABSENT from note
Exact Data from the Real Bug
From the final char attributions in checkpoint #71 (blob ed766fcc60e13c3c):
AI: {start: 23042, end: 23933, author: '36ee87f956a9e26f'}
Human: {start: 23933, end: 24521, author: 'human', ts: 1775504264429}
Line 603 = chars [23891, 23983)
→ straddles the AI/Human boundary at 23933
→ overrode is set for line 603
Lines 604-614 = chars [23983, 24521)
→ entirely within Human attribution [23933, 24521)
→ overrode = None → stripped by retain()
Final git note: 1-602, 615-8664 as AI → gap at 603-614 = human.
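The arithmetic above can be checked with a small overlap test. This is a rough sketch under stated assumptions: Span, overlaps, LineFate, and line_fate are hypothetical names, and the classification is a simplification of find_dominant_author_for_line_candidates that looks only at whether a line overlaps the AI range, the Human range, or both. The offsets are the ones reported for checkpoint #71.

```rust
// Half-open character range [start, end).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

// Two half-open ranges overlap when each starts before the other ends.
fn overlaps(a: Span, b: Span) -> bool {
    a.start < b.end && b.start < a.end
}

// Hypothetical labels for what happens to a line in the git note.
#[derive(Debug, PartialEq)]
enum LineFate {
    KeptAsAi,         // only AI chars on the line
    KeptWithOverrode, // AI and human chars on the same line
    Stripped,         // only human chars, so overrode = None
    Unattributed,     // no attribution covers the line
}

fn line_fate(line: Span, ai: Span, human: Span) -> LineFate {
    match (overlaps(line, ai), overlaps(line, human)) {
        (true, true) => LineFate::KeptWithOverrode,
        (true, false) => LineFate::KeptAsAi,
        (false, true) => LineFate::Stripped,
        (false, false) => LineFate::Unattributed,
    }
}

// Offsets taken from the attribution data listed above.
const AI: Span = Span { start: 23042, end: 23933 };
const HUMAN: Span = Span { start: 23933, end: 24521 };
const LINE_603: Span = Span { start: 23891, end: 23983 };
const LINES_604_614: Span = Span { start: 23983, end: 24521 };
```

With these values, line_fate(LINE_603, AI, HUMAN) is KeptWithOverrode and line_fate(LINES_604_614, AI, HUMAN) is Stripped, and HUMAN.start - LINE_603.start gives the 42-char offset into line 603.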
Reproduction
This bug is secondary to #994 (daemon race) — you need the Human checkpoint to form first. Once it does, construct a file where a prior AI-attributed line shares a token suffix with the new AI content being added:
# Old line 603 (in last AI checkpoint): " // sha4 = C5': all 5 files changed"
# New line 603 (written by parallel subagent): " assert_blame_sample_at_commit(&repo, &chain[3], ...)"
# Common LCS token: chain[3]
# → Human attribution starts mid-line 603 at chain[3]
# → Lines 604+ are pure human → stripped
Affected Code
src/authorship/attribution_tracker.rs — attributions_to_line_attributions_for_checkpoint retain filter (lines ~2073–2076)
src/authorship/attribution_tracker.rs — find_dominant_author_for_line_candidates — only sets overrode when both AI and human overlap the same line
src/authorship/attribution_tracker.rs — build_token_aligned_diffs — LCS can produce non-line-aligned boundaries
Potential Fix Directions
- Clamp attribution boundaries to line starts in build_token_aligned_diffs when the diff is for a Human checkpoint, ensuring human attribution never starts mid-line.
- In attributions_to_line_attributions_for_checkpoint, when a human line immediately follows a line with overrode, carry forward the overrode context so subsequent lines aren't silently stripped.
- In AI checkpoints, scan for human char attributions that have no overrode in adjacent lines and re-evaluate whether those chars should be reclaimed.
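The first direction (clamping to line starts) could look roughly like the helper below. This is only a sketch: clamp_to_line_start is a hypothetical function, not something that exists in attribution_tracker.rs, and it treats offsets as byte offsets into the file content, which may differ from how the tracker counts chars.

```rust
// Hypothetical helper: move an attribution boundary back to the start of the
// line that contains it. Offsets are treated as byte offsets (an assumption).
fn clamp_to_line_start(content: &str, offset: usize) -> usize {
    // Guard against offsets past the end of the content.
    let offset = offset.min(content.len());
    // Find the last newline strictly before the offset; the line starts
    // right after it, or at 0 when the offset is on the first line.
    content.as_bytes()[..offset]
        .iter()
        .rposition(|&b| b == b'\n')
        .map_or(0, |newline| newline + 1)
}
```

Applied to the start of a Human attribution that begins at a mid-line token such as chain[3], this would return the offset of the first character of that line. Whether widening the human range this way is acceptable for intentional mid-line human edits is a design question this sketch does not settle.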
Relationship to #994
These two bugs compound each other.
Without #994, the case observed here cannot manifest (no spurious Human checkpoint), so fixing #994 alone would fix the observed 12-line misattribution. However, this bug would still affect any intentional human edits that partially overlap AI-attributed content.