bug: token-aligned diff splits attribution mid-line, causing overrode boundary to drop subsequent AI lines from git note #995

## Summary

When a Human checkpoint computes its diff using `build_token_aligned_diffs`, the LCS algorithm can find a matching token *in the middle of a line*, placing the human attribution boundary partway into the line (42 chars in, in the observed case) rather than at the line start. This mid-line split causes:

- **Line N** (split line): has both AI chars (prefix) + human chars (suffix) → `overrode` is set → **kept** in `line_attributions` (visible in git note as "human overrode AI")
- **Lines N+1..M** (subsequent new lines): covered only by human attribution → `overrode = None` → **stripped** by `attributions_to_line_attributions_for_checkpoint`

The git note ends up with a gap at lines N+1..M: those AI-written lines are silently dropped from the note and therefore read as human-authored. Line N itself is recorded as human with `overrode` set.

Reproduced from session `3c663e49-9c9f-4225-aea3-8efb93ab4471`, commit `89cdae17`, file `tests/integration/rebase_realworld.rs` lines 603–614.

## Root Cause

In `src/authorship/attribution_tracker.rs`, `attributions_to_line_attributions_for_checkpoint` strips pure-human lines:

```rust
// src/authorship/attribution_tracker.rs ~line 2073
merged_line_authors.retain(|line_attr| {
    line_attr.author_id != CheckpointKind::Human.to_str() || line_attr.overrode.is_some()
});
```

`overrode` is only set when a line has *both* AI and human char-level attributions overlapping it (`find_dominant_author_for_line_candidates`). If the Human checkpoint's attribution boundary starts mid-line, the *split line* gets `overrode` but subsequent lines do not — they're pure human and get stripped.
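To make the filter's effect concrete, here is a minimal Rust model of the `retain` predicate quoted above. `LineAttribution` is a hypothetical, simplified stand-in for the real type, and the sketch assumes `CheckpointKind::Human.to_str()` returns `"human"` (the author value seen in the attribution dump for this bug). The split line carrying `overrode` survives; the pure-human lines after it do not.

```rust
// Hypothetical, simplified stand-in for the real line-attribution type.
#[derive(Debug, Clone, PartialEq)]
struct LineAttribution {
    line: u32,
    author_id: String,
    overrode: Option<String>,
}

// Assumption: CheckpointKind::Human.to_str() returns "human".
const HUMAN: &str = "human";

/// Applies the same predicate as the quoted retain() call: keep every
/// non-human line, and keep human lines only when `overrode` is set.
fn strip_pure_human(mut lines: Vec<LineAttribution>) -> Vec<LineAttribution> {
    lines.retain(|l| l.author_id != HUMAN || l.overrode.is_some());
    lines
}

fn main() {
    let ai = "36ee87f956a9e26f".to_string();
    let lines = vec![
        // Last line fully covered by the AI range.
        LineAttribution { line: 602, author_id: ai.clone(), overrode: None },
        // Split line: AI prefix plus human suffix, so `overrode` is set.
        LineAttribution { line: 603, author_id: HUMAN.into(), overrode: Some(ai.clone()) },
        // Later lines are covered only by the human range.
        LineAttribution { line: 604, author_id: HUMAN.into(), overrode: None },
        LineAttribution { line: 605, author_id: HUMAN.into(), overrode: None },
    ];
    let kept: Vec<u32> = strip_pure_human(lines).iter().map(|l| l.line).collect();
    println!("{:?}", kept); // [602, 603]
}
```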

### Why the boundary ends up mid-line

The Human checkpoint uses `build_token_aligned_diffs` (not `force_split`). The LCS token matching finds a common token between old content and new content. In the original session:

- Old content (line 603 before Subagent A's edit): `    // sha4 = C5': all 5 files`
- New content (line 603 after Subagent A's edit): `    assert_blame_sample_at_commit(&repo, &chain[3], "users.py", ...`

Both the old and new content contain the token `chain[3]`. The LCS match anchors at `chain[3]`, so the human attribution starts at char 23933, which is **42 chars into line 603**, not at the line boundary (char 23891).

```
Line 603 chars [23891, 23983):
  [23891, 23933) = "    assert_blame_sample_at_commit(&repo, &"  ← AI attributed
  [23933, 23983) = "chain[3], \"users.py\", ..."                ← Human attributed
                   ^--- split here (LCS token match at `chain[3]`)

Line 603: AI + Human overlap → overrode set → KEPT in note
Lines 604-614: only Human attribution → stripped → ABSENT from note
```
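The following is a generic token-level LCS, not the actual `build_token_aligned_diffs` code, to show how a shared token can anchor an alignment mid-line. The tokenizer rules (identifier runs as one token, every other non-whitespace char as its own token) and both input lines are hypothetical. Since the only shared tokens are `chain [ 3 ]` and `;`, the first match lands 22 chars into the new line, the same kind of mid-line anchor as the 42-char offset above.

```rust
/// Split `s` into (byte_offset, token) pairs: runs of [A-Za-z0-9_] form one
/// token, every other non-whitespace char is its own token. (Assumed rules.)
fn tokenize(s: &str) -> Vec<(usize, &str)> {
    let mut out = Vec::new();
    let mut start: Option<usize> = None;
    for (i, ch) in s.char_indices() {
        if ch.is_ascii_alphanumeric() || ch == '_' {
            if start.is_none() {
                start = Some(i);
            }
            continue;
        }
        if let Some(st) = start.take() {
            out.push((st, &s[st..i]));
        }
        if !ch.is_whitespace() {
            out.push((i, &s[i..i + ch.len_utf8()]));
        }
    }
    if let Some(st) = start {
        out.push((st, &s[st..]));
    }
    out
}

/// Textbook O(n*m) LCS over token text; returns matched (old_idx, new_idx) pairs.
fn lcs_pairs(a: &[(usize, &str)], b: &[(usize, &str)]) -> Vec<(usize, usize)> {
    let (n, m) = (a.len(), b.len());
    let mut dp = vec![vec![0usize; m + 1]; n + 1];
    for i in (0..n).rev() {
        for j in (0..m).rev() {
            dp[i][j] = if a[i].1 == b[j].1 {
                dp[i + 1][j + 1] + 1
            } else {
                dp[i + 1][j].max(dp[i][j + 1])
            };
        }
    }
    // Walk the table forward to recover one optimal alignment.
    let (mut i, mut j, mut pairs) = (0, 0, Vec::new());
    while i < n && j < m {
        if a[i].1 == b[j].1 {
            pairs.push((i, j));
            i += 1;
            j += 1;
        } else if dp[i + 1][j] >= dp[i][j + 1] {
            i += 1;
        } else {
            j += 1;
        }
    }
    pairs
}

fn main() {
    // Hypothetical inputs; the real session's content is much larger.
    let old = "    let c = chain[3];";
    let new = "    assert_at(&repo, &chain[3], \"users.py\");";
    let (a, b) = (tokenize(old), tokenize(new));
    let pairs = lcs_pairs(&a, &b);
    let (col, tok) = b[pairs[0].1];
    println!("first shared token {:?} at column {}", tok, col);
    // first shared token "chain" at column 22
}
```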

## Exact Data from the Real Bug

From the final char attributions in checkpoint #71 (blob `ed766fcc60e13c3c`):
```
AI:    {start: 23042, end: 23933, author: '36ee87f956a9e26f'}
Human: {start: 23933, end: 24521, author: 'human', ts: 1775504264429}

Line 603 = chars [23891, 23983)
  → straddles the AI/Human boundary at 23933
  → overrode is set for line 603

Lines 604-614 = chars [23983, 24521)
  → entirely within Human attribution [23933, 24521)
  → overrode = None → stripped by retain()
```
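The per-line outcome can be replayed from these numbers. This sketch assumes, as described in this issue, that `overrode` depends only on whether both an AI range and a human range overlap a line; the real `find_dominant_author_for_line_candidates` may consider more than that. Lines 604-614 are modeled as a single range because only their combined span is known.

```rust
use std::ops::Range;

#[derive(Debug, PartialEq)]
enum LineOutcome {
    Ai,                // only AI chars overlap the line
    HumanOverrodeAi,   // AI and human chars overlap: `overrode` set, kept
    PureHumanStripped, // only human chars overlap: dropped by retain()
}

/// Half-open range overlap test.
fn overlaps(a: &Range<usize>, b: &Range<usize>) -> bool {
    a.start < b.end && b.start < a.end
}

fn classify(line: &Range<usize>, ai: &Range<usize>, human: &Range<usize>) -> Option<LineOutcome> {
    match (overlaps(line, ai), overlaps(line, human)) {
        (true, true) => Some(LineOutcome::HumanOverrodeAi),
        (true, false) => Some(LineOutcome::Ai),
        (false, true) => Some(LineOutcome::PureHumanStripped),
        (false, false) => None,
    }
}

fn main() {
    // Char ranges from checkpoint #71 (blob ed766fcc60e13c3c), as quoted above.
    let ai = 23042..23933;
    let human = 23933..24521;
    let line_603 = 23891..23983;
    let lines_604_614 = 23983..24521;

    println!("line 603 split column: {}", ai.end - line_603.start); // 42
    println!("line 603: {:?}", classify(&line_603, &ai, &human)); // Some(HumanOverrodeAi)
    println!("lines 604-614: {:?}", classify(&lines_604_614, &ai, &human)); // Some(PureHumanStripped)
}
```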

Final git note: lines `1-602, 615-8664` are attributed to AI, leaving a gap at 603-614 that reads as human.

## Reproduction

This bug is secondary to git-ai-project/git-ai#994 (daemon race): the spurious Human checkpoint has to form first. Once it does, construct a file where a new AI-written line contains, partway in, a token that the LCS can match against previously AI-attributed content:

```
# Old line 603 (in last AI checkpoint): "    // sha4 = C5': all 5 files changed"
# New line 603 (written by parallel subagent): "    assert_blame_sample_at_commit(&repo, &chain[3], ...)"
# Common LCS token: chain[3]
# → Human attribution starts mid-line 603 at chain[3]
# → Lines 604+ are pure human → stripped
```

## Affected Code

- `src/authorship/attribution_tracker.rs` — `attributions_to_line_attributions_for_checkpoint` retain filter (lines ~2073–2076)
- `src/authorship/attribution_tracker.rs` — `find_dominant_author_for_line_candidates` — only sets `overrode` when both AI and human overlap the same line
- `src/authorship/attribution_tracker.rs` — `build_token_aligned_diffs` — LCS can produce non-line-aligned boundaries

## Potential Fix Directions

1. **Clamp attribution boundaries to line starts** in `build_token_aligned_diffs` when the diff is for a Human checkpoint — ensuring human attribution never starts mid-line.
2. **In `attributions_to_line_attributions_for_checkpoint`**, when a human line immediately follows a line with `overrode`, carry forward the `overrode` context so subsequent lines aren't silently stripped.
3. **In AI checkpoints**, scan for human char attributions that have no `overrode` in adjacent lines and re-evaluate whether those chars should be reclaimed.
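A minimal sketch of direction 2, using a hypothetical `LineAttribution` stand-in (not the real type), assuming `"human"` is the human author id and that the line list is sorted by line number: each pure-human line that directly follows a line with `overrode` inherits that value before the `retain` filter runs.

```rust
// Hypothetical, simplified stand-in for the real line-attribution type.
#[derive(Debug, Clone)]
struct LineAttribution {
    line: u32,
    author_id: String,
    overrode: Option<String>,
}

// Assumption: CheckpointKind::Human.to_str() returns "human".
const HUMAN: &str = "human";

/// Propagate `overrode` from a line to the pure-human line right after it.
/// Because the loop runs in order, the value chains through consecutive lines.
fn carry_forward_overrode(lines: &mut [LineAttribution]) {
    for i in 1..lines.len() {
        let (before, after) = lines.split_at_mut(i);
        let prev = &before[i - 1];
        let cur = &mut after[0];
        if cur.author_id == HUMAN && cur.overrode.is_none() && cur.line == prev.line + 1 {
            cur.overrode = prev.overrode.clone();
        }
    }
}

fn main() {
    let ai = "36ee87f956a9e26f".to_string();
    let mut lines = vec![
        LineAttribution { line: 602, author_id: ai.clone(), overrode: None },
        LineAttribution { line: 603, author_id: HUMAN.into(), overrode: Some(ai.clone()) },
        LineAttribution { line: 604, author_id: HUMAN.into(), overrode: None },
        LineAttribution { line: 605, author_id: HUMAN.into(), overrode: None },
    ];
    carry_forward_overrode(&mut lines);
    // Same predicate as the existing retain() filter.
    lines.retain(|l| l.author_id != HUMAN || l.overrode.is_some());
    let kept: Vec<u32> = lines.iter().map(|l| l.line).collect();
    println!("{:?}", kept); // [602, 603, 604, 605]
}
```

Because the propagation chains through every contiguous human line, this version would also keep genuinely human-written lines that directly follow an overridden line.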

## Relationship to #994

These two bugs compound each other:
- **#994** (daemon race) causes the spurious Human checkpoint to form in the first place
- **This bug** causes the specific misattribution pattern: only some lines of the AI content are silently dropped (those after the LCS split point), while the split line itself gets `overrode` and appears in the note

Without #994, this bug cannot produce the observed misattribution, because no spurious Human checkpoint forms. Fixing #994 alone would fix the observed 12-line misattribution, but this bug would still affect any intentional human edits that partially overlap AI-attributed content.
