[feature not live] docs(flaky-tests): document AI Investigation tab by samgutentag · Pull Request #539 · trunk-io/docs

samgutentag · 2026-04-11T11:00:25Z

Summary

- Adds a new page flaky-tests/ai-investigation.md documenting the Analysis tab on the Flaky Tests test detail page (trunk-io/trunk2#3504)
- Covers viewing investigation results, triggering new investigations, applying fix recommendations, and viewing investigation history
- Documents all seven fact types surfaced in Key Findings
- Adds the page to summary.md

Source

trunk2 PR: https://github.com/trunk-io/trunk2/pull/3504

Test plan

Preview in GitBook

Adds a new page covering the Analysis tab introduced in trunk-io/trunk2#3504. Documents how to view AI investigation results, trigger new investigations, apply fix recommendations, and browse investigation history.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

trunk-staging-io · 2026-04-11T11:00:29Z

Merging to main in this repository is managed by Trunk.

To merge this pull request, check the box to the left or comment /trunk merge below.

After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here.

claude · 2026-04-11T11:00:53Z

Claude finished @samgutentag's task in 1m 39s

PR Review

Gather context and read files
Review flaky-tests/ai-investigation.md
Review summary.md changes
Post inline review comments

Overall, the page is well-structured, follows the repo's documentation conventions, and all relative links resolve to existing files. Two issues found:

Blocking

Line 31 — Confidence score ranges overlap: "yellow for 50%+" includes the 80%+ range already labeled green. The three bands need to be mutually exclusive (e.g., green 80%+, yellow 50–79%, orange below 50%). Inline suggestion posted.

Non-blocking

Line 55 — The "CI Logs" fact type description reads as if Trunk is augmenting test failure outputs rather than analyzing CI logs. Inline suggestion posted for a clearer phrasing, but not blocking.

claude · 2026-05-05T18:33:40Z

Claude finished @samgutentag's task in 1m 38s

PR Review

Overall, the page is well-structured and follows the repo's documentation conventions. Three issues found: two blocking, one non-blocking.

Blocking

Line 31 — Confidence score ranges overlap: yellow for 50%+ includes the 80%+ threshold already labeled green. Inline suggestion posted.
Line 91 — Broken link: use-mcp-server/mcp-tool-reference/get-root-cause-analysis.md does not exist. The mcp-tool-reference/ directory only contains fix-flaky-test.md, investigate-ci-failure.md, search-test.md, and set-up-test-uploads.md. Either point to an existing page or create the missing file before merging. Inline suggestion posted.

Non-blocking

Line 55 — The CI Logs fact type description "Supplements test failure outputs with CI workflow logs" reads as though Trunk is augmenting outputs rather than analyzing logs. Inline suggestion posted.

summary.md — Entry placement and link look correct.

samgutentag · 2026-05-06T03:47:43Z

Triage status (2026-05-05): up to date with main. Needs review before merge.

samgutentag · 2026-05-06T22:11:01Z

Verification status (2026-05-08): pending

Eng merged but flag off in prod. Hold off.

Eng PR: trunk-io/trunk2#3504 (merged 2026-04-10, merge commit on main, intact)
Flag: enableFlakyTestInsightsTab
Signals:
- Flag still referenced in trunk2 main (ts/apps/frontend/src/constants/flags.ts, ts/apps/e2e/flags.json, ts/apps/frontend/src/components/settings/flaky-tests/flaky-tests-settings-client.tsx)
- LaunchDarkly bot updated the flag in #production-notifications multiple times through 2026-04-28 (active targeting management, not a 100% rollout)
- Tyler Jang in #team-flaky-tests on 2026-04-27 listed this flag under "AI Flaky Tests Investigation" rollout context: "Still awaiting feedback from Brex and Chainlink. Laying the ground work to run Descript, Gusto/web, Caseware in stealth." Feature is in stealth with a limited customer set.
- e2e flag default is true (test only, not a prod signal)
- 0 follow-up rollout or "delete legacy" PRs found in trunk2 search
- 0 results for any "100%" / "rolled out" / "rollout" Slack signal on this flag

Suggested next action: re-run /verify-docs-pr 539 after the flag flips to 100% in production. @tyler-jang or #team-flaky-tests can confirm the rollout schedule.

Resolves an additive conflict in summary.md by keeping both Test Collections (added in main since this branch opened) and the new AI Investigation entry from this PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Confidence band ranges made mutually exclusive (50-79% for yellow). Source: fact-type-config.ts:58-66 (>= 80 returns green, then >= 50 returns yellow, else orange).
- "CI Logs" fact type description rewritten so it reads as analysis input rather than as supplementing test output.
- Fixed broken link: get-root-cause-analysis.md was renamed to fix-flaky-test.md.
- Apply Fix button is rendered unconditionally when an investigation exists (analysis-tab.tsx:275-282); removed the "if produced actionable findings" qualifier and reordered to match UI button order.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
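The threshold order cited from fact-type-config.ts can be sketched as a small TypeScript function. This is a minimal sketch, not the actual source: the `confidenceBand` function and `ConfidenceBand` type are hypothetical names, and it assumes the score is a percentage on a 0 to 100 scale. Only the ">= 80, then >= 50, else" ordering comes from the commit note above.

```typescript
// Hypothetical names; only the >= 80 / >= 50 / else ordering is taken from the PR.
type ConfidenceBand = "green" | "yellow" | "orange";

// Assumes scorePercent is on a 0-100 scale.
function confidenceBand(scorePercent: number): ConfidenceBand {
  // Checked first, so 80 and above never reaches the yellow branch.
  if (scorePercent >= 80) return "green";
  // Only reached for scores below 80, which keeps the bands disjoint.
  if (scorePercent >= 50) return "yellow";
  return "orange";
}

console.log(confidenceBand(80), confidenceBand(79), confidenceBand(49));
// logs: green yellow orange
```

Because the checks run in order, the bands cannot overlap. One detail worth confirming against the real code: if scores can be fractional, a value such as 79.5 lands in yellow, so "50% to 79%" in the docs effectively means "50% up to, but not including, 80%".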

Two gaps in the original draft surfaced when reading analysis-tab.tsx end to end:
- Empty states: the tab has two distinct empty states. The first shows when aiFlakyInvestigationEnabled is DISABLED at the repo level (no Analyze button, "Go to Settings" link instead). The second shows when the setting is enabled but no investigation has run yet. Source: analysis-tab.tsx:87-160.
- Summary section: the tab renders the AI's prose summary of the investigation between the Latest Analysis header and Key Findings. Source: analysis-tab.tsx:295-299.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
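The branching described in these notes can be sketched as a pure function that picks which view the tab renders. This is illustrative only: the `AnalysisTabInput` shape, the `"ENABLED"` value, and the `analysisTabView` name are hypothetical. From this PR it takes `aiFlakyInvestigationEnabled` and its `DISABLED` value, the two empty states, the **Apply Fix** button being shown whenever an investigation exists, and the GitHub app requirement for **Analyze**. It also assumes the disabled setting takes precedence even if older investigations exist, which the PR does not state.

```typescript
// Hypothetical shapes; the real props in analysis-tab.tsx are not shown in this PR.
type AiInvestigationSetting = "ENABLED" | "DISABLED"; // "ENABLED" is an assumed value name

interface AnalysisTabInput {
  aiFlakyInvestigationEnabled: AiInvestigationSetting;
  hasInvestigation: boolean; // at least one investigation has run for this test
  githubAppInstalled: boolean; // Trunk GitHub app installed for the repo
}

type AnalysisTabView =
  // Empty state 1: setting disabled, so no Analyze button; Go to Settings instead.
  | { kind: "disabledEmptyState"; showGoToSettings: true }
  // Empty state 2: setting enabled, but no investigation has run yet.
  | { kind: "noInvestigationYet"; analyzeEnabled: boolean }
  // Results view: Apply Fix is shown whenever an investigation exists.
  | { kind: "latestAnalysis"; analyzeEnabled: boolean; showApplyFix: true };

function analysisTabView(input: AnalysisTabInput): AnalysisTabView {
  // Assumption: the repo-level setting is checked before anything else.
  if (input.aiFlakyInvestigationEnabled === "DISABLED") {
    return { kind: "disabledEmptyState", showGoToSettings: true };
  }
  // Assumption: a missing GitHub app disables Analyze in both remaining states.
  const analyzeEnabled = input.githubAppInstalled;
  if (!input.hasInvestigation) {
    return { kind: "noInvestigationYet", analyzeEnabled };
  }
  return { kind: "latestAnalysis", analyzeEnabled, showApplyFix: true };
}
```

Modeling the views as a discriminated union makes each empty state a distinct case, which mirrors the two separate empty states the commit note calls out.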

claude · 2026-05-07T07:57:56Z

Claude finished @samgutentag's task in 2m 30s

PR Review

Read repo conventions
Review flaky-tests/ai-investigation.md
Verify links resolve
Review summary.md changes
Post inline comments and final summary

Page is well-structured, follows the page format used elsewhere in flaky-tests/, and reads cleanly. All three previously raised blocking issues have been resolved on this branch:

Confidence score bands are now mutually exclusive (80% and above, 50% to 79%, below 50%).
The CI Logs fact type now reads as analyzing logs alongside test failure output rather than augmenting the output.
The broken get-root-cause-analysis.md link has been replaced with fix-flaky-test.md.

I verified the four relative links in the file all resolve:

use-mcp-server/README.md ✓
use-mcp-server/mcp-tool-reference/fix-flaky-test.md ✓
webhooks/README.md ✓
managing-detected-flaky-tests.md ✓
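A relative-link check like the one above can be approximated with a short script. This is a minimal sketch, not the reviewer's actual tooling: `resolveRelative` and `brokenRelativeLinks` are hypothetical names, only inline `[text](target)` links are handled (not reference-style links or GitBook `{% ... %}` blocks), and the file-existence check is passed in so it can be Node's `fs.existsSync` or a fake set in tests.

```typescript
// Resolve a relative link target against the directory of the Markdown file.
function resolveRelative(fileDir: string, target: string): string {
  const parts = fileDir.split("/").filter((p) => p !== "" && p !== ".");
  for (const seg of target.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") parts.pop();
    else parts.push(seg);
  }
  return parts.join("/");
}

// Return the resolved paths of inline links whose targets do not exist.
function brokenRelativeLinks(
  markdown: string,
  fileDir: string,
  exists: (path: string) => boolean,
): string[] {
  const linkRe = /\[[^\]]*\]\(([^)\s]+)\)/g;
  const broken: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = linkRe.exec(markdown)) !== null) {
    const target = match[1];
    // Skip absolute URLs (https:, mailto:, ...) and in-page anchors.
    if (/^[a-z][a-z0-9+.-]*:/i.test(target) || target.startsWith("#")) continue;
    // Drop any #fragment before checking the file path.
    const resolved = resolveRelative(fileDir, target.split("#")[0]);
    if (!exists(resolved)) broken.push(resolved);
  }
  return broken;
}
```

Run from the repo root under Node, a call such as `brokenRelativeLinks(text, "flaky-tests", fs.existsSync)` would list any targets that fail to resolve; the directory name here is illustrative.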

summary.md entry placement is appropriate (right after Test Collections, before Dashboard).

Non-blocking grammar/style nits (inline suggestions posted):

Line 12 — "the Analyze button" is unbolded while every other reference to UI buttons on this page uses **Bold**. Suggested making it consistent.
Line 25 — Add a comma before "along with a Go to Settings button" so the phrase clearly modifies "the tab shows" rather than "not enabled".
Line 36 — Confidence band parallelism: third clause drops "for" (green for X, yellow for Y, orange below Z → ... orange for below Z).

No blocking issues. Note that verification status (per Sam's comment) is still pending on the prod GA rollout. The docs themselves are ready; the user-visibility gate is the separate flag rollout, not this review.
• Branch: sam-gutentag/flaky-tests-ai-investigation-tab

claude · 2026-05-07T08:00:43Z

+Trunk Flaky Tests can automatically investigate a flaky test and surface root cause findings directly in the Trunk web app. The **Analysis** tab on a test detail page shows the latest investigation results, lets you trigger a new investigation, apply a fix, or browse past investigations.
+
+{% hint style="info" %}
+The Analysis tab requires a GitHub app installation for the repository. If your repo does not have the Trunk GitHub app installed, the Analyze button will be disabled.


Style nit: elsewhere on this page, UI labels are bolded (e.g., the **Analyze** button on line 38), although "Analyze again" on line 26 is also unbolded. For consistency with the bolded usages, consider:

Suggested change


The Analysis tab requires a GitHub app installation for the repository. If your repo does not have the Trunk GitHub app installed, the **Analyze** button will be disabled.

claude · 2026-05-07T08:00:44Z

+
+The tab shows an empty state in two cases:
+
+- **AI investigation disabled for the repo**: If `aiFlakyInvestigationEnabled` is set to `DISABLED` in the repo's Flaky Tests settings, the tab shows a message that AI investigation is not enabled along with a **Go to Settings** button. No Analyze button is shown until the setting is enabled.


Minor readability nit: the "along with a **Go to Settings** button" phrase modifies what the tab shows, but as written it reads as if it modifies "not enabled". A comma disambiguates:

Suggested change


- **AI investigation disabled for the repo**: If `aiFlakyInvestigationEnabled` is set to `DISABLED` in the repo's Flaky Tests settings, the tab shows a message that AI investigation is not enabled, along with a **Go to Settings** button. No Analyze button is shown until the setting is enabled.

claude · 2026-05-07T08:00:45Z

+
+At the top, you will see:
+
+- The **overall confidence score** (color-coded green for 80% and above, yellow for 50% to 79%, orange below 50%)


Parallelism nit: the third clause drops "for" — "green for 80%…, yellow for 50%…, orange below 50%". Adding "for" keeps the three bands parallel:

Suggested change


- The **overall confidence score** (color-coded green for 80% and above, yellow for 50% to 79%, orange for below 50%)

github-actions

Auto-approved: Claude code review passed.

claude Bot reviewed Apr 11, 2026
