35 commits
a00a5b2
Add support for an MCP to analyse and prioritise PRs
Steboss Oct 29, 2025
45f9ab2
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 29, 2025
7b55fb0
Update mcp_server/requirements.txt
Steboss Oct 30, 2025
49c5826
start addressing @IvanYashchuk comments and feedback
Steboss Nov 7, 2025
5ecfbc9
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 7, 2025
f136027
update to have cursor to auto recognize prompt
Steboss Nov 7, 2025
f7a4f3b
Merge branch 'main' of github.com:Steboss/lightning-thunder
Steboss Nov 7, 2025
097416a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 7, 2025
146cd8b
update the code with the definition of done
Steboss Nov 7, 2025
f4a007a
update the code to be aligned with q-priority
Steboss Nov 7, 2025
8f46409
Merge branch 'main' of github.com:Steboss/lightning-thunder
Steboss Nov 7, 2025
86e7e67
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 7, 2025
f50e39a
fix little error
Steboss Nov 7, 2025
a4ed96a
Merge branch 'main' of github.com:Steboss/lightning-thunder
Steboss Nov 7, 2025
6c42d61
fix issues in the code. at the moment the code is ugly, but we will i…
Steboss Nov 7, 2025
eb0b639
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 7, 2025
957e695
fix an import error
Steboss Nov 7, 2025
5b951f1
Merge branch 'main' of github.com:Steboss/lightning-thunder
Steboss Nov 7, 2025
2f3e5b8
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 7, 2025
333c3ca
we are mostly aligned to q-plannings
Steboss Nov 7, 2025
baf2ac4
we are mostly aligned to q-plannings
Steboss Nov 7, 2025
a2c6f98
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 7, 2025
497675b
fix plots and functions
Steboss Nov 10, 2025
07dc55d
Merge branch 'main' of github.com:Steboss/lightning-thunder
Steboss Nov 10, 2025
a6f6a0f
fix format
Steboss Nov 10, 2025
3a5fe35
try to make the scoring easier
Steboss Nov 10, 2025
10b60f0
remove this md
Steboss Nov 10, 2025
c6f9120
update documentation
Steboss Nov 10, 2025
895ea2b
remove the priority matrix document
Steboss Nov 10, 2025
fbe5efa
fix prompt for analysing all the PRs
Steboss Nov 10, 2025
e65a055
fix error in plot
Steboss Nov 10, 2025
0480623
improve UX on dashboard
Steboss Nov 13, 2025
1a6c3c3
skip verification
Steboss Nov 13, 2025
c451453
remove dashboards
Steboss Nov 13, 2025
eb5f584
modify js code
Steboss Nov 13, 2025
3 changes: 3 additions & 0 deletions .gitignore
@@ -153,3 +153,6 @@ notebooks/all.txt
notebooks/ci.txt

quickstart_report.json

# MCP server generated dashboards
mcp_server/dashboards/
216 changes: 216 additions & 0 deletions mcp_server/ARCHITECTURE.md

Large diffs are not rendered by default.

374 changes: 374 additions & 0 deletions mcp_server/HEURISTIC_LLM_INTEGRATION.md
@@ -0,0 +1,374 @@
# Heuristic + LLM Integration

This README explains how PR scoring works.

## Overview

The PR review system integrates **heuristic analysis** with **LLM reasoning** to provide smart, context-aware guidance tailored to each PR's complexity and impact. The system includes comprehensive checks for:

- code quality,
- strategic alignment,
- team review status,
- and readiness for merge.

## How It Works

```
┌──────────────────────────────────────────────────────────────────┐
│                         PR Analysis Flow                         │
│                                                                  │
│  1. GitHub Data        → PR details, diff, reviews, CI           │
│                                                                  │
│  2. Heuristic Analysis → Complexity (0-10)                       │
│                          Impact (0-10)                           │
│                          Risk Score (0-10)                       │
│                          Definition of Ready                     │
│                          Internal Review Status                  │
│                          Strategic Goal Alignment                │
│                                                                  │
│  3. Pass to LLM        → LLM sees all heuristic scores           │
│                          + Original PR data                      │
│                          + Google Drive context (optional)       │
│                                                                  │
│  4. LLM Analysis       → For SIMPLE PRs:                         │
│                          - Quick review checklist                │
│                          - Fast assessment                       │
│                                                                  │
│                          For COMPLEX PRs:                        │
│                          - Detailed review strategy              │
│                          - Debug checklist                       │
│                          - What could go wrong?                  │
│                          - Testing recommendations               │
│                                                                  │
│  5. Combined Result    → Heuristic scores + LLM insights         │
└──────────────────────────────────────────────────────────────────┘
```

## Heuristic Scoring System

### Priority Score (0-100)

The priority score is calculated from **4 weighted components**:

```
Priority = Base Score (50%) + Strategic (30%) + Review (15%) + Staleness (5%)
Maximum: 50 + 30 + 15 + 5 = 100
```
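
As a minimal sketch of the formula above, each component can be clamped to its documented range and then summed. The names `PriorityComponents`, `clamp`, and `priority_score` are hypothetical and chosen for illustration; they are not taken from the server code.

```python
from dataclasses import dataclass


@dataclass
class PriorityComponents:
    """Hypothetical container for the four priority components."""

    base: int  # 0-50, from the Complexity x Impact matrix
    strategic: int  # 0-30, from strategic goal alignment
    review: int  # 0-15, from review status
    staleness: int  # 0-5, from PR age


def clamp(value: int, low: int, high: int) -> int:
    """Restrict value to the inclusive range [low, high]."""
    return max(low, min(high, value))


def priority_score(c: PriorityComponents) -> int:
    """Sum the clamped components; the maximum is 50 + 30 + 15 + 5 = 100."""
    return (
        clamp(c.base, 0, 50)
        + clamp(c.strategic, 0, 30)
        + clamp(c.review, 0, 15)
        + clamp(c.staleness, 0, 5)
    )
```

For example, `priority_score(PriorityComponents(base=35, strategic=30, review=15, staleness=2))` sums to 82.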

#### 1. Base Score (0-50 points)

Based on the **Complexity × Impact Matrix**:

| Complexity   | Impact    | Base Score | Category                                             |
| ------------ | --------- | ---------- | ---------------------------------------------------- |
| Simple (≤4)  | High (≥7) | **50**     | 🔥 **CRITICAL** - High impact, easy to review        |
| Simple (≤4)  | Low (\<7) | **40**     | ⚡ **QUICK WIN** - Easy to review and merge          |
| Complex (>4) | High (≥7) | **35**     | 🎯 **IMPORTANT** - High impact, needs careful review |
| Complex (>4) | Low (\<7) | **20**     | 📝 **LOW PRIORITY** - Complex with low impact        |

**Rationale:**

- **Simple + High Impact** gets highest priority (50) → Critical fixes that are easy to review
- **Simple + Low Impact** gets second priority (40) → Quick wins to clear the backlog
- **Complex + High Impact** gets medium-high (35) → Important but requires time
- **Complex + Low Impact** gets lowest (20) → Can be deferred
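
The matrix reduces to two threshold checks. The sketch below is one possible reading of the table, with `base_score` as a hypothetical helper name:

```python
def base_score(complexity: int, impact: int) -> tuple[int, str]:
    """Map complexity and impact (both 0-10) to a base score and category."""
    simple = complexity <= 4  # "Simple" row of the matrix
    high_impact = impact >= 7  # "High" column of the matrix
    if simple and high_impact:
        return 50, "CRITICAL"
    if simple:
        return 40, "QUICK WIN"
    if high_impact:
        return 35, "IMPORTANT"
    return 20, "LOW PRIORITY"
```

A PR with complexity 8 and impact 9 lands in the `IMPORTANT` cell with a base score of 35.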

#### 2. Strategic Score (0-30 points)

Aligns PRs with **Q4 strategic goals**:

- **P0 (Must-Have):** +30 points 🔥
- **P1 (Should-Have):** +20 points 🎯
- **P2 (Nice-to-Have):** +10 points 📌
- **Not Aligned:** 0 points

**How it works:**

- PRs must link to a GitHub issue (e.g., `Closes #123`)
- Issues are associated with strategic goals via `link_issue_to_goal()`
- The highest priority goal is used for scoring

**Example:**

```python
# PR links to Issue #456
# Issue #456 is linked to "Q4-inference-opt" (P0 goal)
# β†’ PR gets +30 strategic points
```
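
A rough sketch of that lookup follows. It assumes the PR body is scanned for closing keywords and that a mapping from issue number to goal priorities is already available; `ISSUE_REF`, `GOAL_POINTS`, `strategic_score`, and the `issue_goals` mapping are all hypothetical names, not the server's real data model.

```python
import re

# Points per goal priority, as listed above.
GOAL_POINTS = {"P0": 30, "P1": 20, "P2": 10}

# Matches references such as "Closes #123" (case-insensitive).
ISSUE_REF = re.compile(r"\b(?:closes|fixes|resolves)\s+#(\d+)", re.IGNORECASE)


def strategic_score(pr_body: str, issue_goals: dict[int, list[str]]) -> int:
    """Return the points of the highest-priority goal reachable from the PR.

    issue_goals is an assumed mapping: issue number -> priorities of linked goals.
    """
    issues = [int(n) for n in ISSUE_REF.findall(pr_body)]
    priorities = [p for issue in issues for p in issue_goals.get(issue, [])]
    return max((GOAL_POINTS.get(p, 0) for p in priorities), default=0)
```

With `issue_goals = {456: ["P0"]}`, a PR whose body contains `Closes #456` scores 30, and a PR without a linked issue scores 0.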

#### 3. Review Score (0-15 points)

Tracks internal and external review progress:

**Bonuses:**

- **+15 pts:** Ready for external review (2+ Thunder team approvals) ✅
- **+10 pts:** Approved and mergeable (alternative path)

**Penalties:**

- **-5 pts:** Changes requested
- **-10 pts:** Has merge conflicts

**Capped at:** 0-15 points
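
The bonuses and penalties can be combined as in the sketch below. It assumes the two bonuses are mutually exclusive (the +10 is described as an alternative path) and that penalties are applied before clamping; `review_score` and its parameters are hypothetical names.

```python
def review_score(
    ready_for_external: bool,
    approved_and_mergeable: bool,
    changes_requested: bool,
    has_conflicts: bool,
) -> int:
    """Combine review bonuses and penalties, then clamp to 0-15."""
    score = 0
    if ready_for_external:
        score += 15  # 2+ Thunder team approvals
    elif approved_and_mergeable:
        score += 10  # alternative path (assumed exclusive with the +15 bonus)
    if changes_requested:
        score -= 5
    if has_conflicts:
        score -= 10
    return max(0, min(15, score))
```

Because of the clamp, a PR with outstanding change requests and no approvals scores 0 rather than -5.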

#### 4. Staleness Score (0-5 points)

Encourages merging old PRs, especially simple ones:

**Simple PRs (complexity ≤ 4):**

- **>90 days old:** +5 pts
- **>60 days old:** +3 pts
- **>30 days old:** +2 pts

**Complex PRs (complexity > 4):**

- **>90 days old:** +2 pts
- **>60 days old:** +1 pt
- **No activity in 30+ days:** -2 pts (likely stale)

**Capped at:** 0-5 points
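
The tiers above could be expressed as follows. This is a sketch under two assumptions that the text does not spell out: the inactivity penalty applies only to complex PRs, and it is subtracted from the age bonus before clamping. `staleness_score` and its parameters are hypothetical names.

```python
def staleness_score(complexity: int, age_days: int, days_since_activity: int) -> int:
    """Score PR age using the simple and complex tiers, clamped to 0-5."""
    if complexity <= 4:  # simple PR
        if age_days > 90:
            score = 5
        elif age_days > 60:
            score = 3
        elif age_days > 30:
            score = 2
        else:
            score = 0
    else:  # complex PR
        if age_days > 90:
            score = 2
        elif age_days > 60:
            score = 1
        else:
            score = 0
        if days_since_activity >= 30:
            score -= 2  # likely stale (assumed to stack with the age bonus)
    return max(0, min(5, score))
```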

### Complexity Score (0-10)

Measures how hard the PR is to review:

**Scoring factors:**

- **File count:**
  - \>20 files: +3
  - \>10 files: +2
  - \>5 files: +1
- **Lines changed:**
  - \>1000 lines: +3
  - \>500 lines: +2
  - \>100 lines: +1
- **Simple indicators:** "fix typo", "update doc", "formatting" → -2
- **Complex indicators:** "refactor", "architecture", "redesign" → +2
- **Core files:** >3 core/base/engine files → +2

**Categories:**

- **0-3:** 🟢 Simple (formatting, docs, small fixes)
- **4-6:** 🟡 Moderate (features, refactoring)
- **7-10:** 🔴 Complex (architecture changes, large refactors)
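
The factors and categories above can be sketched as below. The function names are hypothetical, and two details are assumptions: the indicator phrases are matched against the PR title only, and the result is clamped to 0-10.

```python
SIMPLE_HINTS = ("fix typo", "update doc", "formatting")
COMPLEX_HINTS = ("refactor", "architecture", "redesign")


def complexity_score(
    files_changed: int, lines_changed: int, title: str, core_files_touched: int
) -> int:
    """Apply the file, line, keyword, and core-file factors."""
    score = 0
    if files_changed > 20:
        score += 3
    elif files_changed > 10:
        score += 2
    elif files_changed > 5:
        score += 1
    if lines_changed > 1000:
        score += 3
    elif lines_changed > 500:
        score += 2
    elif lines_changed > 100:
        score += 1
    text = title.lower()
    if any(hint in text for hint in SIMPLE_HINTS):
        score -= 2
    if any(hint in text for hint in COMPLEX_HINTS):
        score += 2
    if core_files_touched > 3:
        score += 2
    return max(0, min(10, score))  # clamp is an assumption


def complexity_category(score: int) -> str:
    """Map a 0-10 score to the three categories listed above."""
    if score <= 3:
        return "Simple"
    if score <= 6:
        return "Moderate"
    return "Complex"
```

A PR touching 25 files, 2000 lines, and more than 3 core files scores 3 + 3 + 2 = 8 and is categorized as Complex.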

### Impact Score (0-10)

Measures the PR's importance to the project:

**Starting point:** 5 (medium impact)

**Adjustments:**

- **Security:** High (≥7) → +3, Medium (≥4) → +1
- **Urgency:** High (≥7) → +2, Medium (≥4) → +1
- **Breaking changes:** High (≥7) → +2
- **High-impact labels:** "critical", "blocker", "bug", "security", "performance" → +2
- **Low-impact labels:** "documentation", "style", "chore" → -2
- **Approved & mergeable:** +1

**Capped at:** 0-10
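
A sketch of these adjustments is shown below. It assumes the security, urgency, and breaking-change inputs are the 0-10 risk sub-scores, and that each label group contributes at most once regardless of how many labels match; `impact_score` is a hypothetical name.

```python
HIGH_IMPACT_LABELS = {"critical", "blocker", "bug", "security", "performance"}
LOW_IMPACT_LABELS = {"documentation", "style", "chore"}


def impact_score(
    security_risk: int,
    urgency_risk: int,
    breaking_risk: int,
    labels: list[str],
    approved_and_mergeable: bool = False,
) -> int:
    """Start at 5 and apply the adjustments, clamped to 0-10."""
    score = 5
    if security_risk >= 7:
        score += 3
    elif security_risk >= 4:
        score += 1
    if urgency_risk >= 7:
        score += 2
    elif urgency_risk >= 4:
        score += 1
    if breaking_risk >= 7:
        score += 2
    names = {label.lower() for label in labels}
    if names & HIGH_IMPACT_LABELS:
        score += 2  # applied once, however many labels match (assumption)
    if names & LOW_IMPACT_LABELS:
        score -= 2
    if approved_and_mergeable:
        score += 1
    return max(0, min(10, score))
```

For instance, high urgency plus a `performance` label gives 5 + 2 + 2 = 9.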

### Risk Score (0-10)

Multi-dimensional risk assessment:

#### Breaking Changes Risk (0-10)

- Keywords: "breaking", "deprecat", "remov", "incompatib" → +5
- API/interface files modified → +2
- \>20 files changed → +2
- "breaking" label → +5

#### Security Risk (0-10)

- Keywords: "security", "vulnerability", "auth", "credential" → +7
- Security-related files → +3
- "security" label → +8

#### Urgency Risk (0-10)

- Keywords: "block", "critical", "urgent", "hotfix", "bug" → +4
- \>90 days old → +3
- Critical labels → +5
- \>15 comments → +2 (high engagement)

**Overall risk:** Average of the three components
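
To illustrate how one component and the overall average might be computed, here is a sketch for the breaking-changes risk. It assumes the keyword bonus is applied once if any keyword matches and that each component is clamped to 10 before averaging; `breaking_risk` and `overall_risk` are hypothetical names.

```python
BREAKING_KEYWORDS = ("breaking", "deprecat", "remov", "incompatib")


def breaking_risk(
    text: str, api_files_modified: bool, files_changed: int, labels: list[str]
) -> int:
    """Score breaking-change risk on a 0-10 scale."""
    score = 0
    lowered = text.lower()
    if any(keyword in lowered for keyword in BREAKING_KEYWORDS):
        score += 5  # applied once if any keyword matches (assumption)
    if api_files_modified:
        score += 2
    if files_changed > 20:
        score += 2
    if "breaking" in {label.lower() for label in labels}:
        score += 5
    return min(10, score)


def overall_risk(breaking: int, security: int, urgency: int) -> float:
    """Average the three risk components."""
    return (breaking + security + urgency) / 3
```

The security and urgency components would follow the same pattern with their own keywords and weights.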

### Definition of Ready

Checks whether a PR meets the **Thunder Team PR Guidelines**:

#### Checklist (5 checks):

1. **✅ Descriptive Title**

   - Length ≥ 20 characters
   - Starts with capital letter
   - Example: "Add support for fused Adam optimizer"

1. **✅ Comprehensive Body**

   - Length ≥ 100 characters
   - Self-contained (no "per the title")
   - Explains what & why

1. **✅ Linked Issue**

   - Contains "closes #123", "fixes #456", or "resolves #789"
   - Exception: Small fixes (typos, formatting) can skip this

1. **✅ CI Passing**

   - All CI checks completed successfully
   - No failing tests

1. **✅ Not Draft**

   - PR is not marked as draft

**Scoring:**

- **Readiness Score:** (checks_passed / 5) × 100
- **80-100:** Ready with minor issues
- **60-79:** Needs attention
- **0-59:** Not ready
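
The five checks and the readiness score can be sketched as below. The `readiness` function, its parameters, and the check names are hypothetical; the "explains what & why" criterion is not automated here, and `is_small_fix` stands in for however the small-fix exception is detected.

```python
import re

# Matches "closes #123", "fixes #456", or "resolves #789" (case-insensitive).
LINKED_ISSUE = re.compile(r"\b(?:closes|fixes|resolves)\s+#\d+", re.IGNORECASE)


def readiness(
    title: str, body: str, ci_passing: bool, is_draft: bool, is_small_fix: bool = False
) -> tuple[int, list[str]]:
    """Return (readiness score 0-100, names of failing checks)."""
    checks = {
        "descriptive_title": len(title) >= 20 and title[:1].isupper(),
        "comprehensive_body": len(body) >= 100 and "per the title" not in body.lower(),
        "linked_issue": is_small_fix or bool(LINKED_ISSUE.search(body)),
        "ci_passing": ci_passing,
        "not_draft": not is_draft,
    }
    passed = sum(checks.values())
    score = passed * 100 // len(checks)
    failing = [name for name, ok in checks.items() if not ok]
    return score, failing
```

A PR that passes everything except CI gets `(80, ["ci_passing"])`.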

### Internal Review Status

Tracks **Thunder team** review progress per PR guidelines:

**Thunder Team Members:**

- crcrpar, kshitij12345, kiya00, riccardofelluga, beverlylytle, mattteochen, shino16

**Review Requirements:**

- **2 team approvals** required before pinging external maintainers
- **0 change requests** outstanding

**Status Messages:**

- ✅ "Ready - Can ping external maintainers (@lantiga, @t-vi, @KaelanDt)"
- ⏳ "Needs 1 more team approval (has 1/2)"
- ⏳ "Needs 2 team approvals (has 0/2)"
- 🔄 "Has X change request(s) from team - needs resolution"

**Impact on Priority:**

- Ready for external review → **+15 priority points**
- Changes requested → **-5 priority points**
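
The status messages could be derived from a PR's reviews roughly as follows. This sketch assumes reviews arrive in chronological order as `(login, state)` pairs using GitHub's review states (`APPROVED`, `CHANGES_REQUESTED`), and that only each team member's latest review counts; `internal_review_status` is a hypothetical name.

```python
THUNDER_TEAM = {
    "crcrpar", "kshitij12345", "kiya00", "riccardofelluga",
    "beverlylytle", "mattteochen", "shino16",
}
REQUIRED_APPROVALS = 2


def internal_review_status(reviews: list[tuple[str, str]]) -> str:
    """Summarize team review progress from (login, state) pairs."""
    latest = {}
    for login, state in reviews:
        if login in THUNDER_TEAM:
            latest[login] = state  # later reviews overwrite earlier ones
    approvals = sum(1 for s in latest.values() if s == "APPROVED")
    change_requests = sum(1 for s in latest.values() if s == "CHANGES_REQUESTED")
    if change_requests:
        return f"Has {change_requests} change request(s) from team - needs resolution"
    if approvals >= REQUIRED_APPROVALS:
        return "Ready - Can ping external maintainers (@lantiga, @t-vi, @KaelanDt)"
    if approvals == 0:
        return f"Needs {REQUIRED_APPROVALS} team approvals (has 0/{REQUIRED_APPROVALS})"
    missing = REQUIRED_APPROVALS - approvals
    return f"Needs {missing} more team approval (has {approvals}/{REQUIRED_APPROVALS})"
```

Approvals from reviewers outside the team list are ignored, so one team approval plus one outside approval still reports `Needs 1 more team approval (has 1/2)`.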

### Strategic Goal Alignment

Links PRs to **Q4 strategic goals**:

**Goal Priorities:**

- **P0 (Must-Have):** Critical Q4 deliverables
- **P1 (Should-Have):** Important but not blocking
- **P2 (Nice-to-Have):** Future improvements

**How it works:**

1. Create strategic goals:

```python
add_strategic_goal(
    goal_id="Q4-inference-opt",
    title="Inference Optimization",
    priority="P0",
    description="Optimize inference performance",
    theme="Performance",
)
```

1. Link issues to goals:

```python
link_issue_to_goal(issue_number=456, goal_id="Q4-inference-opt")
```

1. PRs linking to those issues get an alignment score:

   - **P0:** +30 priority points 🔥
   - **P1:** +20 priority points 🎯
   - **P2:** +10 priority points 📌

**Benefit:** Ensures Q4 goals are prioritized in PR reviews!

## LLM Integration

### What the LLM Sees

The LLM receives a comprehensive prompt including:

```markdown
## HEURISTIC ANALYSIS

Our automated system has analyzed this PR:

**Complexity Score:** 8/10 (COMPLEX)
**Impact Score:** 9/10 (HIGH IMPACT)
**Priority Score:** 82/100

**Priority Reasoning:**
🎯 IMPORTANT (Complex + High Impact)
Complexity 8/10: 25 files changed, 2000 lines changed, refactors core files
Impact 9/10: high urgency, performance label, high-impact label
Strategic: 🔥 P0 STRATEGIC GOAL (closes #456) (+30pts)
Review: ✅ ready for external review (2+ team approvals) (+15pts)
Staleness: aging 65 days (+2pts)
Final: 82/100 (Base:35 + Strategic:30 + Review:15 + Staleness:2)

**Review Guidance:** This is a complex change. Pay special attention to
architecture, testing, and potential side effects.

---

## DEFINITION OF READY
**Readiness Score:** 80/100
**Status:** ⚠️ Not ready - 1 check(s) failing

**Failing Checks:**
- ❌ CI checks failing: test_distributed.yaml

---

## INTERNAL REVIEW STATUS
✅ Ready - Can ping external maintainers (@lantiga, @t-vi, @KaelanDt)
- Team Approvals: 2/2
- Reviewers: kshitij12345, mattteochen

---

## STRATEGIC GOAL ALIGNMENT
**Aligned:** Yes 🔥
**Priority:** P0 (Must-Have)
**Goal:** Q4 Inference Optimization
**Closes:** #456
```

### LLM Response Sections

#### For SIMPLE PRs (complexity < 7):

**2 Sections:**

1. **Summary** - What and why
1. **Risk Assessment** - Breaking changes, security, urgency

#### For COMPLEX PRs (complexity ≥ 7):

**3 Sections:**

1. **Summary** - What and why
1. **Risk Assessment** - Breaking changes, security, urgency
1. **Review Checklist & Debugging Guide** ✨
   - Key areas to review
   - Potential issues
   - Testing strategy
   - Architecture impact
   - Debug checklist
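
The choice between the two response layouts depends only on the complexity score. A minimal sketch, with `response_sections` as a hypothetical helper name:

```python
def response_sections(complexity: int) -> list[str]:
    """Return the sections the LLM is asked to produce for a PR."""
    sections = ["Summary", "Risk Assessment"]
    if complexity >= 7:  # complex PRs get the extra review guide
        sections.append("Review Checklist & Debugging Guide")
    return sections
```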