
@isidromontero

Problem Description

Repository scans fail completely with a FATAL error when a pull request is authored by a bot user (such as GitHub Copilot) whose user metadata is not accessible via the GitHub API.

Error Details

When scanning a repository with PRs from bot users, the scan aborts with:

java.io.FileNotFoundException: https://api.github.com/users/Copilot {"message":"Not Found","documentation_url":"https://docs.github.com/rest/users/users#get-a-user","status":"404"}
    at org.kohsuke.github.GitHubResponse.wrapException(GitHubResponse.java:101)
    ...

Impact

  • The entire repository scan fails with a FATAL status
  • The orphaned item strategy never runs, because the scan never completes successfully
  • Orphaned PR branches therefore accumulate by the hundreds and keep consuming disk space
  • Real-world case: 1,757 PR branches consuming more than 360 GB of disk space
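For scale, dividing the reported figures gives the average footprint per orphaned branch (treating 360 GB as a lower bound and using decimal units):

$$
\frac{360\ \text{GB}}{1757\ \text{branches}} \approx 0.205\ \text{GB} \approx 205\ \text{MB per branch}
$$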

Root Cause

In GitHubSCMSource.java, the CacheUpdatingIterable.observe() method (line 2596) throws a WrappedException when user metadata cannot be fetched:

} catch (FileNotFoundException e) {
    request.listener()
            .getLogger()
            .format(
                    "%n  Could not find user %s for pull request %d.%n",
                    user == null ? "null" : user.getLogin(), number);
    throw new WrappedException(e);  // FATAL - aborts entire scan
}

This behavior is problematic because:

  1. Bot users such as GitHub Copilot have no user metadata available via the /users/{login} endpoint; /users/Copilot returns 404
  2. A 404 here is therefore expected, not an exceptional error that should abort the scan
  3. The scan should tolerate missing metadata and continue processing the remaining PRs
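To illustrate why a single rethrown 404 is fatal to the whole scan, here is a minimal, self-contained sketch. It is not the plugin's code: `WrappedException` is a local stand-in for the plugin's class of the same name, `fetchName` is a hypothetical lookup in which the login `Copilot` simulates the 404, and the author list is invented. The strict loop mirrors the current rethrow; the tolerant loop falls back to the login and keeps going.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class ScanAbortDemo {

    // Hypothetical stand-in for the plugin's WrappedException: an unchecked
    // wrapper that carries an IOException out of a loop body or callback.
    static class WrappedException extends RuntimeException {
        WrappedException(Throwable cause) {
            super(cause);
        }
    }

    // Simulated user lookup: "Copilot" behaves like a 404 from /users/{login}.
    static String fetchName(String login) throws IOException {
        if (login.equals("Copilot")) {
            throw new FileNotFoundException("https://api.github.com/users/" + login);
        }
        return login.toUpperCase();
    }

    // Current behaviour: the first failed lookup is rethrown and ends the scan,
    // so PRs after it are never processed.
    static List<String> scanStrict(List<String> authors) {
        List<String> indexed = new ArrayList<>();
        for (String login : authors) {
            try {
                indexed.add(fetchName(login));
            } catch (IOException e) {
                throw new WrappedException(e);
            }
        }
        return indexed;
    }

    // Proposed behaviour: a failed lookup falls back to the login and the
    // loop continues with the remaining PRs.
    static List<String> scanTolerant(List<String> authors) {
        List<String> indexed = new ArrayList<>();
        for (String login : authors) {
            String name = login; // default used when the lookup fails
            try {
                name = fetchName(login);
            } catch (IOException e) {
                System.out.printf("Could not find user metadata for %s. Using default values.%n", login);
            }
            indexed.add(name);
        }
        return indexed;
    }

    public static void main(String[] args) {
        List<String> authors = List.of("alice", "Copilot", "bob");
        try {
            scanStrict(authors);
        } catch (WrappedException e) {
            System.out.println("Strict scan aborted: " + e.getCause().getMessage());
        }
        System.out.println("Tolerant scan indexed: " + scanTolerant(authors));
    }
}
```

In the strict version, "bob" is never reached; in the tolerant version all three authors are indexed and only a warning is printed for the bot.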

Changes Made

This PR implements tolerant error handling that uses default values when user metadata is unavailable:

Key Changes

  1. Graceful degradation for user metadata:

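    • Try to fetch user.getName() and user.getEmail()
    • On FileNotFoundException (404), or any other IOException, fall back to default values: the login as the name and <login>@users.noreply.github.com as the email (null or empty values fall back the same way)
    • Log a warning for visibility instead of aborting the scan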
  2. Remove unnecessary try-catch:

    • Remove the redundant IOException catch around the PR metadata methods (getTitle(), getBody(), getHtmlUrl()), which don't throw checked exceptions
  3. Maintain backward compatibility:

    • All existing successful paths remain unchanged
    • Only affects error handling for unavailable metadata

Code Example

// Assumes `user` (the PR author, possibly a bot) and its `login` are already in scope.
// Try to get user metadata, but be tolerant to API failures
String userName = login;
String userEmail = login + "@users.noreply.github.com";
try {
    userName = user.getName();
    if (userName == null || userName.isEmpty()) {
        userName = login;
    }
    userEmail = user.getEmail();
    if (userEmail == null || userEmail.isEmpty()) {
        userEmail = login + "@users.noreply.github.com";
    }
} catch (FileNotFoundException e) {
    // User metadata not accessible (e.g., bot users like Copilot)
    // Log warning but continue with default values
    request.listener()
            .getLogger()
            .format(
                    "%n  Could not find user metadata for %s (PR #%d). Using default values.%n",
                    login, number);
} catch (IOException e) {
    // Other IO errors, use default values but log
    request.listener()
            .getLogger()
            .format(
                    "%n  IO error fetching user metadata for %s (PR #%d): %s. Using default values.%n",
                    login, number, e.getMessage());
}
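The fallback rules above can be exercised without a Jenkins instance. The sketch below is standalone and hypothetical: `UserMetadata` is a local interface standing in for the GitHub user object (its getters throw `IOException`, as the catch blocks above imply), and `Identity` and `resolve` are illustrative names, not plugin API. It also returns the defaults when the author object is null, a case the original catch block already guards against when formatting its message.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.PrintStream;

class MetadataFallback {

    // Hypothetical stand-in for the GitHub user object: both getters may
    // call the GitHub API and can therefore throw IOException.
    interface UserMetadata {
        String getName() throws IOException;
        String getEmail() throws IOException;
    }

    record Identity(String name, String email) {}

    private static boolean isBlank(String s) {
        return s == null || s.isEmpty();
    }

    // Start from login-based defaults, overwrite them only with non-empty API
    // values, and keep whatever defaults remain if the lookup fails.
    static Identity resolve(String login, UserMetadata user, PrintStream log) {
        String name = login;
        String email = login + "@users.noreply.github.com";
        if (user == null) {
            return new Identity(name, email); // no author object at all
        }
        try {
            String fetchedName = user.getName();
            if (!isBlank(fetchedName)) {
                name = fetchedName;
            }
            String fetchedEmail = user.getEmail();
            if (!isBlank(fetchedEmail)) {
                email = fetchedEmail;
            }
        } catch (FileNotFoundException e) {
            log.printf("Could not find user metadata for %s. Using default values.%n", login);
        } catch (IOException e) {
            log.printf("IO error fetching user metadata for %s: %s. Using default values.%n",
                    login, e.getMessage());
        }
        return new Identity(name, email);
    }

    public static void main(String[] args) {
        // A bot author whose /users/{login} lookup returns 404.
        UserMetadata bot = new UserMetadata() {
            @Override
            public String getName() throws IOException {
                throw new FileNotFoundException("https://api.github.com/users/Copilot");
            }

            @Override
            public String getEmail() throws IOException {
                throw new FileNotFoundException("https://api.github.com/users/Copilot");
            }
        };
        // A regular author with a public name but no public email.
        UserMetadata human = new UserMetadata() {
            @Override
            public String getName() {
                return "Alice Example";
            }

            @Override
            public String getEmail() {
                return null;
            }
        };
        System.out.println(resolve("Copilot", bot, System.out));
        System.out.println(resolve("alice", human, System.out));
    }
}
```

It runs with `java MetadataFallback.java` on Java 16 or later (the record needs it). Note that if getName() succeeds but getEmail() fails, the fetched name is kept and only the email falls back, which matches the order of assignments in the PR's snippet.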

Benefits

  • ✅ Repository scans complete successfully even with bot-authored PRs
  • ✅ Orphaned item cleanup strategy executes as configured
  • ✅ Bot PRs are indexed with sensible default values
  • ✅ Warnings are logged for visibility without blocking the entire scan
  • ✅ Graceful degradation instead of catastrophic failure

Testing

Tested locally by:

  1. Building custom plugin with these changes
  2. Installing it on a Jenkins instance with 1,757 PR branches (including Copilot-authored PRs)
  3. Confirming scans complete successfully with warnings instead of fatal errors
  4. Verifying orphaned item strategy executes properly

Checklist

  • Code compiles successfully (mvn clean package -DskipTests)
  • Changes are backward compatible
  • Error messages are informative
  • Real-world testing completed
  • Commit message follows conventions

Note: This fix addresses a critical production issue where bot-authored PRs prevent repository indexing and orphaned item cleanup, leading to significant disk space consumption.

Repository scans were failing with FileNotFoundException when PRs
were authored by bot users (e.g., GitHub Copilot) whose metadata
is not accessible via the GitHub API.

Problem:
- Scan aborts with FATAL error when user metadata fetch returns 404
- This prevents orphanedItemStrategy from executing
- Results in accumulation of orphaned PR branches

Solution:
- Implement tolerant error handling for user metadata fetching
- Use default values (the login as the name,
  <login>@users.noreply.github.com as the email) when metadata is unavailable
- Log warnings instead of throwing exceptions
- Allow scan to complete successfully

This ensures bot-authored PRs don't block repository indexing and
orphaned item cleanup can proceed as configured.

Fixes scan failures with bot users like GitHub Copilot.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
isidromontero requested a review from a team as a code owner on November 26, 2025 at 08:13
isidromontero and others added 2 commits on November 26, 2025 at 09:27
The fix for handling bot users whose metadata is unavailable changed
the behavior from throwing exceptions to gracefully processing PRs with
default values. Updated tests to reflect this new, correct behavior:

- fetchSmokes_badUser: PR-2 is now included in results with default values
- testOpenSinglePRThrowsFileNotFoundOnObserve: Successfully processes PR instead of throwing
- testOpenSinglePRThrowsIOOnObserve: Successfully processes PR instead of throwing

This ensures PRs from bots (like GitHub Copilot) are processed instead
of causing repository scan failures.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>