Skip to content

Conversation

jriv01
Copy link
Contributor

@jriv01 jriv01 commented Sep 23, 2025

This changes adds additional fields for tracking whether a commit to llvm-project is a revert and what commit/pull request it is reverting.

Such commits are detecting by pattern matching against the commit message.

Copy link
Contributor

@boomanaiden154 boomanaiden154 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plumbing looks good, but I have some questions about data fidelity.


# Check which pull request or commit is being reverted (if any)
pull_request_match = re.search(
r"Reverts? (?:llvm\/llvm-project)?#(\d+)", commit.message
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this only works for PR reverts created through the UI.

Eg 4f33d7b7a9f39d733b7572f9afbf178bca8da127 does not have this despite reverting a PR.

Copy link
Contributor Author

@jriv01 jriv01 Sep 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For examples like 4f33d7b7a9f39d733b7572f9afbf178bca8da127, I think it's fine. It would still capture the sha of the reverted commit from the message.

We could work backwards with some SQL and determine which PR was reverted from the data we're already maintaining since we map commit_sha -> PR. Although I could add that logic right into this script somewhere to make that clear.

The only issue I've (infrequently) encountered so far is commits with custom messages that don't match any of these patterns, since this pattern matching is relatively rigid. So these matches are "best effort" for the time being.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the developer did not capitalize "Revert"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the developer did not capitalize "Revert"?

It won't get captured. But the standard ways of doing reverts (through the cli with git revert and through the Github UI) both capitalize it. If we can properly capture both of those cases, I think we should be fine.

We could work backwards with some SQL and determine which PR was reverted from the data we're already maintaining since we map commit_sha -> PR. Although I could add that logic right into this script somewhere to make that clear.

Doing it later in SQL sounds good to me. Can you update the descriptions of the fields in the schema JSON to say that you can't rely on the presence of these fields to determine if it was a PR or commit that got reverted?

Copy link
Contributor Author

@jriv01 jriv01 Sep 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

What happens if the developer did not capitalize "Revert"?

I've updated the pattern matching to be case-insensitive just to get ahead of such commit messages.

pull_request_match = re.search(
r"Reverts? (?:llvm\/llvm-project)?#(\d+)", commit.message
)
commit_match = re.search(r"This reverts commit (\w+)", commit.message)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intended to match the commit SHA from reverts for both PRs and direct commits or just the latter? It matches the former too if someone manually runs git revert.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intended to capture reverts from both PRs and direct commits

@cmtice cmtice requested a review from lnihlen September 23, 2025 23:11
Copy link
Contributor

@boomanaiden154 boomanaiden154 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is getting to the point where we should really have some unit testing with reasonable mocks.

A lot of the discussion here would be best encoded in a handful of unit tests and that will ensure future functionality doesn't regress. Something like https://github.com/llvm/llvm-project/blob/main/.ci/metrics/metrics_test.py I think should be a good baseline. We don't have anything yet to run them in CI for zorg, but that's not difficult to get running.

@jriv01
Copy link
Contributor Author

jriv01 commented Sep 24, 2025

This is getting to the point where we should really have some unit testing with reasonable mocks.

I can follow up with another PR setting up some unit tests.

@jriv01 jriv01 merged commit 3e516f7 into llvm:main Sep 24, 2025
5 checks passed
vvereschaka pushed a commit to vvereschaka/llvm-zorg that referenced this pull request Sep 25, 2025
This changes adds additional fields for tracking whether a commit to
llvm-project is a revert and what commit/pull request it is reverting.

Such commits are detecting by pattern matching against the commit
message.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants