[GEODE-10508] Remediation of ANTLR nondeterminism warnings in OQL grammar #7942
Merged
This commit resolves four nondeterminism warnings generated by ANTLR during
the OQL grammar compilation process. These warnings indicated parser ambiguity
that could lead to unpredictable parsing behavior.
Problem Analysis:
-----------------
1. Lines 574 & 578 (projection rule):
The parser could not distinguish between aggregateExpr and expr alternatives
when encountering aggregate function keywords (sum, avg, min, max, count).
These keywords are valid both as:
- Aggregate function identifiers: sum(field)
- Regular identifiers in expressions: sum as a field name
Without lookahead, ANTLR could not deterministically choose which production
rule to apply, resulting in nondeterminism warnings.
2. Lines 961 & 979 (aggregateExpr rule):
Optional 'distinct' keyword created ambiguity in aggregate function parsing.
The parser could not decide whether to:
- Match the optional 'distinct' keyword, or
- Skip it and proceed directly to the expression
Both paths were valid, but ANTLR's default behavior doesn't specify
preference, causing nondeterminism.
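To make the first ambiguity concrete, here is a minimal Java sketch that lists which projection alternatives could begin with a given first token. It is hypothetical and is neither Geode code nor ANTLR-generated code: the class name, the method name, and the identifier pattern are assumptions made for illustration, and only the keyword list comes from the analysis above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration; not Geode or ANTLR-generated code. */
class ProjectionAmbiguitySketch {
  // Aggregate keywords named in the commit message.
  static final Set<String> AGGREGATE_KEYWORDS = Set.of("sum", "avg", "min", "max", "count");

  /** Returns the projection alternatives that could begin with the given first token. */
  static List<String> viableAlternatives(String firstToken) {
    List<String> viable = new ArrayList<>();
    if (AGGREGATE_KEYWORDS.contains(firstToken)) {
      viable.add("aggregateExpr");
    }
    // Assumption: any identifier-like token, including an aggregate keyword
    // used as a field name, can start the expr alternative.
    if (firstToken.matches("[A-Za-z_][A-Za-z0-9_]*")) {
      viable.add("expr");
    }
    return viable;
  }

  public static void main(String[] args) {
    System.out.println(viableAlternatives("sum"));   // [aggregateExpr, expr]
    System.out.println(viableAlternatives("price")); // [expr]
  }
}
```

With only the first token available, `sum` leaves both alternatives open, which is the situation the warnings describe.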
Solution Implemented:
--------------------
1. Added syntactic predicates to projection rule (lines 574, 578):
Predicate: (('sum'|'avg'|'min'|'max'|'count') TOK_LPAREN)=>
This instructs the parser to look ahead and check if an aggregate keyword
is followed by a left parenthesis. If true, it chooses aggregateExpr;
otherwise, it chooses expr. This resolves the ambiguity by providing
explicit lookahead logic.
2. Added greedy option to aggregateExpr rule (lines 961, 979):
Option: options {greedy=true;}
This tells the parser to greedily match the 'distinct' keyword whenever
it appears, rather than being ambiguous about whether to match or skip.
The greedy option eliminates the nondeterminism by establishing clear
matching priority.
3. Updated test to use token constants (AbstractCompiledValueTestJUnitTest):
Changed: hardcoded value 89 -> OQLLexerTokenTypes.LITERAL_or
Rationale: Adding syntactic predicates changes ANTLR's token numbering
in the generated lexer (LITERAL_or shifted from 89 to 94). Using the
constant ensures test correctness regardless of future grammar changes.
This is a best practice for maintaining test stability.
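The first two changes above can be pictured with a small hand-written Java sketch. It is an assumption-laden illustration, not the code ANTLR generates from oql.g: tokens are modeled as plain strings, and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical hand-written sketch; the real parser is generated by ANTLR from oql.g. */
class LookaheadSketch {
  static final Set<String> AGGREGATE_KEYWORDS = Set.of("sum", "avg", "min", "max", "count");

  /** Mirrors the predicate: an aggregate keyword immediately followed by "(". */
  static boolean looksLikeAggregate(List<String> tokens, int pos) {
    return pos + 1 < tokens.size()
        && AGGREGATE_KEYWORDS.contains(tokens.get(pos))
        && "(".equals(tokens.get(pos + 1));
  }

  /** Picks the projection alternative based on the two-token lookahead. */
  static String chooseAlternative(List<String> tokens) {
    return looksLikeAggregate(tokens, 0) ? "aggregateExpr" : "expr";
  }

  /**
   * Greedy optional match: if the token at pos is "distinct", consume it.
   * Returns the position of the next unread token.
   */
  static int skipOptionalDistinct(List<String> tokens, int pos) {
    if (pos < tokens.size() && "distinct".equals(tokens.get(pos))) {
      return pos + 1;
    }
    return pos;
  }

  public static void main(String[] args) {
    System.out.println(chooseAlternative(List.of("sum", "(", "price", ")"))); // aggregateExpr
    System.out.println(chooseAlternative(List.of("sum", ",", "price")));      // expr
    // Position 2 holds "distinct", so it is consumed and parsing resumes at 3.
    System.out.println(skipOptionalDistinct(List.of("count", "(", "distinct", "id", ")"), 2)); // 3
  }
}
```

The point of the sketch is only the decision order: check the keyword and the following parenthesis before committing to an alternative, and consume `distinct` whenever it is the next token.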
Impact:
-------
- Zero nondeterminism warnings from ANTLR grammar generation
- No changes to OQL syntax or semantics (fully backward compatible)
- No runtime behavior changes (modifications only affect parser generation)
- All existing tests pass with updated token reference
- Improved parser determinism and maintainability
Technical Details:
-----------------
- Syntactic predicates (=>) are a standard ANTLR 2 feature for lookahead
- The greedy option is a standard ANTLR feature for disambiguating optional subrules
- Token constant usage follows best practices for generated code references
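- Changes are confined to the grammar and the generated parser; the lookahead added by the predicates is short, so no measurable runtime performance impact is expected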
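The token-constant point can be sketched in Java as follows. `SketchTokenTypes` is a hypothetical stand-in for a generated token-type interface such as OQLLexerTokenTypes, and `isOr` is an invented helper; only the values 89 and 94 come from the commit message.

```java
/** Hypothetical check that compares a token type against a named constant. */
class TokenConstantSketch {
  /** True when the given token type is the current LITERAL_or value. */
  static boolean isOr(int tokenType) {
    return tokenType == SketchTokenTypes.LITERAL_or;
  }

  public static void main(String[] args) {
    System.out.println(isOr(SketchTokenTypes.LITERAL_or)); // true
    System.out.println(isOr(89)); // false: the stale hard-coded value no longer matches
  }
}

/** Hypothetical stand-in for an ANTLR-generated token-type interface. */
interface SketchTokenTypes {
  // Value after the grammar change, as reported in the commit message.
  int LITERAL_or = 94;
}
```

A test that refers to the constant keeps passing when the generated numbering shifts, while a literal 89 silently points at a different token.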
Files Modified:
--------------
- geode-core/src/main/antlr/org/apache/geode/cache/query/internal/parse/oql.g
- geode-core/src/test/java/org/apache/geode/cache/query/internal/AbstractCompiledValueTestJUnitTest.java
Fix line length formatting for improved readability.
Hi @sboorlagadda, all checks have passed. Thank you.
sboorlagadda (Member) approved these changes on Dec 3, 2025 and left a comment:
LGTM
Thank you so much for your review, @sboorlagadda.
JinwooHwang added a commit that referenced this pull request on Dec 8, 2025:
…mar (#7942)
* GEODE-10508: Fix ANTLR nondeterminism warnings in OQL grammar
* GEODE-10508: Apply code formatting to test file
Summary
This PR resolves four nondeterminism warnings generated by ANTLR during the OQL (Object Query Language) grammar compilation process. These warnings indicated parser ambiguity that could lead to unpredictable parsing behavior.
Issue
Fixes GEODE-10508
Problem Description
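During the generateGrammarSource task, ANTLR produced four nondeterminism warnings, at lines 574, 578, 961, and 979 of oql.g.
Root Cause
Lines 574 & 578 (projection rule):
- The parser could not distinguish between the aggregateExpr and expr alternatives when encountering aggregate function keywords (sum, avg, min, max, count)
- These keywords are valid as aggregate function identifiers: sum(field)
- They are also valid as regular identifiers in expressions: sum as a field name
Lines 961 & 979 (aggregateExpr rule):
- The optional distinct keyword created ambiguity in aggregate function parsing
- The parser could not decide whether to match the optional distinct keyword or skip it and proceed directly to the expression
Solution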
1. Added Syntactic Predicates (Lines 574 & 578)
Added lookahead predicates to the projection rule.
Reasoning:
The predicate (("sum"|"avg"|"min"|"max"|"count") TOK_LPAREN)=> instructs the parser to look ahead and check if an aggregate keyword is followed by a left parenthesis. If true, it chooses aggregateExpr; otherwise, it chooses expr. This provides explicit lookahead logic to resolve the ambiguity.
2. Added Greedy Option (Lines 961 & 979)
Added the greedy option (options {greedy=true;}) for optional distinct keywords.
Reasoning:
The greedy option tells the parser to match the distinct keyword whenever it appears, rather than leaving it ambiguous whether to match or skip. This establishes a clear matching priority and eliminates the nondeterminism.
3. Updated Test to Use Token Constants
Modified AbstractCompiledValueTestJUnitTest.java to replace the hardcoded value 89 with OQLLexerTokenTypes.LITERAL_or.
Reasoning:
Adding syntactic predicates changes ANTLR's token numbering in the generated lexer (LITERAL_or shifted from 89 to 94). Using the constant ensures test correctness regardless of future grammar changes. This is a best practice for maintaining test stability.
Changes Made
- geode-core/src/main/antlr/org/apache/geode/cache/query/internal/parse/oql.g
  - Added syntactic predicates to the projection rule (2 locations)
  - Added the greedy option to the aggregateExpr rule (2 locations)
- geode-core/src/test/java/org/apache/geode/cache/query/internal/AbstractCompiledValueTestJUnitTest.java
  - Uses the OQLLexerTokenTypes constant in place of the hardcoded token value
Testing
Verification Steps
- AbstractCompiledValueTestJUnitTest passes
Test Commands
Impact Assessment
Benefits
Technical Details
- Syntactic predicates (=>) are a standard ANTLR 2 feature for lookahead
Checklist
For all changes, please confirm:
- …develop)?
- …gradlew build run cleanly?