@asifabashar asifabashar commented Nov 7, 2025

Description

Is your feature request related to a problem?
Adds the addtotals command, which shows the total of all numeric columns in each row as a new column and can optionally append a final row containing the total of each column.
Fixes issue #4607
From roadmap #4287

Adds the addcoltotals command, which appends a final row containing the total of each column across all rows.
From roadmap #4287

What solution would you like?
Commands: addtotals, addcoltotals
addtotals: Adds totals across each row by default, and also calculates totals down each column when col=true.
The addtotals command adds together the numeric fields in each search result.

You may specify which fields to include rather than summing all numeric fields.
The final total is stored in a new field.

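The addtotals command's behavior is as follows:

When row=true (the default), it computes the sum of the selected numeric fields for each event and stores it in a new field.

When col=true, it computes the sum for every column and adds a summary row at the end containing those totals.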

To label this final summary row, specify a labelfield and assign it a value using the label option.

Alternatively, instead of using the addtotals col=true command, you can use the addcoltotals command to calculate a summary event.

If labelfield is specified, the last row (the summary event) contains the value of the label option in the field named by labelfield.

Command Syntax:
addtotals [row=<bool>] [col=<bool>] [labelfield=<field>] [label=<string>] [fieldname=<field>] [<field-list>]
Argument descriptions:
row: Syntax: row=<bool>. Indicates whether to compute the sum of the <field-list> for each event. This works like generating a total for each row in a table. The result is stored in a new field, which is named Total by default. To use a different field name, provide the fieldname argument. Default is true.

col : Syntax: col=<bool> . Indicates whether to append a new event—called a summary event—to the end of the event list. This summary event shows the total for each field across all events, similar to calculating column totals in a table. Default is false.

fieldname: Syntax: fieldname=<field>. Specifies the name of the field that stores the calculated sum of the field-list for each event. This argument is only applicable when row=true. Default is Total.

field-list: Syntax: <field> ... . One or more numeric fields separated by spaces. Only the fields listed in the <field-list> are included in the sum. If no <field-list> is provided, all numeric fields are summed by default.

labelfield: Syntax: labelfield=<field>. Specifies a field to use as the label for the summary event. This argument is only applicable when col=true.

To use an existing field from your result set, provide its name as the value for the labelfield argument. For example, if the field is named salary, specify labelfield=salary. If no existing field matches the labelfield value, a new field is created using that value.

label: Syntax: label=<string>. Specifies a row label for the summary event.

If the labelfield argument refers to an existing field in the result set, the label value appears in that field for the summary row.

If the labelfield argument creates a new field, the label is placed in that new field in the summary event row. Default label is Total.
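
The row and col options described above can be sketched with a small in-memory model. This is only an illustration under stated assumptions: the class AddTotalsSketch and its methods are hypothetical names, rows are modeled as plain maps, and edge cases (empty input, whether the row-total field itself is summed into the summary event, labelfield handling) are deliberately not modeled. It is not the plugin's Calcite implementation.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of addtotals; not the plugin's implementation. */
class AddTotalsSketch {

  /** Sums the numeric values of the listed fields in one row; other values are skipped. */
  static BigDecimal rowTotal(Map<String, Object> row, List<String> fields) {
    BigDecimal total = BigDecimal.ZERO;
    for (String field : fields) {
      Object value = row.get(field);
      if (value instanceof Number) {
        total = total.add(new BigDecimal(value.toString()));
      }
    }
    return total;
  }

  /**
   * row=true adds a per-row total under fieldName (the fieldname option).
   * col=true appends one summary event holding the total of each listed field.
   */
  static List<Map<String, Object>> addTotals(
      List<Map<String, Object>> rows,
      List<String> fields,
      boolean row,
      boolean col,
      String fieldName) {
    List<Map<String, Object>> out = new ArrayList<>();
    Map<String, BigDecimal> colTotals = new LinkedHashMap<>();
    for (Map<String, Object> r : rows) {
      Map<String, Object> copy = new LinkedHashMap<>(r);
      if (row) {
        copy.put(fieldName, rowTotal(r, fields));
      }
      if (col) {
        for (String field : fields) {
          Object value = r.get(field);
          if (value instanceof Number) {
            colTotals.merge(field, new BigDecimal(value.toString()), BigDecimal::add);
          }
        }
      }
      out.add(copy);
    }
    if (col) {
      // The summary event is appended after all input rows.
      out.add(new LinkedHashMap<String, Object>(colTotals));
    }
    return out;
  }
}
```

With the defaults (row=true, col=false) the sketch only adds the per-row field; the extra summary row appears only when col is true, which mirrors the distinction drawn in the argument descriptions.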

addcoltotals command: Computes the total of each column across all rows and appends it as a summary event at the end of the results.

addcoltotals: options
Optional Arguments
<field-list>
Syntax: <field> ... . A space-delimited list of valid field names. addcoltotals calculates sums only for the fields you include in this list. By default, the command calculates the sum for all fields.

labelfield: Syntax: labelfield=<fieldname>. Field name to add to the result set.

label : Syntax: label=<string> . Used together with the labelfield argument to add a label to the summary event. If labelfield is not specified, the label argument has no effect. Default label is Total.
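
As a rough illustration of how labelfield and label interact on the summary event, the following sketch models the result set as a list of maps. The class AddColTotalsSketch and the method addColTotals are hypothetical names introduced here, and the handling shown (label ignored without labelfield, default label Total) is taken from the option descriptions above rather than from the actual implementation.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of addcoltotals; not the plugin's implementation. */
class AddColTotalsSketch {

  static List<Map<String, Object>> addColTotals(
      List<Map<String, Object>> rows, List<String> fields, String labelField, String label) {
    Map<String, Object> summary = new LinkedHashMap<>();
    for (String field : fields) {
      BigDecimal sum = BigDecimal.ZERO;
      for (Map<String, Object> row : rows) {
        Object value = row.get(field);
        if (value instanceof Number) {
          sum = sum.add(new BigDecimal(value.toString()));
        }
      }
      summary.put(field, sum);
    }
    // Without labelfield the label option has no effect.
    if (labelField != null) {
      // An existing field receives the label in the summary row;
      // an unknown field name simply becomes a new field.
      summary.put(labelField, label == null ? "Total" : label);
    }
    List<Map<String, Object>> out = new ArrayList<>(rows);
    out.add(summary);
    return out;
  }
}
```

Calling addColTotals(rows, List.of("sal"), "job", "GrandTotal") in this model yields one extra row whose sal holds the column sum and whose job holds the label.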

Related Issues

Resolves #4607

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (5)
docs/user/ppl/cmd/addtotals.rst (1)

14-27: Clarify default behavior and fix minor wording/spacing issues

The first description sentence implies that addtotals always “appends a row with the totals”, but from the implementation/tests the summary row is only added when col=true; with the defaults (row=true, col=false) it just adds a per‑row total field. Consider rephrasing to explicitly distinguish row totals vs the optional summary event so users aren’t confused about what happens by default.

Also, a few small text nits:

  • Line 25: “If it specifies” has a double space; make it “If it specifies”.
  • Lines 23–24, 27: “and add a new field” → “and adds a new field” for grammatical agreement.
  • Line 71: Start the sentence with “If row=true, …” for consistency.

Also applies to: 71-71

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteAddTotalsCommandIT.java (3)

19-27: Add JavaDoc for this public integration test class

Given the guidelines that public classes should have JavaDoc, consider adding a brief comment describing that this class runs integration tests for the PPL addtotals command with Calcite enabled (and that init() enables Calcite and loads ACCOUNT/BANK indices).

For example:

/**
 * Integration tests for PPL {@code addtotals} with the Calcite engine.
 * Verifies row totals, optional column totals, custom labels, and interaction
 * with other PPL commands on ACCOUNT and BANK indices.
 */
public class CalciteAddTotalsCommandIT extends PPLIntegTestCase {

104-106: Limit helper visibility and remove fully qualified isNumeric calls

isNumeric, compareDataRowTotals, and verifyColTotals are only used inside this class, so they can all be private, and the fully qualified reference to CalciteAddTotalsCommandIT.isNumeric in testAddTotalsRowFieldsNonNumeric is unnecessary.

Suggested cleanup:

-  public static boolean isNumeric(String str) {
+  private static boolean isNumeric(String str) {
     return str != null && str.matches("-?\\d+(\\.\\d+)?");
   }
@@
-  private void verifyColTotals(
+  private void verifyColTotals(
       org.json.JSONArray dataRows, List<Integer> field_indexes, String finalSummaryEventLevel) {
@@
-        } else if (value instanceof String) {
-          if (org.opensearch.sql.calcite.remote.CalciteAddTotalsCommandIT.isNumeric(
-              (String) value)) {
+        } else if (value instanceof String) {
+          if (isNumeric((String) value)) {
             cRowTotal = cRowTotal.add(new BigDecimal((String) (value)));
           }
         }

and similarly in testAddTotalsRowFieldsNonNumeric:

-        } else if (value instanceof String) {
-          if (org.opensearch.sql.calcite.remote.CalciteAddTotalsCommandIT.isNumeric(
-              (String) value)) {
+        } else if (value instanceof String) {
+          if (isNumeric((String) value)) {
             cRowTotal = cRowTotal.add(new BigDecimal((String) (value)));
           }
         }

Also applies to: 136-173, 198-212
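
For reference, the isNumeric helper quoted in the diff accepts only plain decimal strings. The standalone sketch below reuses the same regular expression; the wrapper class IsNumericSketch is a hypothetical name, and the sample inputs are illustrative, not taken from the test data.

```java
/** Standalone copy of the helper from the diff, wrapped in a hypothetical class. */
class IsNumericSketch {

  // Optional minus sign, one or more digits, optional fractional part.
  private static boolean isNumeric(String str) {
    return str != null && str.matches("-?\\d+(\\.\\d+)?");
  }

  /** Package-visible accessor so the private helper can be exercised from outside. */
  static boolean check(String str) {
    return isNumeric(str);
  }

  public static void main(String[] args) {
    System.out.println(check("42"));    // true
    System.out.println(check("-3.14")); // true
    System.out.println(check("1e5"));   // false: exponents are not matched
    System.out.println(check(".5"));    // false: a leading digit is required
    System.out.println(check("+7"));    // false: only a minus sign is allowed
    System.out.println(check(""));      // false
    System.out.println(check(null));    // false
  }
}
```

Strings such as "1e5" would be parsed by new BigDecimal(String), but the regular expression filters them out first, so the tests silently skip such values when accumulating totals.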


248-259: Align testAddTotalsWithNoData comment with actual assertion

The comment says:

// Should still have totals row even with no input data

but the test asserts:

assertEquals(0, dataRows.length()); // Only totals row

which currently expects zero rows. To avoid confusion, either change the assertion to expect a single totals row (and validate it), or update the comment to match the behavior you actually want to guarantee (e.g., “No rows expected when there is no input data”).

ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAddTotalsTest.java (1)

343-355: Uncomment or remove unused expectedLogical / verifyLogical

In testAddTotalsWithAllOptionsIncludingFieldname, expectedLogical is built but verifyLogical(root, expectedLogical) is commented out. This leaves the logical plan unvalidated while still maintaining the expected string.

Either:

  • Uncomment verifyLogical(root, expectedLogical);, or
  • Remove expectedLogical if there’s a known reason not to assert the logical plan (and document that reason with a TODO).
🧹 Nitpick comments (2)
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAddTotalsTest.java (1)

13-17: Add class-level JavaDoc for this public test class

Per project guidelines, public classes should have JavaDoc. Adding a brief class comment describing that this verifies Calcite translation for addtotals (logical plan, results, Spark SQL) will make the intent clearer to future readers.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteAddColTotalsCommandIT.java (1)

68-70: Tighten helper visibility and simplify isNumeric usage

isNumeric and verifyColTotals are only used inside this test class, so they don’t need to be public. Making them private (and static for isNumeric) will better reflect their scope. Also, the fully qualified reference CalciteAddColTotalsCommandIT.isNumeric(...) inside verifyColTotals can be simplified to a direct isNumeric(...) call for readability.

For example:

-  public static boolean isNumeric(String str) {
+  private static boolean isNumeric(String str) {
     return str != null && str.matches("-?\\d+(\\.\\d+)?");
   }

-  public void verifyColTotals(
+  private void verifyColTotals(
       org.json.JSONArray dataRows, List<Integer> field_indexes, String finalSummaryEventLevel) {
@@
-        } else if (value instanceof String) {
-          if (org.opensearch.sql.calcite.remote.CalciteAddColTotalsCommandIT.isNumeric(
-              (String) value)) {
+        } else if (value instanceof String) {
+          if (isNumeric((String) value)) {
             cColTotals[j] = cColTotals[j].add(new BigDecimal((String) (value)));
           }
         }

Also applies to: 72-100

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3c4a65a and 754f477.

📒 Files selected for processing (8)
  • docs/user/ppl/cmd/addcoltotals.rst (1 hunks)
  • docs/user/ppl/cmd/addtotals.rst (1 hunks)
  • integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java (1 hunks)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteAddColTotalsCommandIT.java (1 hunks)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteAddTotalsCommandIT.java (1 hunks)
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAddColTotalsTest.java (1 hunks)
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAddTotalsTest.java (1 hunks)
  • ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • docs/user/ppl/cmd/addcoltotals.rst
  • ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java
🧰 Additional context used
📓 Path-based instructions (7)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAddColTotalsTest.java
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAddTotalsTest.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteAddTotalsCommandIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteAddColTotalsCommandIT.java

⚙️ CodeRabbit configuration file

**/*.java: - Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)

  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure methods are under 20 lines with single responsibility
  • Verify proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAddColTotalsTest.java
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAddTotalsTest.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteAddTotalsCommandIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteAddColTotalsCommandIT.java
integ-test/**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

End-to-end scenarios need integration tests in integ-test/ module

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteAddTotalsCommandIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteAddColTotalsCommandIT.java

⚙️ CodeRabbit configuration file

integ-test/**/*IT.java: - Verify integration tests are in correct module (integ-test/)

  • Check tests can be run with ./gradlew :integ-test:integTest
  • Ensure proper test data setup and teardown
  • Validate end-to-end scenario coverage

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteAddTotalsCommandIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteAddColTotalsCommandIT.java
**/*IT.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

Name integration tests with *IT.java suffix in OpenSearch SQL

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteAddTotalsCommandIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteAddColTotalsCommandIT.java
**/test/**/*.java

⚙️ CodeRabbit configuration file

**/test/**/*.java: - Verify test coverage for new business logic

  • Check test naming follows conventions (*Test.java for unit, *IT.java for integration)
  • Ensure tests are independent and don't rely on execution order
  • Validate meaningful test data that reflects real-world scenarios
  • Check for proper cleanup of test resources

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAddColTotalsTest.java
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAddTotalsTest.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteAddTotalsCommandIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteAddColTotalsCommandIT.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java: - Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor

  • Verify SQL generation and optimization paths
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAddColTotalsTest.java
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAddTotalsTest.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteAddTotalsCommandIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteAddColTotalsCommandIT.java
**/*Test.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*Test.java: All new business logic requires unit tests
Name unit tests with *Test.java suffix in OpenSearch SQL

Files:

  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAddColTotalsTest.java
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAddTotalsTest.java
**/ppl/**/*.java

⚙️ CodeRabbit configuration file

**/ppl/**/*.java: - For PPL parser changes, verify grammar tests with positive/negative cases

  • Check AST generation for new syntax
  • Ensure corresponding AST builder classes are updated
  • Validate edge cases and boundary conditions

Files:

  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAddColTotalsTest.java
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAddTotalsTest.java
🧠 Learnings (6)
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAddColTotalsTest.java
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAddTotalsTest.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteAddTotalsCommandIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteAddColTotalsCommandIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteAddTotalsCommandIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteAddColTotalsCommandIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAddColTotalsTest.java
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAddTotalsTest.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteAddTotalsCommandIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteAddColTotalsCommandIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to integ-test/**/*IT.java : End-to-end scenarios need integration tests in `integ-test/` module

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteAddTotalsCommandIT.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteAddColTotalsCommandIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*Test.java : All new business logic requires unit tests

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteAddTotalsCommandIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : All public classes and methods must have proper JavaDoc

Applied to files:

  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteAddTotalsCommandIT.java
🧬 Code graph analysis (1)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteAddColTotalsCommandIT.java (1)
integ-test/src/test/java/org/opensearch/sql/ppl/PPLIntegTestCase.java (1)
  • PPLIntegTestCase (36-411)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: test-sql-cli-integration (21)
  • GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (1)
integ-test/src/test/java/org/opensearch/sql/calcite/CalciteNoPushdownIT.java (1)

22-25: Suite composition update looks correct

Including CalciteAddTotalsCommandIT and CalciteAddColTotalsCommandIT in the Calcite no‑pushdown suite is consistent with existing naming and ensures the new commands participate in the standard integration suite.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (4)
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAddColTotalsTest.java (4)

13-17: Public test class lacks JavaDoc

Project guidelines call for JavaDoc on public classes/methods, even in tests. Consider adding a brief class‑level JavaDoc describing that this exercises Calcite translation and Spark SQL generation for addcoltotals.


19-59: testAddColTotals and testAddColTotalsAllFields appear to assert the same behavior

Both tests use the same PPL (fields DEPTNO, SAL, JOB | addcoltotals), and their expected logical plans, results, and Spark SQL are effectively identical. This looks like redundant coverage.

Either:

  • Collapse them into a single test, or
  • Adjust one to cover a distinct scenario (e.g., different field lists or upstream pipe operations) so the name matches the behavior.

Also applies to: 104-143


232-263: Label truncation from 'GrandTotal' to 'GrandTota' is non‑obvious

These tests rely on the label 'GrandTotal' being stored in a VARCHAR(9) column (JOB), which truncates it to 'GrandTota'. The expectations are correct, but the mismatch between the label literal and the expected strings may confuse future readers.

Consider adding a short comment in one of these tests explaining that truncation is due to the underlying column length.

Also applies to: 275-306, 323-363
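
The effect can be reproduced outside Calcite with a one-line helper. This is a minimal sketch that assumes the label is simply cut to the column width; fitToWidth and LabelTruncationSketch are hypothetical names, and the actual truncation in the tests comes from the VARCHAR(9) type of JOB, not from code like this.

```java
/** Hypothetical illustration of fitting a label into a fixed-width character column. */
class LabelTruncationSketch {

  /** Returns the value unchanged if it fits, otherwise its first {@code width} characters. */
  static String fitToWidth(String value, int width) {
    return value.length() <= width ? value : value.substring(0, width);
  }

  public static void main(String[] args) {
    // 'GrandTotal' has 10 characters, one more than VARCHAR(9) allows.
    System.out.println(fitToWidth("GrandTotal", 9)); // GrandTota
    System.out.println(fitToWidth("Total", 9));      // Total
  }
}
```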


19-377: Consider adding scenarios with upstream filters/transformations (if not covered elsewhere)

All tests here start directly from source=EMP (with optional fields). To better guard Calcite integration, you might add a case where addcoltotals follows a where or other transforming command, to assert that column totals respect the transformed result set, not just the base table. Only needed if not already exercised in other ITs.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 754f477 and 2b956fc.

📒 Files selected for processing (1)
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAddColTotalsTest.java (1 hunks)
🔇 Additional comments (1)
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAddColTotalsTest.java (1)

186-230: Coverage for label/labelfield combinations looks solid

The tests around label and labelfield (new field vs existing JOB, argument order variation, and full‑row aggregation) thoroughly check logical plans, result sets, and Spark SQL, including type/width handling and nulls. This gives good confidence in the Calcite wiring for addcoltotals.

Also applies to: 232-273, 275-316, 319-377

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAddColTotalsTest.java (2)

232-273: Consider documenting the label truncation behavior.

The test correctly expects 'GrandTota' (lines 242 and 262) instead of the full 'GrandTotal' because the JOB field is VARCHAR(9) in the SCOTT schema. While this truncation is expected, adding a brief comment would improve readability and prevent confusion for future maintainers.

For example, add a comment before the test:

  // Tests label truncation when labelfield matches an existing field with shorter width.
  // JOB is VARCHAR(9), so 'GrandTotal' is truncated to 'GrandTota'.
  @Test
  public void testAddColTotalsMatchingLabelFieldWithExisting() throws IOException {
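
The truncation described above can be illustrated in isolation. The following is a minimal sketch, not code from this PR: `fitToWidth` is a hypothetical helper that only mimics how a 10-character label ends up as 9 characters when it must fit a VARCHAR(9) column.

```java
public class LabelTruncationSketch {
    // Hypothetical helper (not part of the PR): keeps at most `width` characters,
    // mimicking a value being fitted into a fixed-width VARCHAR(width) column.
    public static String fitToWidth(String value, int width) {
        return value.length() <= width ? value : value.substring(0, width);
    }

    public static void main(String[] args) {
        // "GrandTotal" has 10 characters; JOB is VARCHAR(9) in the SCOTT schema.
        System.out.println(fitToWidth("GrandTotal", 9)); // prints "GrandTota"
    }
}
```

Labels that already fit the column width pass through unchanged, so only labels longer than the existing field are affected.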

318-376: LGTM with minor style note.

This test correctly validates addcoltotals behavior when applied to all table fields with a labeled summary row. The logical plan properly aggregates all numeric columns.

Minor style note: Line 320 has extra spaces in the string concatenation (" labelfield='JOB' "), which is inconsistent but doesn't affect functionality.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2b956fc and 284593f.

📒 Files selected for processing (1)
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAddColTotalsTest.java (1 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*.java: Use PascalCase for class names (e.g., QueryExecutor)
Use camelCase for method and variable names (e.g., executeQuery)
Use UPPER_SNAKE_CASE for constants (e.g., MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
Prefer Optional<T> for nullable returns in Java
Avoid unnecessary object creation in loops
Use StringBuilder for string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code

Files:

  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAddColTotalsTest.java

⚙️ CodeRabbit configuration file

**/*.java: - Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)

  • Check for proper JavaDoc on public classes and methods
  • Flag redundant comments that restate obvious code
  • Ensure methods are under 20 lines with single responsibility
  • Verify proper error handling with specific exception types
  • Check for Optional usage instead of null returns
  • Validate proper use of try-with-resources for resource management

Files:

  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAddColTotalsTest.java
**/*Test.java

📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)

**/*Test.java: All new business logic requires unit tests
Name unit tests with *Test.java suffix in OpenSearch SQL

Files:

  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAddColTotalsTest.java
**/test/**/*.java

⚙️ CodeRabbit configuration file

**/test/**/*.java: - Verify test coverage for new business logic

  • Check test naming follows conventions (*Test.java for unit, *IT.java for integration)
  • Ensure tests are independent and don't rely on execution order
  • Validate meaningful test data that reflects real-world scenarios
  • Check for proper cleanup of test resources

Files:

  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAddColTotalsTest.java
**/ppl/**/*.java

⚙️ CodeRabbit configuration file

**/ppl/**/*.java: - For PPL parser changes, verify grammar tests with positive/negative cases

  • Check AST generation for new syntax
  • Ensure corresponding AST builder classes are updated
  • Validate edge cases and boundary conditions

Files:

  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAddColTotalsTest.java
**/calcite/**/*.java

⚙️ CodeRabbit configuration file

**/calcite/**/*.java: - Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor

  • Verify SQL generation and optimization paths
  • Document any Calcite-specific workarounds
  • Test compatibility with Calcite version constraints

Files:

  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAddColTotalsTest.java
🧠 Learnings (2)
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes

Applied to files:

  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAddColTotalsTest.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code

Applied to files:

  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAddColTotalsTest.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: test-sql-cli-integration (21)
  • GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (6)
ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLAddColTotalsTest.java (6)

13-17: LGTM!

The test class is properly structured, extending the appropriate base class and initializing with the standard SCOTT schema.


19-59: LGTM!

This test properly validates the default addcoltotals behavior, verifying that it generates a UNION with aggregated column totals and produces the expected Calcite logical plan and Spark SQL translation.


61-102: LGTM!

This test correctly validates field-specific aggregation, ensuring that only the specified field (SAL) is summed while other fields remain null in the summary row.


145-184: LGTM!

This test validates multi-field aggregation syntax, ensuring that specifying multiple fields (DEPTNO SAL) produces the expected aggregate behavior in the logical plan and result set.


186-230: LGTM!

This test properly validates the label and labelfield options, ensuring that a new field is created to hold the label value in the summary row while remaining null in data rows.
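
To make the behavior checked by these tests concrete, here is a small in-memory model of `addcoltotals`, assuming rows are plain maps. It is only a sketch: `addColTotals` and `row` are hypothetical names, the real command is planned through Calcite as a UNION with an aggregate, and the field names simply borrow from the SCOTT schema.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AddColTotalsSketch {
    // Hypothetical model: appends one summary row that holds the sum of each requested
    // field. Other columns stay null in the summary row, and the label column stays
    // null in the data rows. Assumes labelField and label are both non-null.
    public static List<Map<String, Object>> addColTotals(
            List<Map<String, Object>> rows, List<String> fields, String labelField, String label) {
        List<Map<String, Object>> out = new ArrayList<>();
        Map<String, Object> summary = new LinkedHashMap<>();
        for (Map<String, Object> row : rows) {
            Map<String, Object> copy = new LinkedHashMap<>(row);
            if (!copy.containsKey(labelField)) {
                copy.put(labelField, null); // new label column is null on data rows
            }
            out.add(copy);
            for (String column : copy.keySet()) {
                if (!summary.containsKey(column)) {
                    summary.put(column, null); // default: no total for this column
                }
            }
            for (String field : fields) {
                Object value = row.get(field);
                if (value instanceof Number) {
                    double soFar = summary.get(field) == null ? 0.0 : (Double) summary.get(field);
                    summary.put(field, soFar + ((Number) value).doubleValue());
                }
            }
        }
        summary.put(labelField, label);
        out.add(summary);
        return out;
    }

    // Illustrative row builder using SCOTT-style column names.
    public static Map<String, Object> row(String ename, int deptno, double sal) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("ENAME", ename);
        r.put("DEPTNO", deptno);
        r.put("SAL", sal);
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = List.of(row("SMITH", 20, 800.0), row("ALLEN", 30, 1600.0));
        for (Map<String, Object> r : addColTotals(rows, List.of("SAL"), "kind", "GrandTotal")) {
            System.out.println(r);
        }
    }
}
```

With only `SAL` requested, the summary row carries the `SAL` total and the label, while `ENAME` and `DEPTNO` remain null, which mirrors what the field-specific and label tests assert.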


275-316: LGTM!

This test validates that parameter order doesn't affect the addcoltotals behavior, ensuring parser flexibility. The label truncation behavior is consistent with the previous test.

LantaoJin
LantaoJin previously approved these changes Dec 9, 2025
Member

@LantaoJin LantaoJin left a comment

LGTM, CI failure is not related.

@LantaoJin
Member

cc @anasalkouz

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
docs/category.json (1)

13-14: Reorder entries to maintain alphabetical sorting.

The new entries break the alphabetical order of the ppl_cli_calcite category. Both addtotals and addcoltotals should be positioned before append.md, and addcoltotals should precede addtotals alphabetically.
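
A quick way to confirm an ordering claim like this is to scan the entries for the first one that sorts before its predecessor. The sketch below assumes plain `String.compareTo` ordering, which may not be the exact convention the repository uses; `firstOutOfOrder` is a hypothetical helper, and the entries are the ones quoted in this comment.

```java
import java.util.List;

public class CategoryOrderSketch {
    // Returns the index of the first entry that sorts before its predecessor,
    // or -1 if the list is already in ascending order.
    public static int firstOutOfOrder(List<String> entries) {
        for (int i = 1; i < entries.size(); i++) {
            if (entries.get(i - 1).compareTo(entries.get(i)) > 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> current = List.of(
            "user/ppl/cmd/ad.md",
            "user/ppl/cmd/append.md",
            "user/ppl/cmd/addtotals.md",
            "user/ppl/cmd/addcoltotals.md",
            "user/ppl/cmd/bin.md");
        System.out.println(firstOutOfOrder(current)); // prints 2 (addtotals.md follows append.md)
    }
}
```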

Apply this reordering to maintain consistency:

  "ppl_cli_calcite": [
    "user/ppl/cmd/ad.md",
+   "user/ppl/cmd/addcoltotals.md",
+   "user/ppl/cmd/addtotals.md",
    "user/ppl/cmd/append.md",
-   "user/ppl/cmd/addtotals.md",
-   "user/ppl/cmd/addcoltotals.md",
    "user/ppl/cmd/bin.md",
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4becf12 and 5780870.

📒 Files selected for processing (5)
  • docs/category.json (1 hunks)
  • docs/user/ppl/cmd/addcoltotals.md (1 hunks)
  • docs/user/ppl/cmd/addtotals.md (1 hunks)
  • docs/user/ppl/index.md (2 hunks)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • docs/user/ppl/cmd/addcoltotals.md
  • docs/user/ppl/cmd/addtotals.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: For PPL command PRs, refer to docs/dev/ppl-commands.md and verify the PR satisfies the checklist
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: For PPL command PRs, refer to docs/dev/ppl-commands.md and verify the PR satisfies the checklist

Applied to files:

  • docs/user/ppl/index.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: build-linux (25, integration)
  • GitHub Check: build-linux (21, unit)
  • GitHub Check: build-linux (25, unit)
  • GitHub Check: bwc-tests-full-restart (25)
  • GitHub Check: bwc-tests-full-restart (21)
  • GitHub Check: build-linux (25, doc)
  • GitHub Check: bwc-tests-rolling-upgrade (25)
  • GitHub Check: build-linux (21, doc)
  • GitHub Check: build-linux (21, integration)
  • GitHub Check: bwc-tests-rolling-upgrade (21)
  • GitHub Check: security-it-linux (21)
  • GitHub Check: security-it-linux (25)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: test-sql-cli-integration (21)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: CodeQL-Scan (java)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
🔇 Additional comments (1)
docs/user/ppl/index.md (1)

81-82: Verify status consistency: new commands marked "stable" while other 3.4 features are "experimental".

The newly added addtotals and addcoltotals commands are marked as stable (since 3.4), while other 3.4-era commands in the same table (replace, chart, streamstats) are marked as experimental (since 3.4). This creates an inconsistency.

Per the PR objectives, companion deliverables (Javadoc, API-spec, and public documentation PRs) remain incomplete, which typically signals experimental status. Please clarify whether:

  1. These commands should be marked experimental (since 3.4) for consistency, or
  2. The "stable" designation is intentional based on completed test and documentation coverage, and other 3.4 features should eventually transition to stable.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
docs/user/ppl/index.md (1)

81-82: Clarify command descriptions for consistency and precision.

The descriptions for both new commands are somewhat vague. Compare them with the PR objectives: addtotals computes per-event (row) totals and, when col=true, per-column totals; addcoltotals computes per-column totals and appends them as a summary row. Consider more precise wording that better reflects the actual operations performed.

For example:

  • addtotals: "Compute and append row totals and optionally column totals."
  • addcoltotals: "Compute column totals and append them as a summary row."
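
For contrast with column totals, the row-total side of `addtotals` can be sketched as follows. This is an illustrative model only, assuming a row is a map and every numeric value is summed; `withRowTotal` is a hypothetical name, and `Total` is used because the PR description gives it as the default field name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AddTotalsRowSketch {
    // Hypothetical model of addtotals with row=true: returns a copy of the row with an
    // extra field holding the sum of all numeric values in that row.
    public static Map<String, Object> withRowTotal(Map<String, Object> row, String fieldName) {
        double total = 0.0;
        for (Object value : row.values()) {
            if (value instanceof Number) {
                total += ((Number) value).doubleValue();
            }
        }
        Map<String, Object> copy = new LinkedHashMap<>(row);
        copy.put(fieldName, total);
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("ENAME", "SMITH"); // non-numeric, ignored by the sum
        row.put("SAL", 800.0);
        row.put("COMM", 50.0);
        System.out.println(withRowTotal(row, "Total"));
    }
}
```

The key difference from `addcoltotals` is the shape of the result: a row total adds a field to every event, whereas a column total adds one extra event at the end.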
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5780870 and 328dccc.

📒 Files selected for processing (1)
  • docs/user/ppl/index.md (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: For PPL command PRs, refer to docs/dev/ppl-commands.md and verify the PR satisfies the checklist
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: For PPL command PRs, refer to docs/dev/ppl-commands.md and verify the PR satisfies the checklist

Applied to files:

  • docs/user/ppl/index.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: security-it-windows-macos (macos-14, 21)
  • GitHub Check: security-it-windows-macos (macos-14, 25)
  • GitHub Check: build-windows-macos (macos-14, 25, doc)
  • GitHub Check: security-it-windows-macos (windows-latest, 25)
  • GitHub Check: CodeQL-Scan (java)
  • GitHub Check: security-it-windows-macos (windows-latest, 21)
  • GitHub Check: build-windows-macos (macos-14, 21, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, integration)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
  • GitHub Check: build-windows-macos (macos-14, 25, unit)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (macos-14, 21, doc)
  • GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
  • GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
  • GitHub Check: Update draft release notes
  • GitHub Check: test-sql-cli-integration (21)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

🟡 Minor comments (34)
docs/user/ppl/cmd/lookup.md-20-28 (1)

20-28: Add language identifier to fenced code block.

The code block is missing a language specification. Per the markdown linting rule (MD040), all fenced code blocks must specify a language identifier for syntax highlighting and parsing.

Apply this diff to add the language identifier:

-```
+```bash
 source = table1 | lookup table2 id
 source = table1 | lookup table2 id, name
 source = table1 | lookup table2 id as cid, name
 source = table1 | lookup table2 id as cid, name replace dept as department
 source = table1 | lookup table2 id as cid, name replace dept as department, city as location
 source = table1 | lookup table2 id as cid, name append dept as department
 source = table1 | lookup table2 id as cid, name append dept as department, city as location
-```
+```
docs/user/ppl/cmd/lookup.md-34-42 (1)

34-42: Use standard markdown syntax for bash code fences.

Code fences use `` ```bash ignore ``, which is non-standard Markdown. The `ignore` keyword is not recognized by standard Markdown parsers or CommonMark. Either use just `` ```bash `` or handle the "ignore" directive via project-specific tooling or comment syntax.

Apply this diff to use standard markdown syntax:

-```bash ignore
+```bash
 curl -H 'Content-Type: application/json' -X POST localhost:9200/_plugins/_ppl -d '{
-```
+```

Also applies to: 133-141, 147-155, 246-254

docs/user/ppl/cmd/join.md-63-63 (1)

63-63: Add language specification to fenced code blocks (MD040).

Lines 63 and 82 contain code blocks without a language specification. Based on the established pattern (e.g., line 97 uses `` ```ppl ``), these PPL syntax examples should specify `ppl` as the language.

Apply this diff to add language specifications:

-```
+```ppl
 source = table1 | inner join left = l right = r on l.a = r.a table2 | fields l.a, r.a, b, c

And similarly for line 82:

-```
+```ppl
 source = table1 | join type=outer left = l right = r on l.a = r.a table2 | fields l.a, r.a, b, c

Also applies to: 82-82

docs/user/ppl/cmd/explain.md-87-87 (1)

87-87: Add language specifier to code block.

The fenced code block is missing a language identifier. Based on the JSON output shown, specify json as the language for proper syntax highlighting.

-```
+```json
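
Since several comments in this review flag the same MD040 issue, a small scan can locate every offending fence at once. The following is a simplified sketch, not a replacement for a real Markdown linter: it assumes fences are exactly three backticks (no tildes, no longer fences) and `fencesWithoutLanguage` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

public class FenceLanguageSketch {
    // Three backticks, built with repeat() so this example does not embed a literal fence.
    public static final String FENCE = "`".repeat(3);

    // Returns the 1-based line numbers of opening fences that carry no language identifier.
    public static List<Integer> fencesWithoutLanguage(List<String> lines) {
        List<Integer> offenders = new ArrayList<>();
        boolean insideFence = false;
        for (int i = 0; i < lines.size(); i++) {
            String trimmed = lines.get(i).trim();
            if (!trimmed.startsWith(FENCE)) {
                continue;
            }
            // Only opening fences need an info string; closing fences are always bare.
            if (!insideFence && trimmed.substring(FENCE.length()).trim().isEmpty()) {
                offenders.add(i + 1);
            }
            insideFence = !insideFence;
        }
        return offenders;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            FENCE, "source = table1 | lookup table2 id", FENCE,
            FENCE + "json", "{}", FENCE);
        System.out.println(fencesWithoutLanguage(lines)); // prints [1]
    }
}
```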
docs/user/ppl/cmd/explain.md-26-26 (1)

26-26: Standardize section heading format.

Lines 58, 85, 109, and 134 use "Explain" while line 26 uses "Explain:" (with a colon). Standardize across all examples for consistency.

-Explain
+Explain:

Also applies to: 58-58, 85-85, 109-109, 134-134

docs/user/ppl/cmd/top.md-102-123 (1)

102-123: Fix duplicate example numbering.

The section starting at line 123 is labeled "Example 5" but should be "Example 6", as "Example 5" already appears at line 102.

-## Example 5: Specify the usenull field option  
+## Example 6: Specify the usenull field option  
docs/user/ppl/cmd/patterns.md-32-43 (1)

32-43: Specify a language identifier for the fenced code block.

The code block containing the cluster settings command should declare a language identifier for proper syntax highlighting.

Apply this diff to add the language identifier:

-```
+```json
  PUT _cluster/settings
  {
    "persistent": {
      "plugins.ppl.pattern.method": "brain",

[MD040]

docs/user/ppl/cmd/patterns.md-95-95 (1)

95-95: Use hyphens for compound adjectives.

The phrase "user defined patterns" should use a hyphen to form a compound adjective.

Apply this diff:

-This example shows how to extract patterns from a raw log field using user defined patterns.
+This example shows how to extract patterns from a raw log field using user-defined patterns.
docs/user/ppl/cmd/patterns.md-25-26 (1)

25-26: Use hyphens for compound adjectives.

The phrase "low frequency words" should use a hyphen to form a compound adjective modifying the noun.

Apply this diff:

-This sets the lower bound of frequency to ignore low frequency words. **Default:** 0.3.
+This sets the lower bound of frequency to ignore low-frequency words. **Default:** 0.3.
docs/user/ppl/admin/connectors/security_lake_connector.md-17-17 (1)

17-17: Fix grammar: "in future" → "in the future".

Line 17 uses non-standard phrasing. Apply this change:

-We currently only support emr-serverless as spark execution engine and Glue as metadata store. we will add more support in future.
+We currently only support emr-serverless as spark execution engine and Glue as metadata store. We will add more support in the future.
docs/user/ppl/cmd/describe.md-8-8 (1)

8-8: Fix markdown reference link issue in syntax specification.

Line 8 triggers a markdown linting error because [schema.] is parsed as a reference link start, but no reference definition exists. The intent appears to be inline syntax documentation.

Apply this diff to use inline code formatting for the syntax specification:

-describe [dataSource.][schema.]\<tablename\>
+describe `[dataSource.][schema.]<tablename>`

Alternatively, if you prefer to keep the syntax unformatted:

-describe [dataSource.][schema.]\<tablename\>
+describe \[dataSource.\]\[schema.\]<tablename>

The first approach (inline code) is preferred as it clearly distinguishes the syntax specification from regular text.

docs/user/ppl/admin/connectors/s3glue_connector.md-16-16 (1)

16-16: Correct British English to American English.

"in future" should be "in the future" for consistency with standard American English documentation style.

Apply this diff:

-We currently only support emr-serverless as spark execution engine and Glue as metadata store. we will add more support in future.
+We currently only support emr-serverless as spark execution engine and Glue as metadata store. We will add more support in the future.
docs/user/ppl/admin/connectors/s3glue_connector.md-6-6 (1)

6-6: Fix article agreement in sentence.

Line 6 reads awkwardly: "how to query and s3Glue datasource" should use "an" instead of "and".

Apply this diff:

-This page covers s3Glue datasource configuration and also how to query and s3Glue datasource.
+This page covers s3Glue datasource configuration and also how to query an s3Glue datasource.
docs/user/ppl/admin/connectors/s3glue_connector.md-77-77 (1)

77-77: Wrap bare URL in Markdown link syntax.

Line 77 contains a bare URL which violates Markdown best practices. Wrap it in proper link syntax or reference format.

Apply this diff:

-These queries would work only top of async queries. Documentation: [Async Query APIs](../../../interfaces/asyncqueryinterface.rst)
+These queries would work only top of async queries. Documentation: [Async Query APIs](../../../interfaces/asyncqueryinterface.rst) and [OpenSearch Spark Docs](https://github.com/opensearch-project/opensearch-spark/blob/main/docs/index.md).

Committable suggestion skipped: line range outside the PR's diff.

docs/user/ppl/admin/connectors/s3glue_connector.md-19-30 (1)

19-30: Fix unordered list indentation to comply with Markdown standards.

Nested list items have inconsistent indentation. All sublists should use 2-space indentation relative to their parent, not 4 or 8 spaces.

Apply this diff to fix the list indentation:

- * `glue.auth.type` [Required]  
-     * This parameters provides the authentication type information required for execution engine to connect to glue.  
-     * S3 Glue connector currently only supports `iam_role` authentication and the below parameters is required.  
-         * `glue.auth.role_arn`  
- * `glue.indexstore.opensearch.*` [Required]  
-     * This parameters provides the Opensearch domain host information for glue connector. This opensearch instance is used for writing index data back and also  
-     * `glue.indexstore.opensearch.uri` [Required]  
-     * `glue.indexstore.opensearch.auth` [Required]  
-         * Accepted values include ["noauth", "basicauth", "awssigv4"]  
-         * Basic Auth required `glue.indexstore.opensearch.auth.username` and `glue.indexstore.opensearch.auth.password`  
-         * AWSSigV4 Auth requires `glue.indexstore.opensearch.auth.region`  and `glue.auth.role_arn`  
-     * `glue.indexstore.opensearch.region` [Required for awssigv4 auth]
+ * `glue.auth.type` [Required]
+   * This parameters provides the authentication type information required for execution engine to connect to glue.
+   * S3 Glue connector currently only supports `iam_role` authentication and the below parameters is required.
+     * `glue.auth.role_arn`
+ * `glue.indexstore.opensearch.*` [Required]
+   * This parameters provides the Opensearch domain host information for glue connector. This opensearch instance is used for writing index data back and also
+   * `glue.indexstore.opensearch.uri` [Required]
+   * `glue.indexstore.opensearch.auth` [Required]
+     * Accepted values include ["noauth", "basicauth", "awssigv4"]
+     * Basic Auth required `glue.indexstore.opensearch.auth.username` and `glue.indexstore.opensearch.auth.password`
+     * AWSSigV4 Auth requires `glue.indexstore.opensearch.auth.region` and `glue.auth.role_arn`
+   * `glue.indexstore.opensearch.region` [Required for awssigv4 auth]
docs/user/ppl/cmd/regex.md-98-98 (1)

98-98: Fix escaping in Example 4 regex pattern.

The regex pattern in the Example 4 code block uses double backslashes (\\d{3,4}\\s+), which differs from Example 3 (line 78: @pyrami\.com$). Within Markdown code blocks, backslashes should appear as single characters to represent the actual regex pattern. This inconsistency may confuse users about the correct syntax.

Apply this diff to align the escaping with other examples:

-source=accounts | regex address="\\d{3,4}\\s+[A-Z][a-z]+\\s+(Street|Lane|Court)" | fields account_number, address
+source=accounts | regex address="\d{3,4}\s+[A-Z][a-z]+\s+(Street|Lane|Court)" | fields account_number, address
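
One common source of doubled backslashes is the host language's string escaping rather than the regex itself. The sketch below shows this in Java, where the string literal needs `\\d` to produce the pattern `\d`. It is only an illustration: whether PPL string literals apply a similar escaping layer is not established in this thread, and the sample address is made up.

```java
import java.util.regex.Pattern;

public class RegexEscapingSketch {
    // The regex engine sees single backslashes:
    //   \d{3,4}\s+[A-Z][a-z]+\s+(Street|Lane|Court)
    // The Java string literal doubles each one to produce that same pattern.
    public static final Pattern ADDRESS =
        Pattern.compile("\\d{3,4}\\s+[A-Z][a-z]+\\s+(Street|Lane|Court)");

    public static void main(String[] args) {
        System.out.println(ADDRESS.pattern()); // the pattern text, with single backslashes
        System.out.println(ADDRESS.matcher("880 Holmes Lane").find()); // prints true
    }
}
```

If the documented query is meant to show the pattern as the engine receives it, the single-backslash form is the one to display; if it shows a literal that passes through an escaping layer first, the doubled form may be intentional.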
docs/user/ppl/cmd/head.md-8-8 (1)

8-8: Fix Markdown syntax escaping in the syntax definition.

Line 8 uses backslash escapes (\<size\> and \<offset\>) that appear to be remnants from reStructuredText but will render as literal backslashes in standard Markdown. Use backticks or angle-bracket HTML entities instead.

-head [\<size\>] [from \<offset\>]
+head [`size`] [from `offset`]

Alternatively, if you prefer angle brackets:

-head [\<size\>] [from \<offset\>]
+head [&lt;size&gt;] [from &lt;offset&gt;]
docs/user/ppl/cmd/search.md-666-666 (1)

666-666: Use markdown headings instead of bold emphasis.

Lines 666 and 705 use bold emphasis (**text**) to style section headers, but they should use markdown heading syntax (##) for proper document structure and semantic meaning.

-**Backslash in file paths**
+## Backslash in file paths
-**Text with special characters**
+## Text with special characters

Also applies to: 705-705

docs/user/ppl/cmd/search.md-24-24 (1)

24-24: Fix grammar: use hyphens for compound adjectives.

Lines 24 and 89 have compound adjective formatting issues:

  • Line 24: "other PPL commands" → "other-PPL commands" (or restructure to "unlike other PPL commands")
  • Line 89: "multi field" → "multi-field"
-**Full Text Search**: Unlike other PPL commands, search supports both quoted and unquoted strings.
+**Full Text Search**: Unlike other-PPL commands, search supports both quoted and unquoted strings.
-* Limitations: No wildcards for partial IP matching. For wildcard search use multi field with keyword:
+* Limitations: No wildcards for partial IP matching. For wildcard search use multi-field with keyword:

Also applies to: 89-89

docs/user/ppl/cmd/search.md-92-93 (1)

92-93: Fix list indentation formatting.

Lines 92-93 have incorrect indentation for unordered list items. Unordered list items should start at column 0, not be indented by 3 spaces.

 **Field Type Performance Tips**:
-   * Each field type has specific search capabilities and limitations. Using the wrong field type during ingestion impacts performance and accuracy  
-   * For wildcard searches on non-keyword fields: Add a keyword field copy for better performance. Example: If you need wildcards on a text field, create `message.keyword` alongside `message`  
+* Each field type has specific search capabilities and limitations. Using the wrong field type during ingestion impacts performance and accuracy  
+* For wildcard searches on non-keyword fields: Add a keyword field copy for better performance. Example: If you need wildcards on a text field, create `message.keyword` alongside `message`  
docs/user/ppl/cmd/subquery.md-92-106 (1)

92-106: Add language specification to fenced code block.

Code blocks should declare a language for proper syntax highlighting and linting compliance.

-```
+```ppl
 // Assumptions: `a`, `b` are fields of table outer, `c`, `d` are fields of table inner,  `e`, `f` are fields of table nested
 source = outer | where exists [ source = inner | where a = c ]
 source = outer | where not exists [ source = inner | where a = c ]
 source = outer | where exists [ source = inner | where a = c and b = d ]
 source = outer | where not exists [ source = inner | where a = c and b = d ]
 source = outer exists [ source = inner | where a = c ] // search filtering with subquery
 source = outer not exists [ source = inner | where a = c ] //search filtering with subquery
 source = table as t1 exists [ source = table as t2 | where t1.a = t2.a ] //table alias is useful in exists subquery
 source = outer | where exists [ source = inner1 | where a = c and exists [ source = nested | where c = e ] ] //nested
 source = outer | where exists [ source = inner1 | where a = c | where exists [ source = nested | where c = e ] ] //nested
 source = outer | where exists [ source = inner | where c > 10 ] //uncorrelated exists
 source = outer | where not exists [ source = inner | where c > 10 ] //uncorrelated exists
 source = outer | where exists [ source = inner ] | eval l = "nonEmpty" | fields l //special uncorrelated exists
-```
+```
docs/user/ppl/cmd/subquery.md-110-135 (1)

110-135: Add language specification to fenced code block.

Code blocks should declare a language for proper syntax highlighting and linting compliance.

-```
+```ppl
 //Uncorrelated scalar subquery in Select
 source = outer | eval m = [ source = inner | stats max(c) ] | fields m, a
 source = outer | eval m = [ source = inner | stats max(c) ] + b | fields m, a
 //Uncorrelated scalar subquery in Where**
 source = outer | where a > [ source = inner | stats min(c) ] | fields a
 //Uncorrelated scalar subquery in Search filter
 source = outer a > [ source = inner | stats min(c) ] | fields a
 //Correlated scalar subquery in Select
 source = outer | eval m = [ source = inner | where outer.b = inner.d | stats max(c) ] | fields m, a
 source = outer | eval m = [ source = inner | where b = d | stats max(c) ] | fields m, a
 source = outer | eval m = [ source = inner | where outer.b > inner.d | stats max(c) ] | fields m, a
 //Correlated scalar subquery in Where
 source = outer | where a = [ source = inner | where outer.b = inner.d | stats max(c) ]
 source = outer | where a = [ source = inner | where b = d | stats max(c) ]
 source = outer | where [ source = inner | where outer.b = inner.d OR inner.d = 1 | stats count() ] > 0 | fields a
 //Correlated scalar subquery in Search filter
 source = outer a = [ source = inner | where b = d | stats max(c) ]
 source = outer [ source = inner | where outer.b = inner.d OR inner.d = 1 | stats count() ] > 0 | fields a
 //Nested scalar subquery
 source = outer | where a = [ source = inner | stats max(c) | sort c ] OR b = [ source = inner | where c = 1 | stats min(d) | sort d ]
 source = outer | where a = [ source = inner | where c =  [ source = nested | stats max(e) by f | sort f ] | stats max(d) by c | sort c | head 1 ]
-RelationSubquery
+
+**RelationSubquery**
+
-```
+```ppl
 source = table1 | join left = l right = r on condition [ source = table2 | where d > 10 | head 5 ] //subquery in join right side
 source = [ source = table1 | join left = l right = r [ source = table2 | where d > 10 | head 5 ] | stats count(a) by b ] as outer | head 1
-```
+```
docs/user/ppl/cmd/subquery.md-77-88 (1)

77-88: Add language specification to fenced code block.

Code blocks should declare a language for proper syntax highlighting and linting compliance.

-```
+```ppl
 source = outer | where a in [ source = inner | fields b ]
 source = outer | where (a) in [ source = inner | fields b ]
 source = outer | where (a,b,c) in [ source = inner | fields d,e,f ]
 source = outer | where a not in [ source = inner | fields b ]
 source = outer | where (a) not in [ source = inner | fields b ]
 source = outer | where (a,b,c) not in [ source = inner | fields d,e,f ]
 source = outer a in [ source = inner | fields b ] // search filtering with subquery
 source = outer a not in [ source = inner | fields b ] // search filtering with subquery)
 source = outer | where a in [ source = inner1 | where b not in [ source = inner2 | fields c ] | fields b ] // nested
 source = table1 | inner join left = l right = r on l.a = r.a AND r.a in [ source = inner | fields d ] | fields l.a, r.a, b, c //as join filter
-```
+```
docs/user/ppl/cmd/subquery.md-196-196 (1)

196-196: Replace hard tab with spaces.

Line 196 contains hard tab characters. Use spaces for consistent indentation across the file.

- }
+ }

Committable suggestion skipped: line range outside the PR's diff.

docs/user/ppl/cmd/sort.md-14-17 (1)

14-17: Fix malformed block quote formatting.

Lines 16–17 contain bare > markers with no content, which creates malformed block quotes. Remove them.

Apply this diff:

 > **Note:**
 > You cannot mix +/- and asc/desc in the same sort command. Choose one approach for all fields in a single sort command.
->
->
docs/user/ppl/cmd/sort.md-93-93 (1)

93-93: Fix grammar: "document" → "documents".

Apply this diff:

-This example shows sorting all the document by the age field in descending order using the desc keyword.
+This example shows sorting all the documents by the age field in descending order using the desc keyword.
docs/user/ppl/cmd/sort.md-8-8 (1)

8-8: Fix unnecessary escape sequences in the syntax line.

The syntax line uses \| and \- which render as literal backslashes. In Markdown, pipes | and hyphens - do not need escaping in this context.

Apply this diff:

-sort [count] <[+\|-] sort-field \| sort-field [asc\|a\|desc\|d]>...
+sort [count] <[+|-] sort-field | sort-field [asc|a|desc|d]>...
docs/user/ppl/cmd/parse.md-88-129 (1)

88-129: Add language identifiers to fenced code blocks in Limitations section.

The code examples at lines 95, 103, 111, 119, 127 are missing language specifications. Per the Markdown guidelines in docs/dev/testing-doctest.md, all fenced code blocks should specify a language. These should be marked as ppl (for PPL queries) or bash as appropriate.

Apply this diff to add language specifications:

- ```
+ ```ppl
  source=accounts | parse address '\d+ (?<street>.+)' | parse street '\w+ (?<road>\w+)' ;
- ```
+ ```

Repeat for the other limitation examples (lines 103, 111, 119, 127).

docs/dev/testing-doctest.md-61-111 (1)

61-111: Markdown documentation guidelines are comprehensive and well-structured. The new section provides clear, actionable guidance for writing PPL documentation in Markdown format with examples of the paired input/output pattern and testing configuration. However, apply a minor capitalization fix.

Line 61 should capitalize "Markdown" as a proper noun:

-#### RST Format (SQL docs only. On Deprecation path. Use markdown for PPL)
+#### RST Format (SQL docs only. On Deprecation path. Use Markdown for PPL)
docs/user/ppl/cmd/eventstats.md-28-29 (1)

28-29: Fix list indentation for bucket_nullable defaults.

Lines 28-29 have inconsistent indentation (1 space) relative to the parent list item. They should have 0 spaces to align with sibling list items.

Apply this diff to fix the indentation:

 * bucket_nullable: optional. Controls whether the eventstats command consider null buckets as a valid group in group-by aggregations. When set to `false`, it will not treat null group-by values as a distinct group during aggregation. **Default:** Determined by `plugins.ppl.syntax.legacy.preferred`.  
- * When `plugins.ppl.syntax.legacy.preferred=true`, `bucket_nullable` defaults to `true`  
- * When `plugins.ppl.syntax.legacy.preferred=false`, `bucket_nullable` defaults to `false`  
+ * When `plugins.ppl.syntax.legacy.preferred=true`, `bucket_nullable` defaults to `true`
+ * When `plugins.ppl.syntax.legacy.preferred=false`, `bucket_nullable` defaults to `false`
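
The default rule quoted above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the plugin's code: `resolve_bucket_nullable` and its parameters are hypothetical names, and the assumption is that an explicitly supplied `bucket_nullable` always wins over the setting-derived default.

```python
from typing import Optional


def resolve_bucket_nullable(explicit: Optional[bool], legacy_preferred: bool) -> bool:
    """Return the effective bucket_nullable value (hypothetical helper).

    explicit:         value given in the query, or None when omitted.
    legacy_preferred: value of plugins.ppl.syntax.legacy.preferred.
    """
    if explicit is not None:
        # Assumption: a value written in the query overrides the default.
        return explicit
    # Documented default: mirrors plugins.ppl.syntax.legacy.preferred.
    return legacy_preferred


# Omitted in the query, legacy syntax preferred: null buckets form a group.
print(resolve_bucket_nullable(None, legacy_preferred=True))
# Omitted in the query, legacy syntax not preferred: null buckets are skipped.
print(resolve_bucket_nullable(None, legacy_preferred=False))
```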
docs/user/ppl/cmd/streamstats.md-77-90 (1)

77-90: Specify language for the code block showing command syntax.

The fenced code block at line 77 lacks a language identifier. Use ```ppl to declare it as PPL syntax (or another appropriate language if these are examples in a different syntax).

Apply this diff:

-```
+```ppl
 source = table | streamstats avg(a)
 source = table | streamstats current = false avg(a)
 source = table | streamstats window = 5 sum(b)
 source = table | streamstats current = false window = 2 max(a)
 source = table | streamstats count(c)
 source = table | streamstats min(c), max(c) by b
 source = table | streamstats count(c) as count_by by b | where count_by > 1000
 source = table | streamstats dc(field) as distinct_count
 source = table | streamstats distinct_count(category) by region
 source = table | streamstats current=false window=2 global=false avg(a) by b
 source = table | streamstats window=2 reset_before=a>31 avg(b)
 source = table | streamstats current=false reset_after=a>31 avg(b) by c

docs/user/ppl/cmd/streamstats.md-149-162 (1)

149-162: Convert indented code block to fenced code block in Example 3.

The "original data" table at line 149 uses indented code block syntax. Use fenced code blocks (triple backticks) for consistency with the rest of the document and to satisfy Markdown linting standards.

Apply this diff to convert the indented block to fenced:

 * global=true: a global window is applied across all rows, but the calculations inside the window still respect the by groups.  
 * global=false: the window itself is created per group, meaning each group gets its own independent window.  
   
 This example shows how to calculate the running average of age across accounts by country, using global argument.
-original data
-    +-------+---------+------------+-------+------+-----+
-    | name  | country | state      | month | year | age |
-  
-    |-------+---------+------------+-------+------+-----+
-    | Jake  | USA     | California | 4     | 2023 | 70  |
-    | Hello | USA     | New York   | 4     | 2023 | 30  |
-    | John  | Canada  | Ontario    | 4     | 2023 | 25  |
-    | Jane  | Canada  | Quebec     | 4     | 2023 | 20  |
-    | Jim   | Canada  | B.C        | 4     | 2023 | 27  |
-    | Peter | Canada  | B.C        | 4     | 2023 | 57  |
-    | Rick  | Canada  | B.C        | 4     | 2023 | 70  |
-    | David | USA     | Washington | 4     | 2023 | 40  |
-  
-    +-------+---------+------------+-------+------+-----+
+original data
+```
+| name  | country | state      | month | year | age |
+|-------|---------|------------|-------|------|-----|
+| Jake  | USA     | California | 4     | 2023 | 70  |
+| Hello | USA     | New York   | 4     | 2023 | 30  |
+| John  | Canada  | Ontario    | 4     | 2023 | 25  |
+| Jane  | Canada  | Quebec     | 4     | 2023 | 20  |
+| Jim   | Canada  | B.C        | 4     | 2023 | 27  |
+| Peter | Canada  | B.C        | 4     | 2023 | 57  |
+| Rick  | Canada  | B.C        | 4     | 2023 | 70  |
+| David | USA     | Washington | 4     | 2023 | 40  |
+```
docs/user/ppl/cmd/streamstats.md-36-37 (1)

36-37: Fix nested list indentation in the bucket_nullable description.

Lines 36–37 use inconsistent indentation for nested bullets. They should align with the parent list.

Apply this diff to fix the indentation:

 * bucket_nullable: optional. Controls whether the streamstats command consider null buckets as a valid group in group-by aggregations. When set to `false`, it will not treat null group-by values as a distinct group during aggregation. **Default:** Determined by `plugins.ppl.syntax.legacy.preferred`.
- * When `plugins.ppl.syntax.legacy.preferred=true`, `bucket_nullable` defaults to `true`  
- * When `plugins.ppl.syntax.legacy.preferred=false`, `bucket_nullable` defaults to `false`  
+ - When `plugins.ppl.syntax.legacy.preferred=true`, `bucket_nullable` defaults to `true`  
+ - When `plugins.ppl.syntax.legacy.preferred=false`, `bucket_nullable` defaults to `false`  
.github/workflows/sql-cli-integration-test.yml-12-12 (1)

12-12: Refine path filter patterns to prevent over-triggering.

The path filter patterns are too broad and will trigger unnecessary workflow runs:

  • Lines 12, 27: **gradle* is ambiguous and will match any file containing "gradle" anywhere in the path, including potentially unrelated nested build artifacts.
  • Lines 15-16, 30-31: **/*.jar and **/*.pom will trigger on any JAR or POM file in the entire repository, including transitive dependencies and build artifacts unrelated to the SQL command changes.

These overly broad patterns may cause the workflow to run excessively on unrelated changes, wasting CI resources.

Consider scoping the patterns more precisely. For example:

      - '**/*.java'
      - '**/*.g4'
      - '!sql-jdbc/**'
-     - '**gradle*'
+     - 'gradle/**'
+     - 'gradle.properties'
+     - 'build.gradle'
      - '**lombok*'
      - 'integ-test/**'
-     - '**/*.jar'
-     - '**/*.pom'
+     - 'gradle/wrapper/**'

Alternatively, if broad dependency monitoring is intended, document that expectation in the workflow comments.

Also applies to: 15-16, 27-27, 30-31
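
To check how wide or narrow a given filter is before committing it, a rough Python approximation of the glob rules can help. This is a sketch under stated assumptions, not GitHub's matcher: it assumes `**/` matches zero or more whole directories, `**` matches any characters including `/`, and `*` matches any characters except `/`. The helper names `filter_to_regex` and `matches` are hypothetical, and negation (`!`), `?`, and character classes are not handled.

```python
import re


def filter_to_regex(pattern: str) -> re.Pattern:
    """Translate a simplified path-filter glob into a compiled regex.

    Assumed rules (an approximation only):
      '**/' -> zero or more whole directories
      '**'  -> any characters, including '/'
      '*'   -> any characters except '/'
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            # Every other character is matched literally.
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def matches(pattern: str, path: str) -> bool:
    """True when the whole path matches the simplified glob."""
    return filter_to_regex(pattern).fullmatch(path) is not None


# Compare the broad pattern with the scoped alternatives on sample paths.
for path in ("build.gradle", "core/build.gradle", "gradle/wrapper/gradle-wrapper.jar"):
    hits = [p for p in ("**gradle*", "gradle/**", "build.gradle") if matches(p, path)]
    print(path, "->", hits)
```

Under these assumptions, a bare `build.gradle` pattern matches only the root build file, so module-level build files such as `core/build.gradle` would no longer trigger the workflow. That trade-off is worth confirming against the actual filter semantics before narrowing the patterns.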

🧹 Nitpick comments (29)
docs/user/ppl/admin/monitoring.md (1)

5-5: Fix grammar and style on line 5.

Three issues detected:

  1. "able to collect" → "can collect" (cleaner phrasing)
  2. "node level" → "node-level" (compound modifier requires hyphenation)
  3. "Cluster level" → "Cluster-level" (same reason)

Apply this diff:

-By a stats endpoint, you are able to collect metrics for the plugin within the interval. Note that only node level statistics collecting is implemented for now. In other words, you only get the metrics for the node you're accessing. Cluster level statistics have yet to be implemented.
+By a stats endpoint, you can collect metrics for the plugin within the interval. Note that only node-level statistics collecting is implemented for now. In other words, you only get the metrics for the node you're accessing. Cluster-level statistics have yet to be implemented.
docs/user/ppl/cmd/replace.md (2)

9-11: Reduce repetitive phrasing in parameter descriptions.

The word "mandatory" is repeated three times across consecutive parameter definitions. Consider varying the phrasing for better readability.

- * pattern: mandatory. The text pattern you want to replace.  
- * replacement: mandatory. The text you want to replace with.  
- * field-name: mandatory. One or more field names where the replacement should occur.  
+ * pattern: The text pattern you want to replace (required).  
+ * replacement: The replacement text (required).  
+ * field-name: One or more field names where the replacement should occur (required).  

111-111: Use hyphen for compound adjective modifying noun.

"Pattern matching" should be hyphenated when it functions as a compound adjective preceding a noun.

- Since replace command only supports plain string literals, you can use LIKE command with replace for pattern matching needs.
+ Since replace command only supports plain string literals, you can use LIKE command with replace for pattern-matching needs.
docs/user/ppl/cmd/rename.md (1)

9-10: Simplify the parameter descriptions to reduce repetition.

Line ~9 introduces "field you want to rename" and line ~10 repeats "name you want to rename to," creating redundant phrasing. Simplify for clarity.

-* source-field: mandatory. The name of the field you want to rename. Supports wildcard patterns using `*`.  
-* target-field: mandatory. The name you want to rename to. Must have same number of wildcards as the source.
+* source-field: mandatory. The field to rename. Supports wildcard patterns using `*`.  
+* target-field: mandatory. The new field name. Must have the same number of wildcards as the source.

This version removes the repetitive "you want to" phrasing and improves conciseness.

docs/user/ppl/admin/connectors/prometheus_connector.md (5)

10-18: Fix unordered list indentation to match Markdown style guide.

Bullets and nested items use 4-space/8-space indentation, but the project's Markdown style expects 2-space/4-space. This will be flagged by markdownlint.

Apply this diff to fix indentation:

-* `prometheus.uri` [Required].  
-    * This parameters provides the URI information to connect to a prometheus instance.  
-* `prometheus.auth.type` [Optional]  
-    * This parameters provides the authentication type information.  
-    * Prometheus connector currently supports `basicauth` and `awssigv4` authentication mechanisms.  
-    * If prometheus.auth.type is basicauth, following are required parameters.  
-        * `prometheus.auth.username` and `prometheus.auth.password`.  
-    * If prometheus.auth.type is awssigv4, following are required parameters.  
-        * `prometheus.auth.region`, `prometheus.auth.access_key` and `prometheus.auth.secret_key`  
+* `prometheus.uri` [Required].  
+  * This parameters provides the URI information to connect to a prometheus instance.  
+* `prometheus.auth.type` [Optional]  
+  * This parameters provides the authentication type information.  
+  * Prometheus connector currently supports `basicauth` and `awssigv4` authentication mechanisms.  
+  * If prometheus.auth.type is basicauth, following are required parameters.  
+    * `prometheus.auth.username` and `prometheus.auth.password`.  
+  * If prometheus.auth.type is awssigv4, following are required parameters.  
+    * `prometheus.auth.region`, `prometheus.auth.access_key` and `prometheus.auth.secret_key`  

229-230: Standardize indentation in code examples to 2-space bullets.

Lines 229–230 use 4-space indentation for list items; adjust to 2-space for consistency.

-    - `source=my_prometheus.query_range('prometheus_http_requests_total', 1686694425, 1686700130, 14)`  
-    - `source=my_prometheus.query_range(query='prometheus_http_requests_total', starttime=1686694425, endtime=1686700130, step=14)`  
+  - `source=my_prometheus.query_range('prometheus_http_requests_total', 1686694425, 1686700130, 14)`  
+  - `source=my_prometheus.query_range(query='prometheus_http_requests_total', starttime=1686694425, endtime=1686700130, step=14)`  

260-261: Standardize indentation in query_exemplars code examples to 2-space bullets.

Lines 260–261 use 4-space indentation; adjust to 2-space for consistency.

-    - `source=my_prometheus.query_exemplars('prometheus_http_requests_total', 1686694425, 1686700130)`  
-    - `source=my_prometheus.query_exemplars(query='prometheus_http_requests_total', starttime=1686694425, endtime=1686700130)`  
+  - `source=my_prometheus.query_exemplars('prometheus_http_requests_total', 1686694425, 1686700130)`  
+  - `source=my_prometheus.query_exemplars(query='prometheus_http_requests_total', starttime=1686694425, endtime=1686700130)`  

101-101: Clarify compound word "endtime" for better readability.

Line 101 uses "endtime" as a single word. For clarity in prose (as opposed to code/parameter names), consider "end time" (two words) or verify this matches the API documentation convention.


102-102: Hyphenate "auto-determined" in prose.

Line 102 reads "auto determined"; use the hyphenated form "auto-determined" for adjective modifiers.

-* In case of stats, resolution is auto determined from the time range set.  
+* In case of stats, resolution is auto-determined from the time range set.  
docs/user/ppl/admin/settings.md (2)

5-5: Simplify phrasing for clarity.

"Able to change" can be replaced with "can" for more direct, concise phrasing.

-When OpenSearch bootstraps, PPL plugin will register a few settings in OpenSearch cluster settings. Most of the settings are able to change dynamically so you can control the behavior of PPL plugin without need to bounce your cluster.
+When OpenSearch bootstraps, PPL plugin will register a few settings in OpenSearch cluster settings. Most of the settings can change dynamically so you can control the behavior of PPL plugin without need to bounce your cluster.

209-209: Add hyphen to compound adjective.

"Performance sensitive" should be hyphenated when used as a compound modifier before a noun.

-Since 3.3.0, join types `inner`, `left`, `outer` (alias of `left`), `semi` and `anti` are supported by default. `right`, `full`, `cross` are performance sensitive join types which are disabled by default. Set config `plugins.calcite.all_join_types.allowed = true` to enable.
+Since 3.3.0, join types `inner`, `left`, `outer` (alias of `left`), `semi` and `anti` are supported by default. `right`, `full`, `cross` are performance-sensitive join types which are disabled by default. Set config `plugins.calcite.all_join_types.allowed = true` to enable.
docs/user/ppl/admin/security.md (3)

6-6: Fix capitalization: use "REST API" instead of "Rest API".

For consistency with standard terminology and other documentation sections (e.g., line 46), capitalize "REST API" properly.

-## Using Rest API
+## Using REST API

10-10: Improve grammar and punctuation.

The sentence structure is unclear. Use a comma instead of a period before "then" and lowercase it, or restructure the sentence for clarity.

-Example: Create the ppl_role for test_user. then test_user could use PPL to query `ppl-security-demo` index.
+Example: Create the ppl_role for test_user, then test_user can use PPL to query the `ppl-security-demo` index.

65-65: Consider the stability and accessibility of the external image link.

The image is hosted on a GitHub user content URL, which could become stale or inaccessible if the user account or repository changes. Consider embedding the image in the repository or using a more stable reference.

docs/user/ppl/cmd/timechart.md (1)

8-8: Consider removing unnecessary backslash escaping in the syntax line.

In Markdown, angle brackets don't need escaping and can be written directly. The backslashes (\<, \>) are unnecessary and are likely an artifact from reStructuredText migration.

-timechart [timefield=\<field_name\>] [span=\<time_interval\>] [limit=\<number\>] [useother=\<boolean\>] \<aggregation_function\> [by \<field\>]
+timechart [timefield=<field_name>] [span=<time_interval>] [limit=<number>] [useother=<boolean>] <aggregation_function> [by <field>]
docs/user/ppl/cmd/multisearch.md (1)

20-25: Fix Markdown syntax formatting in the syntax specification.

Line 22 uses backslash-escaped angle brackets (\<subsearch1\>) which are unnecessary in Markdown and may render as visible backslashes. The syntax specification should use standard placeholder notation without escaping, or alternatively match the square-bracket format shown in actual usage examples.

Consider one of these approaches:

Option 1: Use plain angle brackets without escaping:

-multisearch \<subsearch1\> \<subsearch2\> \<subsearch3\> ...
+multisearch <subsearch1> <subsearch2> <subsearch3> ...

Option 2: Use square bracket notation to match actual usage:

-multisearch \<subsearch1\> \<subsearch2\> \<subsearch3\> ...
+multisearch [subsearch1] [subsearch2] [subsearch3] ...
docs/user/ppl/cmd/chart.md (3)

8-8: Simplify the syntax block for readability.

The syntax line is quite dense. Breaking it into separate lines or adding a visual hierarchy would improve clarity for users trying to understand the command structure at a glance.

Consider reformatting for better readability:

chart [limit=(top|bottom) <number>] [useother=<boolean>] [usenull=<boolean>] 
      [nullstr=<string>] [otherstr=<string>] <aggregation_function> 
      [by <row_split> [<column_split>] | over <row_split> [by <column_split>]]

31-37: Clarify the distinction between by and over...by... syntax modes.

The two grouping syntaxes (by and over...by...) are described sequentially but the functional relationship between them isn't immediately clear—specifically, that over <field> alone is equivalent to by <field>, while both can be combined as over <row> by <column>. A concise summary statement would improve scannability.

Consider adding a clarifying statement:

* by: Groups results by one or two fields. When two fields are provided, 
  the first is the row split and the second is the column split.
* over...by...: Alternative syntax for the same grouping capability:
  - `over <row_split>` is equivalent to `by <row_split>`
  - `over <row_split> by <column_split>` is equivalent to `by <row_split> <column_split>`

42-42: Emphasize the null-handling behavior for aggregation functions.

This note explains important behavior (exclusion of null values during aggregation), but its placement and phrasing could be changed to make it more discoverable. Users designing queries need to understand this early to avoid surprises.

Consider relocating this note to the main "Notes" section header or emphasizing it more prominently as a distinct caveat. You might also add an example showing the impact, such as:

## Notes on Null Handling and Aggregation

* **Aggregation exclusion:** Documents with null values in fields used by the 
  aggregation function are excluded from aggregation. For example, 
  `chart avg(balance)` will not include documents where balance is null.
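
The effect of excluding nulls, as opposed to counting them as zero, can be illustrated with a small Python sketch. It is not the plugin's aggregation code: `avg_ignoring_nulls` is a hypothetical helper, and returning `None` for an all-null group is an assumption made for the illustration.

```python
from typing import Optional, Sequence


def avg_ignoring_nulls(values: Sequence[Optional[float]]) -> Optional[float]:
    """Average only the non-null values (hypothetical helper)."""
    present = [v for v in values if v is not None]
    if not present:
        # Assumption: a group with no non-null values has no average.
        return None
    return sum(present) / len(present)


balances = [100.0, None, 300.0]

# Null excluded: the average is taken over two documents.
print(avg_ignoring_nulls(balances))  # 200.0

# For contrast, treating null as 0 would divide by three documents instead.
print(sum(v or 0.0 for v in balances) / len(balances))
```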
docs/user/ppl/cmd/spath.md (1)

18-18: Minor phrasing improvement: clarify "simplest spath".

The phrase "The simplest spath is to extract a single field" reads awkwardly. Consider rephrasing to "A simple spath command extracts a single field" or "The simplest spath example extracts a single field."

docs/user/ppl/admin/datasources.md (7)

38-38: Use hyphens for compound adjectives.

Lines 38 and 48 contain compound adjectives that need hyphenation for correctness:

  • "security disabled domains" → "security-disabled domains"

Also, capitalize the sentence on line 48: "we can remove" → "We can remove"

- * In case of security disabled domains, authorization is disbaled.
+ * In case of security-disabled domains, authorization is disabled.
...
- we can remove authorization and other details in case of security disabled domains.
+ We can remove authorization and other details in case of security-disabled domains.

Also applies to: 48-48


38-38: Fix typo: "disbaled" → "disabled".

Line 38 contains a typo: "authorization is disbaled" should be "authorization is disabled"


106-106: Fix formatting in API endpoint headers.

Lines 106 and 115 are missing proper spacing and closing parenthesis before the code block:

  • Line 106: Datasource Read GET API("_plugins/_query/_datasources/{{dataSourceName}}"
  • Line 115: Datasource Deletion DELETE API("_plugins/_query/_datasources/{{dataSourceName}}"

Both should close the parenthetical and include a line break before the code block.

- * Datasource Read GET API("_plugins/_query/_datasources/{{dataSourceName}}"
+ * Datasource Read GET API (`_plugins/_query/_datasources/{{dataSourceName}}`)
  
- * Datasource Deletion DELETE API("_plugins/_query/_datasources/{{dataSourceName}}"
+ * Datasource Deletion DELETE API (`_plugins/_query/_datasources/{{dataSourceName}}`)

Also applies to: 115-115


148-148: Hyphenate numeric compound adjectives.

Line 148: "24 character master key" should use a hyphen: "24-character master key"

- * Sample python script to generate a 24 character master key
+ * Sample python script to generate a 24-character master key
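
For reference, one way to produce a 24-character key in Python is shown below. This is a minimal sketch and not the script from datasources.md: `generate_master_key` is a hypothetical name, and whether a hex-only key is acceptable to the plugin is an assumption to verify against that document.

```python
import secrets


def generate_master_key(length: int = 24) -> str:
    """Return a random hex string of the requested even length (hypothetical helper)."""
    if length <= 0 or length % 2 != 0:
        raise ValueError("length must be a positive even number")
    # token_hex(n) draws n random bytes and encodes them as 2*n hex characters.
    return secrets.token_hex(length // 2)


key = generate_master_key()
print(len(key))  # 24
```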

202-202: Consider more concise phrasing.

Line 202 uses "prior to" which is somewhat wordy. Consider: "In versions before 2.7" or "Earlier than version 2.7"

- * In versions prior to 2.7, the plugins.query.federation.datasources.config key store setting was used to configure datasources, but it has been deprecated and will be removed in version 3.0.
+ * In versions before 2.7, the plugins.query.federation.datasources.config key store setting was used to configure datasources, but it has been deprecated and will be removed in version 3.0.

224-224: Use British/American English consistently: "in future" → "in the future".

Line 226: The phrase "in future" is British English. For consistency with the rest of the documentation, use "in the future"

- In the current state, `information_schema` only support metadata of tables.
- This schema will be extended for views, columns and other metadata info in future.
+ In the current state, `information_schema` only supports metadata of tables.
+ This schema will be extended for views, columns and other metadata info in the future.

Also note: Line 225 should use "supports" (singular agreement with "information_schema") rather than "support".


224-228: Clarify awkward phrasing for better readability.

Line 224: "query tables information under a datasource" is awkward. Suggested revision: "query information about tables in a datasource" or "retrieve table metadata from a datasource"

- Use `information_schema` in source command to query tables information under a datasource.
+ Use `information_schema` in source command to query table information within a datasource.
README.md (1)

91-91: Remove redundant "Language" from "SQL Language Reference Manual".

"SQL" already stands for "Structured Query Language", making "SQL Language" redundant. Simplify to "SQL Reference Manual" or "Language Reference Manual".

-Please refer to the [SQL Language Reference Manual](./docs/user/index.rst), [Piped Processing Language (PPL) Reference Manual](./docs/user/ppl/index.md), [OpenSearch SQL/PPL Engine Development Manual](./docs/dev/index.md) and [Technical Documentation](https://opensearch.org/docs/latest/search-plugins/sql/index/) for detailed information on installing and configuring plugin.
+Please refer to the [SQL Reference Manual](./docs/user/index.rst), [Piped Processing Language (PPL) Reference Manual](./docs/user/ppl/index.md), [OpenSearch SQL/PPL Engine Development Manual](./docs/dev/index.md) and [Technical Documentation](https://opensearch.org/docs/latest/search-plugins/sql/index/) for detailed information on installing and configuring plugin.
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (1)

198-244: Custom SPLIT implementation with empty-delimiter handling looks sound; consider clarifying character semantics

The SPLIT wiring and implementation look correct overall:

  • Static import and registration under BuiltinFunctionName.SPLIT match the new enum constant.
  • The two-arg FunctionImp2 with PPLTypeChecker.family(SqlTypeFamily.CHARACTER, SqlTypeFamily.CHARACTER) aligns with str, delimiter usage.
  • CASE over delimiter = '' cleanly preserves existing behavior for non-empty delimiters while adding the “split into elements” special case.

One minor semantic point to double-check:

  • For the empty-delimiter branch you use REGEXP_EXTRACT_ALL(str, "."). In Calcite-style regex, "." does not match newline characters and works on regex “characters” (code units), not necessarily full Unicode code points.
  • If the intended behavior is “truly per-character including newlines / full Unicode,” you may want to either:
    • Document this as “regex-based per-character” semantics, or
    • Adjust the pattern (e.g., (?s). or an equivalent) if matching newlines is required.

Also, please ensure tests cover:

  • split(field, '') with strings containing newlines and multi-byte characters.
  • Type compatibility of the CASE branches (both returning the same array element type) to avoid dialect-specific surprises.

Overall, the design is good; this is just a small semantics/clarity check.

Also applies to: 993-1020
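
The newline concern can be reproduced with any regex engine in which `.` excludes line terminators by default. The sketch below uses Python's `re` module purely as a stand-in; it is not the Calcite `REGEXP_EXTRACT_ALL` implementation, `split_chars` is a hypothetical helper, and Python matches per code point, so the code-unit question still needs to be verified in the Java path.

```python
import re
from typing import List


def split_chars(text: str, dot_matches_newline: bool) -> List[str]:
    """Split text into single-character elements with a '.' pattern.

    dot_matches_newline=True corresponds to the (?s) / DOTALL variant.
    """
    flags = re.DOTALL if dot_matches_newline else 0
    return re.findall(".", text, flags)


# Default '.': the newline is silently dropped from the result.
print(split_chars("a\nb", dot_matches_newline=False))  # ['a', 'b']

# DOTALL: the newline is kept as its own element.
print(split_chars("a\nb", dot_matches_newline=True))  # ['a', '\n', 'b']
```

A test for `split(field, '')` could assert whichever of these two behaviors is intended, so the choice is explicit rather than a side effect of the pattern.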

Signed-off-by: Asif Bashar <[email protected]>
@asifabashar
Contributor Author

@LantaoJin after resolving conflict, the review mark has been reset

Development

Successfully merging this pull request may close these issues.

[FEATURE] addtotals command to show total across rows , addcoltotals command to show totals across columns

2 participants