@ahkcs ahkcs commented Nov 14, 2025

Summary

This PR implements the split eval function for PPL, enabling users to split strings into multivalue arrays based on a delimiter.

Examples

Basic split with semicolon

source=people | eval result = split('a;b;c', ';')

Result: ['a', 'b', 'c']


Split into individual characters (empty delimiter)

source=people | eval result = split('abcd', '')

Result: ['a', 'b', 'c', 'd']


Multi-character delimiter

source=people | eval result = split('name::value', '::')

Result: ['name', 'value']


Split field value

source=people | eval words = split(employer, ' ')

Splits the employer field on spaces.


Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dai-chen added the enhancement (New feature or request) and PPL (Piped processing language) labels on Nov 18, 2025
ahkcs commented Nov 18, 2025

https://issues.apache.org/jira/browse/CALCITE-6951

Hi @dai-chen, we currently use SPLIT, which Calcite also supports, and we added special handling for the empty delimiter on top of it.

Here's the documentation for Calcite SPLIT function:


SPLIT(string [, delimiter ]) 

Returns the string array of string split at delimiter (if omitted, default is comma). If the string is empty it returns an empty array, otherwise, if the delimiter is empty, it returns an array containing the original string.
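
To make the quoted behavior concrete, here is a minimal Python model of it. It is an illustration only: the name `calcite_split` is hypothetical, this is not Calcite's implementation, and anything the quoted text does not cover (such as consecutive delimiters) simply follows Python's `str.split`.

```python
def calcite_split(string, delimiter=","):
    """Model of the SPLIT behavior quoted above (hypothetical, not Calcite code)."""
    if string == "":
        # Documented: an empty input string yields an empty array.
        return []
    if delimiter == "":
        # Documented: an empty delimiter yields the original string as the only element.
        return [string]
    # Otherwise split on each literal occurrence of the delimiter.
    return string.split(delimiter)


print(calcite_split("a;b;c", ";"))  # ['a', 'b', 'c']
print(calcite_split("abcd", ""))    # ['abcd']
print(calcite_split("a,b"))         # ['a', 'b']
print(calcite_split("", ";"))       # []
```

The second call shows why the empty delimiter needs extra handling: the documented SPLIT returns `['abcd']`, whereas the PPL example in the summary expects `['a', 'b', 'c', 'd']`.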

@dai-chen

I see. So the only reason for SplitFunctionImp is the special handling of delimiter="", right?
Could the extract function below help?

  • Case 1: The delimiter is not an empty string; translate split to:
SELECT SPLIT('a;b;c;d', ';');
+--------------+
|    EXPR$0    |
+--------------+
| [a, b, c, d] |
+--------------+
  • Case 2: The delimiter is an empty string; translate split to:
SELECT REGEXP_EXTRACT_ALL('abcd', '.');
+--------------+
|    EXPR$0    |
+--------------+
| [a, b, c, d] |
+--------------+
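
The two cases above can be sketched as a single dispatch on the delimiter. This is a rough Python model, not the code in the PR: `ppl_split` is a hypothetical name, `str.split` stands in for SPLIT, and `re.findall` stands in for REGEXP_EXTRACT_ALL.

```python
import re


def ppl_split(string, delimiter):
    """Hypothetical model of the two-case translation; not the PR's code."""
    if delimiter != "":
        # Case 1: non-empty delimiter, modeled on SPLIT(string, delimiter).
        # An empty input yields an empty array, as in the quoted Calcite docs.
        return string.split(delimiter) if string != "" else []
    # Case 2: empty delimiter, modeled on REGEXP_EXTRACT_ALL(string, '.'):
    # each match of the one-character pattern becomes one element.
    return re.findall(".", string)


print(ppl_split("a;b;c;d", ";"))       # ['a', 'b', 'c', 'd']
print(ppl_split("abcd", ""))           # ['a', 'b', 'c', 'd']
print(ppl_split("name::value", "::"))  # ['name', 'value']
```

One caveat of this model: in Python, `.` does not match newline characters by default, so a string containing newlines would lose them in Case 2. Whether the same holds for the Calcite translation is not stated in this thread.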

ahkcs commented Nov 19, 2025

@dai-chen Thanks for the suggestion! I think it makes sense. I have updated the PR to move the implementation into PPLFuncImpTable.

dai-chen previously approved these changes Nov 20, 2025

Thanks for the changes!

Signed-off-by: Kai Huang <[email protected]>

# Conflicts:
#	core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.java
#	integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.java
#	ppl/src/main/antlr/OpenSearchPPLLexer.g4
#	ppl/src/main/antlr/OpenSearchPPLParser.g4
#	ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLArrayFunctionTest.java
#	ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java
Signed-off-by: Kai Huang <[email protected]>
Signed-off-by: Kai Huang <[email protected]>
Signed-off-by: Kai Huang <[email protected]>

Labels

backport 2.19-dev enhancement New feature or request PPL Piped processing language
