@ahkcs (Contributor) commented Nov 13, 2025

Description

The mvzip function combines values from two multivalue fields pairwise with a delimiter.
It stitches together corresponding elements from each field, similar to Python’s zip() function.

The function supports two modes of operation:

  1. Default delimiter:

    mvzip(mv_left, mv_right)
    

    Combines fields using a comma (,) as the default delimiter.

  2. Custom delimiter:

    mvzip(mv_left, mv_right, delimiter)
    

    Combines fields using the specified delimiter.


Key Features

  • Pairwise combination: Combines 1st element of left with 1st of right, 2nd with 2nd, etc.
  • Stops at shorter length: Processing stops at the length of the shorter field (Python zip() behavior).
  • Scalar handling: Treats scalar values as single-element arrays.
  • Null handling: Returns null if either input is null.
  • Default delimiter: Uses comma (,) when delimiter is not specified.
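The rules listed above can be modeled outside PPL. The following is a minimal Python sketch of those semantics, not the implementation in this PR; the name `mvzip_model` and the use of `None` to stand in for null are assumptions made for illustration.

```python
from typing import Any, List, Optional


def mvzip_model(left: Any, right: Any, delimiter: str = ",") -> Optional[List[str]]:
    """Illustrative model of the mvzip rules described above (hypothetical helper)."""
    # Null handling: a null on either side yields null.
    if left is None or right is None:
        return None
    # Scalar handling: treat a scalar as a single-element list.
    left_values = left if isinstance(left, list) else [left]
    right_values = right if isinstance(right, list) else [right]
    # zip() stops at the shorter input, which matches the stated behavior.
    return [f"{l}{delimiter}{r}" for l, r in zip(left_values, right_values)]


print(mvzip_model(["host1", "host2"], [80, 443]))  # ['host1,80', 'host2,443']
print(mvzip_model([1, 2, 3], ["a", "b"]))          # ['1,a', '2,b']
print(mvzip_model("Amber", "Duke", " "))           # ['Amber Duke']
print(mvzip_model(None, ["test"]))                 # None
```

Each call mirrors one of the PPL examples in this description, so the sketch can serve as a quick cross-check of the expected results.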

Usage Examples

Basic Usage with Default Delimiter

source=people 
| eval hosts = array('host1', 'host2'), ports = array(80, 443), nserver = mvzip(hosts, ports) 
| fields nserver
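# Returns: [host1,80, host2,443]
# Note: The result has two elements, host1,80 and host2,443; the comma inside each element is the default delimiter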

source=accounts 
| eval result = mvzip(firstname, lastname) 
| fields result
# Returns: [Amber,Duke]

Custom Delimiter

source=people 
| eval arr1 = array('a', 'b', 'c'), arr2 = array('x', 'y', 'z'), result = mvzip(arr1, arr2, '|') 
| fields result
# Returns: [a|x, b|y, c|z]

source=accounts 
| eval result = mvzip(firstname, lastname, ' ') 
| fields result
# Returns: [Amber Duke]

Different Length Arrays

source=people 
| eval arr1 = array(1, 2, 3), arr2 = array('a', 'b'), result = mvzip(arr1, arr2) 
| fields result
# Returns: [1,a, 2,b]
# Note: Stops at length of shorter array

Nested mvzip Calls

source=people 
| eval field1 = array('a', 'b'), field2 = array('c', 'd'), field3 = array('e', 'f'), result = mvzip(mvzip(field1, field2, '|'), field3, '|') 
| fields result
# Returns: [a|c|e, b|d|f]
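
Nesting extends to any number of fields by folding from left to right. A small Python sketch of that idea, assuming hypothetical helpers `zip_join` and `zip_join_all` that mirror the two-argument behavior described in this PR (they are not part of the change itself):

```python
from functools import reduce
from typing import List, Sequence


def zip_join(left: Sequence, right: Sequence, delimiter: str = ",") -> List[str]:
    # Pairwise join of two sequences; stops at the shorter one.
    return [f"{l}{delimiter}{r}" for l, r in zip(left, right)]


def zip_join_all(fields: Sequence[Sequence], delimiter: str = ",") -> List[str]:
    # Fold left to right, equivalent to nesting the two-argument call.
    return reduce(lambda acc, nxt: zip_join(acc, nxt, delimiter), fields)


fields = [["a", "b"], ["c", "d"], ["e", "f"]]
nested = zip_join(zip_join(fields[0], fields[1], "|"), fields[2], "|")
print(nested)                               # ['a|c|e', 'b|d|f']
print(zip_join_all(fields, "|") == nested)  # True
```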

Null Handling

source=people 
| eval result = mvzip(nullif(1, 1), array('test')) 
| fields result
# Returns: null

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Kai Huang <[email protected]>
@ahkcs ahkcs marked this pull request as ready for review November 13, 2025 22:11
@ahkcs changed the title from "Support mvzipeval function" to "Support mvzip eval function" Nov 13, 2025
@dai-chen (Collaborator) left a comment

QQ: can SQL's ARRAYS_ZIP be used for this?

@ahkcs (Contributor, Author) commented Nov 13, 2025

QQ: can SQL's ARRAYS_ZIP be used for this?

Thanks for the question! I considered ARRAYS_ZIP but it's not suitable for mvzip due to semantic differences:

  1. Return Type:
    - ARRAYS_ZIP → ARRAY<STRUCT> (e.g., [STRUCT(0:'a', 1:'x'), STRUCT(0:'b', 1:'y')])
    - mvzip → ARRAY<STRING> (e.g., ['a|x', 'b|y'])
  2. Delimiter Requirement:
    - mvzip requires custom delimiter support for string concatenation
    - ARRAYS_ZIP creates structured data without string formatting
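
To make the difference in result shape concrete, here is a rough Python analogy, not actual ARRAYS_ZIP or mvzip code. The dictionaries stand in for structs, and the field keys "0" and "1" are taken from the example above.

```python
left, right = ["a", "b"], ["x", "y"]

# ARRAYS_ZIP-like shape: one structured record per position, no string formatting.
structs = [{"0": l, "1": r} for l, r in zip(left, right)]

# mvzip-like shape: one delimiter-joined string per position.
strings = [f"{l}|{r}" for l, r in zip(left, right)]

print(structs)  # [{'0': 'a', '1': 'x'}, {'0': 'b', '1': 'y'}]
print(strings)  # ['a|x', 'b|y']
```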

@ahkcs ahkcs requested a review from dai-chen November 13, 2025 22:32