feat: add flatten_dataframe function for nested DataFrames #187
Open
jlaportebot wants to merge 1 commit into
Add flatten_dataframe function to recursively flatten nested structures in DataFrames, including StructType, ArrayType, and MapType columns.
- Added flatten_dataframe function in dataframe_transformer.py
- Function supports custom separator for flattened column names
- Handles StructType by expanding sub-elements to columns
- Handles ArrayType by exploding arrays to rows
- Handles MapType by extracting all keys as columns
- Added comprehensive test suite with 12 test cases
- Added function to public API in __init__.py
Closes MrPowers#47
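The three rules in the commit message (structs become columns, arrays become rows, map keys become columns) can be illustrated without Spark. The sketch below is not the PR's code: `flatten_record` and its `sep` parameter are hypothetical names, and it works on plain Python dicts and lists, where a dict stands in for both a struct and a map.

```python
from typing import Any


def flatten_record(record: dict[str, Any], sep: str = "_") -> list[dict[str, Any]]:
    """Flatten one nested record into one or more flat rows (illustrative only)."""
    rows: list[dict[str, Any]] = [{}]
    for key, value in record.items():
        if isinstance(value, dict):
            # Struct-like or map-like value: prefix the child keys, then recurse.
            prefixed = {f"{key}{sep}{k}": v for k, v in value.items()}
            expansions = flatten_record(prefixed, sep)
        elif isinstance(value, list):
            # Array-like value: one output row per element. An empty list still
            # yields one row holding None, similar in spirit to an outer explode.
            expansions = []
            for element in value or [None]:
                expansions.extend(flatten_record({key: element}, sep))
        else:
            # Scalar value: keep the column as it is.
            expansions = [{key: value}]
        # Combine the rows built so far with every expansion of this field.
        rows = [{**row, **extra} for row in rows for extra in expansions]
    return rows


record = {"id": 1, "address": {"city": "Paris", "zip": "75001"}, "tags": ["a", "b"]}
for row in flatten_record(record):
    print(row)
```

Because each array multiplies the row count, a record with several array fields produces the cross product of their elements; the same effect is worth keeping in mind when exploding multiple array columns in a real DataFrame.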
Summary
This PR adds a flatten_dataframe function to recursively flatten nested structures in PySpark DataFrames, addressing issue #47. The function handles StructType, ArrayType, and MapType columns and converts them into flat columns with customizable separators.
Changes
New Functionality
Added flatten_dataframe function in chispa/dataframe_transformer.py:
- Uses explode_outer to add array elements as rows
Added comprehensive test suite in tests/test_dataframe_transformer.py
Updated public API in chispa/__init__.py:
- Added flatten_dataframe to imports
- Added flatten_dataframe to __all__ list for public API exposure
Implementation Details
Key Features
Code Quality
| syntax)
Testing
tests.spark
Examples
Flatten Struct Fields
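The original example for this heading did not survive on the page. As a stand-in, here is a Spark-free sketch of how flattened column names could be derived from a nested schema. The helper `flattened_columns`, its `sep` parameter, and the dict-based schema are all assumptions made for illustration, not the PR's API.

```python
def flattened_columns(schema: dict, sep: str = "_", path: tuple = ()) -> list[tuple[str, str]]:
    """Return (dotted source path, flattened column name) pairs for every leaf field."""
    pairs = []
    for name, field_type in schema.items():
        full_path = path + (name,)
        if isinstance(field_type, dict):
            # Nested struct: descend and keep extending the path.
            pairs.extend(flattened_columns(field_type, sep, full_path))
        else:
            # Leaf field: the dotted path addresses the source field,
            # the separator-joined path becomes the flat column name.
            pairs.append((".".join(full_path), sep.join(full_path)))
    return pairs


# Hypothetical schema: nested dicts stand in for StructType fields.
schema = {"id": "int", "person": {"name": "string", "address": {"city": "string"}}}
for source, flat_name in flattened_columns(schema):
    print(source, "->", flat_name)
```

In PySpark, pairs like these could plausibly feed a `select` built from `col(source).alias(flat_name)`, which is one common way to expand struct fields into top-level columns.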
Flatten Map Fields
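The original example for this heading is also missing. Extracting all map keys as columns generally needs two passes, since the set of keys is data-dependent: one pass to collect the distinct keys, and one to build a column per key. The sketch below shows that idea on plain Python rows; `flatten_map_column` and its parameters are hypothetical and do not reflect the PR's actual signature.

```python
def flatten_map_column(rows: list[dict], column: str, sep: str = "_") -> list[dict]:
    """Replace one map-like column with one flat column per distinct key."""
    # First pass: collect the distinct keys in first-seen order.
    keys = []
    for row in rows:
        for key in row.get(column) or {}:
            if key not in keys:
                keys.append(key)

    # Second pass: emit one column per key; rows lacking a key get None.
    flat_rows = []
    for row in rows:
        mapping = row.get(column) or {}
        flat = {k: v for k, v in row.items() if k != column}
        for key in keys:
            flat[f"{column}{sep}{key}"] = mapping.get(key)
        flat_rows.append(flat)
    return flat_rows


rows = [
    {"id": 1, "attrs": {"color": "red"}},
    {"id": 2, "attrs": {"size": "L", "color": "blue"}},
    {"id": 3, "attrs": None},
]
for flat in flatten_map_column(rows, "attrs"):
    print(flat)
```

The key-collection pass is the part that tends to be costly on a distributed DataFrame, because it has to scan the data before the output schema is known.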
Related Issues
Closes #47
Checklist
Notes
This implementation is based on the example code provided in issue #47, adapted to follow the project's coding standards and best practices. The function is designed to be backward compatible and does not modify any existing functionality.