
@klion26 (member) commented Dec 4, 2025

Which issue does this PR close?

What changes are included in this PR?

Adds support for converting Variant values to the remaining Arrow primitive types, along with tests that cover them.

Conversions that cannot be cast safely will continue to be tracked in #8086 and #8873.

`Self::make_time` returns the native value for the given timestamp type.
`Date64Type::from_naive_date(v)` returns the milliseconds elapsed since the UNIX epoch.
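As a rough illustration of what `Date64Type::from_naive_date(v)` computes, this std-only sketch derives the Date64 value (milliseconds since the UNIX epoch) for midnight on a given date. It is not arrow's implementation: `days_from_civil` follows Howard Hinnant's well-known civil-to-days algorithm, and `date64_from_ymd` is a hypothetical helper name.

```rust
/// Days from 1970-01-01 to the given proleptic Gregorian date
/// (Howard Hinnant's `days_from_civil` algorithm).
fn days_from_civil(year: i64, month: u32, day: u32) -> i64 {
    // Treat March as the first month so the leap day falls at the end of the year.
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400); // year of era, 0..=399
    let m = month as i64;
    // Day of year counted from March 1st, 0..=365.
    let doy = (153 * (if m > 2 { m - 3 } else { m + 9 }) + 2) / 5 + day as i64 - 1;
    // Day of era, 0..=146_096.
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    // 719_468 is the day-of-era offset of 1970-01-01 from 0000-03-01.
    era * 146_097 + doe - 719_468
}

/// Hypothetical helper: the Date64 value (milliseconds since the UNIX epoch)
/// for midnight on the given date.
fn date64_from_ymd(year: i64, month: u32, day: u32) -> i64 {
    days_from_civil(year, month, day) * 86_400_000
}
```

For example, `date64_from_ymd(2000, 1, 1)` evaluates to `946_684_800_000`, and dates before the epoch yield negative values.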

| Variant type | Arrow type | Logic |
| --- | --- | --- |
| `Date` | `Date64` | `datatypes::Date64Type::from_naive_date(v)` |
| `Timestamp[_ntz](Micro/Nano)` | `Timestamp[_ntz](Second)` | if `timestamp.nanosecond() == 0`: `Self::make_time(timestamp)`; else `None` |
| `Timestamp[_ntz](Micro/Nano)` | `Timestamp[_ntz](Millisecond)` | if `timestamp.nanosecond() % 1_000_000 == 0`: `Self::make_time(timestamp)`; else `None` |
| `Time` | `Time32(Second)` | if `v.nanosecond() == 0`: `v.num_seconds_from_midnight()`; else `None` |
| `Time` | `Time32(Millisecond)` | if `v.nanosecond() % 1_000_000 == 0`: `v.num_seconds_from_midnight() * 1000 + v.nanosecond() / 1_000_000`; else `None` |
| `Time` | `Time64(Nanosecond)` | `v.num_seconds_from_midnight() * 1_000_000_000 + v.nanosecond()` |
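The rules in the table can be sketched with plain integers in place of chrono values. This is a minimal illustration under stated assumptions, not the PR's code: `Timestamp` and `TimeOfDay` are hypothetical stand-ins for chrono's `DateTime` and `NaiveTime`, and each `to_*` function mirrors one row of the table, returning `None` whenever the target unit would drop sub-second precision.

```rust
/// Hypothetical stand-in for a chrono `DateTime`: whole seconds since the
/// UNIX epoch (floored) plus the non-negative sub-second part in nanoseconds.
#[derive(Clone, Copy)]
struct Timestamp {
    secs: i64,
    nanos: u32,
}

/// Hypothetical stand-in for a chrono `NaiveTime`.
#[derive(Clone, Copy)]
struct TimeOfDay {
    secs_from_midnight: u32,
    nanos: u32,
}

/// Timestamp(Second): exact only when there is no sub-second part.
fn to_timestamp_second(ts: Timestamp) -> Option<i64> {
    if ts.nanos == 0 { Some(ts.secs) } else { None }
}

/// Timestamp(Millisecond): exact only when the sub-second part is whole milliseconds.
fn to_timestamp_millisecond(ts: Timestamp) -> Option<i64> {
    if ts.nanos % 1_000_000 == 0 {
        Some(ts.secs * 1_000 + i64::from(ts.nanos / 1_000_000))
    } else {
        None
    }
}

/// Time32(Second): exact only when there is no sub-second part.
fn to_time32_second(t: TimeOfDay) -> Option<i32> {
    if t.nanos == 0 { Some(t.secs_from_midnight as i32) } else { None }
}

/// Time32(Millisecond): the largest value, 86_399_999 ms, still fits in an i32.
fn to_time32_millisecond(t: TimeOfDay) -> Option<i32> {
    if t.nanos % 1_000_000 == 0 {
        Some((t.secs_from_midnight * 1_000 + t.nanos / 1_000_000) as i32)
    } else {
        None
    }
}

/// Time64(Nanosecond): always exact. Widen to i64 first, because
/// 86_399 * 1_000_000_000 does not fit in a u32.
fn to_time64_nanosecond(t: TimeOfDay) -> i64 {
    i64::from(t.secs_from_midnight) * 1_000_000_000 + i64::from(t.nanos)
}
```

Splitting a timestamp into floored seconds plus non-negative nanoseconds keeps the millisecond arithmetic correct before the epoch: `Timestamp { secs: -1, nanos: 500_000_000 }` maps to `Some(-500)`.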

Are these changes tested?

Yes, new tests were added to cover these conversions.

Are there any user-facing changes?

No

github-actions bot added the `parquet-variant` (`parquet-variant*` crates) label Dec 4, 2025
@klion26 force-pushed the `remaining_arrow_types_from_variant` branch from 69bb43b to 1af7a22 on December 4, 2025 02:50

@klion26 (author) left a comment

@scovich @alamb Please review this when you have time, thanks.

         });
         impl_primitive_from_variant!(datatypes::Time64MicrosecondType, as_time_utc, |v| {
    -        (v.num_seconds_from_midnight() * 1_000_000 + v.nanosecond() / 1_000) as i64
    +        Some((v.num_seconds_from_midnight() * 1_000_000 + v.nanosecond() / 1_000) as i64)

@klion26 (author) commented:

We don't need to handle the case `v.nanosecond() % 1000 != 0` here. Thanks to `variant_array::canonicalize_and_verify_data_type`, we can assume the input here is always `Time64(TimeUnit::Microsecond)` (never `Time32` or `Time64(TimeUnit::Nanosecond)`).
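The reasoning in this comment can be sketched as follows, assuming a hypothetical `to_time64_microsecond` helper that takes seconds since midnight and sub-second nanoseconds instead of a chrono `NaiveTime`. The `debug_assert!` encodes the assumption that inputs already have microsecond precision, so the division by 1_000 is lossless. The sketch also widens to `i64` before multiplying, since chrono's `num_seconds_from_midnight()` returns a `u32` and `86_399 * 1_000_000` does not fit in a `u32`.

```rust
/// Time64(Microsecond) from a time of day given as seconds since midnight and
/// sub-second nanoseconds (hypothetical stand-in for chrono's `NaiveTime`).
fn to_time64_microsecond(secs_from_midnight: u32, nanos: u32) -> Option<i64> {
    // Assumed invariant: the input already has microsecond precision,
    // so dividing by 1_000 below never discards a remainder.
    debug_assert!(nanos % 1_000 == 0, "expected microsecond precision");
    // Widen before multiplying: 86_399 * 1_000_000 overflows a u32.
    Some(i64::from(secs_from_midnight) * 1_000_000 + i64::from(nanos / 1_000))
}
```

With this shape, the last microsecond of the day, `to_time64_microsecond(86_399, 999_999_000)`, yields `Some(86_399_999_999)`.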


Labels: `parquet-variant` (`parquet-variant*` crates)

Projects: None yet

Development: successfully merging this pull request may close the issue "[Variant] Support more Arrow Datatypes from Variant primitive types".

Participants: 1