feat: Add Pandera data validation plugin by andreahlert · Pull Request #631 · flyteorg/flyte-sdk

andreahlert · 2026-02-08T11:26:33Z

Summary

Port the Pandera plugin from flytekit v1 to the Flyte v2 SDK, enabling automatic runtime validation of pandas DataFrames against Pandera schemas as data flows between tasks.

This brings Pandera support to the v2 SDK, following the same plugin architecture used by polars, wandb, and other existing plugins.

What's included

PanderaTransformer: A TypeTransformer that wraps the DataFrameTransformerEngine to add Pandera schema validation on both to_literal (output) and to_python_value (input)
ValidationConfig: Configurable error handling (raise or warn) via typing.Annotated
PandasReportRenderer: HTML report generation using great_tables for validation results (compatible with Flyte Decks)
Validation memo: Skips redundant re-validation during local execution where to_literal is immediately followed by to_python_value
Entry point registration: Registered via flyte.plugins.types entry point, loaded automatically by flyte.init()
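To make the `typing.Annotated` configuration path concrete, here is a minimal, standard-library-only sketch of how a raise-or-warn setting could be read from an annotation and applied. It is not the plugin's code: this `ValidationConfig` is a hypothetical stand-in, and `extract_config` and `run_validation` are illustrative names that do not come from the PR.

```python
import warnings
from dataclasses import dataclass
from typing import Annotated, Literal, get_args, get_origin


@dataclass(frozen=True)
class ValidationConfig:
    """Hypothetical stand-in for the plugin's ValidationConfig."""

    on_error: Literal["raise", "warn"] = "raise"


def extract_config(annotation):
    """Return (base_type, config); fall back to 'raise' when no config is attached."""
    if get_origin(annotation) is Annotated:
        base, *metadata = get_args(annotation)
        for item in metadata:
            if isinstance(item, ValidationConfig):
                return base, item
        return base, ValidationConfig()
    return annotation, ValidationConfig()


def run_validation(validate, value, config):
    """Call validate(value); in 'warn' mode, downgrade a failure to a warning."""
    try:
        return validate(value)
    except ValueError as exc:  # assumption: the validator signals failure with ValueError
        if config.on_error == "warn":
            warnings.warn(f"validation failed: {exc}", RuntimeWarning)
            return value  # pass the unvalidated value through
        raise
```

In this sketch, an annotation such as `Annotated[SomeType, ValidationConfig(on_error="warn")]` selects warn mode, while a bare `SomeType` keeps the stricter raise behaviour.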

Usage

import flyte
import pandas as pd
import pandera as pa
from pandera.typing import DataFrame, Series

env = flyte.TaskEnvironment(name="my-env")

class UserSchema(pa.DataFrameModel):
    name: Series[str]
    age: Series[int] = pa.Field(ge=0, le=120)
    email: Series[str]

@env.task
async def generate_users() -> DataFrame[UserSchema]:
    return pd.DataFrame({
        "name": ["Alice", "Bob"],
        "age": [25, 30],
        "email": ["alice@example.com", "bob@example.com"],
    })

@env.task
async def process_users(df: DataFrame[UserSchema]) -> DataFrame[UserSchema]:
    df["age"] = df["age"] + 1
    return df

Test plan

Transformer registration with TypeEngine
Literal type generation for pandera DataFrame types
Schema extraction from DataFrameModel annotations
Validation with valid data (to_literal)
Validation failure with invalid data (raises SchemaErrors)
Warn mode with ValidationConfig(on_error="warn")
Roundtrip encode/decode with valid data
Validation memo prevents duplicate validation
Type assertion checks
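The "validation memo prevents duplicate validation" item can be pictured with a small standard-library sketch. This is an assumption about the general technique, not the plugin's implementation: `ValidationMemo`, `validate_once`, and `Frame` are hypothetical names, and the real memo may key its entries differently.

```python
import weakref


class ValidationMemo:
    """Hypothetical memo of (object, schema) pairs that already passed validation."""

    def __init__(self):
        self._validated = set()

    def mark(self, obj, schema_name):
        key = (id(obj), schema_name)
        self._validated.add(key)
        # Forget the entry once obj is garbage collected, so a recycled
        # id() is never mistaken for an already-validated object.
        weakref.finalize(obj, self._validated.discard, key)

    def seen(self, obj, schema_name):
        return (id(obj), schema_name) in self._validated


def validate_once(obj, schema_name, validate, memo):
    """Run validate(obj) unless this exact object already passed for this schema."""
    if not memo.seen(obj, schema_name):
        validate(obj)  # expected to raise on failure, so failures are never memoized
        memo.mark(obj, schema_name)
    return obj


class Frame:
    """Stand-in for a DataFrame; any weak-referenceable object works."""


calls = []
memo = ValidationMemo()
frame = Frame()
validate_once(frame, "UserSchema", calls.append, memo)  # output side
validate_once(frame, "UserSchema", calls.append, memo)  # input side, skipped
print(len(calls))  # prints 1
```

A test for the memo would then assert that the validator ran once for the same object and schema, and ran again for a different object or a different schema.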

kumare3 · 2026-02-08T20:34:00Z

plugins/pandera/src/flyteplugins/pandera/renderer.py

+            .as_raw_html()
+        )
+
+    def to_html(


There is no implicit to_html call in 2.0; you will have to invoke it yourself. cc @wild-endeavor?
Also cc @cosmicBboy.

What is the right UX for this?

andreahlert · 2026-02-08

Yeah, this was one of the parts I wasn't sure about. I ported the renderer from v1 but I knew the Decks integration doesn't exist the same way in v2, so I left it open for discussion.

Looking at the codebase, I see v2 has the Renderable protocol and TypeEngine.to_html() checks for it. I think the cleanest approach would be to override to_html() directly in PanderaTransformer - since the transformer already has access to the schema via _get_pandera_schema():

def to_html(self, python_val, expected_python_type):
    schema, config = self._get_pandera_schema(expected_python_type)
    renderer = PandasReportRenderer(title=f"Pandera Report: {schema.name}")
    return renderer.to_html(python_val, schema)

This way it works automatically through the Report system without users needing to do anything extra.

Another option would be making PandasReportRenderer implement the Renderable protocol so users could opt-in via Annotated[DataFrame[Schema], PandasReportRenderer()], but that feels like unnecessary friction for the common case.

Open to suggestions on the right direction here.
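For discussion, the two options above can be compared in a short standard-library sketch. Everything here is hypothetical: this `Renderable` protocol, the `ReportRenderer` class, and the `render` helper are stand-ins written for illustration, and the actual v2 `Renderable` signature may differ.

```python
import html
from typing import Annotated, Any, Protocol, get_args, get_origin, runtime_checkable


@runtime_checkable
class Renderable(Protocol):
    """Hypothetical protocol: anything that can turn a value into HTML."""

    def to_html(self, python_value: Any) -> str: ...


class ReportRenderer:
    """Hypothetical stand-in for PandasReportRenderer."""

    def __init__(self, title="Report"):
        self.title = title

    def to_html(self, python_value):
        body = html.escape(repr(python_value))
        return f"<h2>{html.escape(self.title)}</h2><pre>{body}</pre>"


def render(python_value, annotation, default_renderer):
    """Prefer an opt-in renderer from Annotated metadata, else use the default."""
    if get_origin(annotation) is Annotated:
        for item in get_args(annotation)[1:]:
            if isinstance(item, Renderable):
                return item.to_html(python_value)  # opt-in via Annotated
    return default_renderer.to_html(python_value)  # transformer-level default
```

The fallback branch corresponds to overriding to_html() in the transformer, so users get a report with no extra annotation. The `Annotated` branch corresponds to the opt-in variant. The two are not mutually exclusive.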

kumare3 · 2026-02-08T20:34:19Z

What is the usage example for this? I don't think this works, especially for the HTML report.

Port the Pandera plugin from flytekit v1 to the Flyte v2 SDK, enabling automatic runtime validation of pandas DataFrames against Pandera schemas as data flows between tasks.

The plugin registers pandera.typing.DataFrame as a custom type with the TypeEngine, wrapping the DataFrameTransformerEngine to add schema validation on both serialization and deserialization.

Features:
- Automatic validation via pandera.typing.DataFrame type annotations
- Configurable error handling (raise or warn) via ValidationConfig
- HTML validation reports using great_tables for Flyte Decks
- Validation memo to skip redundant re-validation in local execution

Signed-off-by: André Ahlert <andre@aex.partners>

kumare3 reviewed Feb 8, 2026

andreahlert force-pushed the feat/pandera-plugin branch from 5c32e96 to 70add7b on February 9, 2026 04:22

andreahlert wants to merge 1 commit into flyteorg:main from andreahlert:feat/pandera-plugin
