materialize-iceberg: add materialization name tag to EMR Serverless job runs #4042

@SeanWhelan

Description

The Iceberg materialization connector submits Spark jobs to EMR Serverless but does not tag them. If a customer runs multiple materializations through a single EMR application, there is no way to tell which jobs belong to which pipeline for cost breakdowns, console filtering, or auditing.

Request

Add an estuary:materialization tag whose value is the full task name (e.g. acmeCo/prod/materialize-iceberg) to every StartJobRun call. The value is already available as materializationName on the emrClient struct, so this should be a one-line addition to the StartJobRunInput in materialize-iceberg/emr.go.

The job Name field already contains the materialization name and is visible in the EMR console, but it can't be used for cost allocation or programmatic filtering; only tags work for that.
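A minimal sketch of the proposed tag, using only the standard library. The helper name jobRunTags and the constant materializationTagKey are hypothetical and not taken from emr.go. It assumes the connector could assign the returned map to the Tags field of StartJobRunInput, which is a map[string]string in the AWS SDK for Go v2; if the connector uses a different SDK version, the map type may differ.

```go
package main

import "errors"

// materializationTagKey is the tag key proposed in this issue.
const materializationTagKey = "estuary:materialization"

// jobRunTags builds the tag map for a single job run.
// Hypothetical helper: the caller would pass the emrClient's
// materializationName and set the result on the StartJobRun request.
func jobRunTags(materializationName string) (map[string]string, error) {
	if materializationName == "" {
		// An empty tag value would not help with attribution, so reject it.
		return nil, errors.New("materialization name is empty")
	}
	return map[string]string{
		materializationTagKey: materializationName,
	}, nil
}
```

With a task named acmeCo/prod/materialize-iceberg, the job run would carry a single tag mapping estuary:materialization to that name.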

IAM consideration

Passing tags on StartJobRun likely requires the emr-serverless:TagResource permission. Our documented IAM policy doesn't include it today, so adding tags unconditionally would break existing customers whose policies only grant StartJobRun. We should therefore probably make the tag opt-in via the spec/config.
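One way the opt-in could look, as a sketch only. The taggingConfig struct, its tagJobRuns JSON field, and both function names are hypothetical; the real connector config may be shaped differently. The idea is that the flag defaults to off, and that an unset flag produces a nil tag map so the request is sent without tags, which presumably keeps existing IAM policies working.

```go
package main

import "encoding/json"

// taggingConfig is a hypothetical fragment of the connector config.
// The zero value (false) keeps today's behavior: no tags are sent.
type taggingConfig struct {
	TagJobRuns bool `json:"tagJobRuns"`
}

// parseTaggingConfig decodes the hypothetical config fragment.
// A missing tagJobRuns field leaves the flag at its default of false.
func parseTaggingConfig(raw []byte) (taggingConfig, error) {
	var cfg taggingConfig
	err := json.Unmarshal(raw, &cfg)
	return cfg, err
}

// optionalJobRunTags returns nil unless the user opted in, so that
// requests from existing deployments carry no tags at all.
func optionalJobRunTags(cfg taggingConfig, materializationName string) map[string]string {
	if !cfg.TagJobRuns {
		return nil
	}
	return map[string]string{"estuary:materialization": materializationName}
}
```

Customers who enable the flag would need to add emr-serverless:TagResource to their policy first, assuming that permission is in fact required.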
