[Improvement]: Collect table_summary metrics independently of self-optimizing #4099

@j1wonpark

Search before asking

  • I have searched in the issues and found no similar issues.

What would you like to be improved?

Currently, table_summary metrics (total files, file sizes, health score, etc.) are collected only when self-optimizing.enabled=true. The call that updates them (setTableSummary()) sits behind the optimizingConfig.isEnabled() check in TableRuntimeRefreshExecutor.tryEvaluatingPendingInput(), so it never runs for tables where optimizing is disabled.

As a result, tables with self-optimizing.enabled=false always show 0 or N/A for key monitoring metrics (Health Score, Total Files, Total Files Size) in Grafana dashboards, even though the table has actual data files.

Metric collection and self-optimizing execution are conceptually independent concerns. Users should be able to monitor table health without enabling the optimizing process.

How should we improve?

Introduce a new table property, self-optimizing.table-summary.enabled, that allows table_summary metric collection to be enabled independently of self-optimizing.

Behavior matrix:

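| self-optimizing.enabled | table-summary.enabled | Optimizing runs | Metrics collected |
|-------------------------|-----------------------|-----------------|-------------------|
| true                    | (any)                 | Yes             | Yes               |
| false                   | true                  | No              | Yes               |
| false                   | false (default)       | No              | No                |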
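
The matrix above reduces to two boolean expressions. The following is a minimal sketch of that logic only; the class and method names (SummaryBehavior, optimizingRuns, metricsCollected) are hypothetical and do not come from the Amoro codebase.

```java
// Hypothetical helper that encodes the behavior matrix from this issue.
// Nothing here is an existing Amoro class; it only illustrates the intended logic.
final class SummaryBehavior {

  private SummaryBehavior() {}

  /** Optimizing runs only when self-optimizing.enabled is true. */
  static boolean optimizingRuns(boolean selfOptimizingEnabled, boolean tableSummaryEnabled) {
    return selfOptimizingEnabled;
  }

  /** Metrics are collected when either flag is true. */
  static boolean metricsCollected(boolean selfOptimizingEnabled, boolean tableSummaryEnabled) {
    return selfOptimizingEnabled || tableSummaryEnabled;
  }

  public static void main(String[] args) {
    boolean[] values = {true, false};
    // Print every combination of the two flags with the resulting behavior.
    for (boolean optimizing : values) {
      for (boolean summary : values) {
        System.out.println(
            "self-optimizing.enabled=" + optimizing
                + ", table-summary.enabled=" + summary
                + " -> optimizing runs: " + optimizingRuns(optimizing, summary)
                + ", metrics collected: " + metricsCollected(optimizing, summary));
      }
    }
  }
}
```

The "(any)" row in the matrix corresponds to the first two iterations of the outer loop: when self-optimizing is enabled, the new property has no effect.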

Implementation approach:

  1. Add SELF_OPTIMIZING_TABLE_SUMMARY_ENABLED constant in TableProperties
  2. Add tableSummaryEnabled field to OptimizingConfig
  3. Parse the new property in TableConfigurations.parseOptimizingConfig()
  4. Add an else if branch in TableRuntimeRefreshExecutor.tryEvaluatingPendingInput() to collect summary metrics when optimizing is disabled but table-summary is enabled

The default value of false ensures no behavioral change for existing tables.
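
The four steps could look roughly like the sketch below. It uses simplified stand-in types rather than the real TableProperties, OptimizingConfig, TableConfigurations, and TableRuntimeRefreshExecutor classes, so every class, enum, and method name here other than the two property keys and the SELF_OPTIMIZING_TABLE_SUMMARY_ENABLED constant name is an assumption. The default of true for self-optimizing.enabled is also assumed for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, self-contained stand-in for the proposed change. The real Amoro
// classes have different shapes; only the property keys come from this issue.
final class TableSummarySketch {

  // Step 1: property constants (the default for self-optimizing.enabled is an assumption).
  static final String SELF_OPTIMIZING_ENABLED = "self-optimizing.enabled";
  static final boolean SELF_OPTIMIZING_ENABLED_DEFAULT = true;
  static final String SELF_OPTIMIZING_TABLE_SUMMARY_ENABLED =
      "self-optimizing.table-summary.enabled";
  static final boolean SELF_OPTIMIZING_TABLE_SUMMARY_ENABLED_DEFAULT = false;

  // Step 2: stand-in for OptimizingConfig carrying the new field.
  static final class Config {
    final boolean enabled;
    final boolean tableSummaryEnabled;

    Config(boolean enabled, boolean tableSummaryEnabled) {
      this.enabled = enabled;
      this.tableSummaryEnabled = tableSummaryEnabled;
    }
  }

  // Hypothetical result type describing what a refresh would do.
  enum RefreshAction { EVALUATE_AND_SUMMARIZE, SUMMARIZE_ONLY, SKIP }

  // Step 3: parse both properties, falling back to the defaults above.
  static Config parse(Map<String, String> properties) {
    boolean enabled = Boolean.parseBoolean(properties.getOrDefault(
        SELF_OPTIMIZING_ENABLED, String.valueOf(SELF_OPTIMIZING_ENABLED_DEFAULT)));
    boolean summary = Boolean.parseBoolean(properties.getOrDefault(
        SELF_OPTIMIZING_TABLE_SUMMARY_ENABLED,
        String.valueOf(SELF_OPTIMIZING_TABLE_SUMMARY_ENABLED_DEFAULT)));
    return new Config(enabled, summary);
  }

  // Step 4: the existing branch plus the proposed "else if" branch.
  static RefreshAction refresh(Config config) {
    if (config.enabled) {
      // Existing path: evaluate pending input, which also updates the summary.
      return RefreshAction.EVALUATE_AND_SUMMARIZE;
    } else if (config.tableSummaryEnabled) {
      // New path: collect summary metrics without scheduling any optimizing.
      return RefreshAction.SUMMARIZE_ONLY;
    }
    return RefreshAction.SKIP;
  }

  public static void main(String[] args) {
    Map<String, String> properties = new HashMap<>();
    properties.put(SELF_OPTIMIZING_ENABLED, "false");
    properties.put(SELF_OPTIMIZING_TABLE_SUMMARY_ENABLED, "true");
    System.out.println(refresh(parse(properties)));
  }
}
```

Keeping the new check in an else if branch leaves the enabled path untouched, which is what preserves the current behavior for existing tables.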

Usage example:

ALTER TABLE db.my_table SET TBLPROPERTIES (
  'self-optimizing.enabled'               = 'false',
  'self-optimizing.table-summary.enabled' = 'true'
);

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
