[Fix #491] : Add cache cleanup follow up #714

Harshit28j · 2025-11-24T15:10:08Z

This PR is follow up PR related to this issue #678 #491

Summary by CodeRabbit

Release Notes

New Features
- Added new "clean" setup mode that automatically removes temporary Docker images, generated script files, and cached OpenML datasets and tasks following benchmark completion to optimize disk space and system resources
Improvements
- Refined caching mechanisms and modernized the package installation process to enhance performance and efficiency

_{✏️ Tip: You can customize this high-level summary in your review settings.}

coderabbitai · 2025-11-24T15:10:26Z

Walkthrough

A new "clean" setup mode is introduced across the benchmarking system to perform cleanup operations including docker image removal, script deletion, and cache cleanup. The pip installation mechanism is refactored to use subprocess-based invocation, and caching is updated from cache to lru_cache. The mode is exposed via CLI and integrated into the container runner.

Changes

Cohort / File(s)	Change Summary
New Setup Mode Definition `amlb/benchmark.py`, `runbenchmark.py`	Added `SetupMode.clean = 5` enum member; extended CLI `--setup` choices to include "clean" alongside existing options
Container Cleanup Logic `amlb/runners/container.py`	Added setup_mode tracking in setup method; enhanced cleanup to conditionally execute (docker image removal, script deletion, OpenML cache cleanup) when SetupMode.clean is active; stored task_defs in run for cleanup access; added os and run_cmd imports
Caching Strategy Update `amlb/resources.py`	Replaced `cache` with `lru_cache(maxsize=None)` import; updated `constraint_definition` method decorator from `@cache` to `@lru_cache(maxsize=None)`
Pip Installation Refactoring `amlb/utils/modules.py`	Replaced dynamic import-based pip access with subprocess-based invocation (sys.executable -m pip install --no-cache-dir); removed register_module and register_submodule helper functions; changed exception handling from SystemExit to subprocess.CalledProcessError

Sequence Diagram

sequenceDiagram
    participant CLI as CLI (runbenchmark.py)
    participant Runner as ContainerBenchmark
    participant Cleanup as Cleanup Operations
    participant Docker as Docker
    participant Cache as OpenML Cache

    CLI->>Runner: setup(setup_mode="clean")
    activate Runner
    Runner->>Runner: Record setup_mode = SetupMode.clean
    deactivate Runner

    CLI->>Runner: run(tasks)
    activate Runner
    Runner->>Runner: Store task_defs for later use
    deactivate Runner

    CLI->>Runner: cleanup()
    activate Runner
    Note over Runner: Check if setup_mode == SetupMode.clean
    alt SetupMode.clean
        Runner->>Cleanup: Initiate conditional cleanup
        activate Cleanup
        
        Cleanup->>Docker: Remove built docker image
        Docker-->>Cleanup: Image removed (if exists)
        
        Cleanup->>Cleanup: Delete generated script file
        
        Cleanup->>Cache: Clean OpenML caches<br/>(for each task_def)
        Cache-->>Cleanup: Caches cleared
        
        deactivate Cleanup
    else Other modes
        Note over Runner: Skip cleanup
    end
    deactivate Runner

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Cleanup logic with error handling: The container cleanup implementation includes conditional branching, per-item error handling with logging, and multiple resource cleanup operations that require verification of correctness
Setup mode state tracking: Verify that setup_mode is correctly captured and propagated from CLI through setup to cleanup phases
Subprocess refactoring in modules.py: The pip installation changes replace dynamic imports with subprocess calls; requires careful review of command construction, error handling migration (SystemExit → CalledProcessError), and removal of helper functions
Task definition storage: Ensure task_defs are properly stored and accessible during cleanup without unintended side effects

Poem

🐰 A setup mode called "clean" takes the stage,
Sweeping docker images, scripts—all the rage!
With subprocess pipes and caches so tight,
The benchmarks run fresh, oh what a delight! 🧹✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title '[Fix #491] : Add cache cleanup follow up' directly corresponds to the main changes in the PR: adding a new 'clean' setup mode for cache cleanup across multiple files.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

for more information, see https://pre-commit.ci

Harshit28j · 2025-11-24T15:13:09Z

Manually tested after building docker image

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

runbenchmark.py (1)
130-137: Add documentation for the new "clean" setup option.

The new "clean" setup mode has been added to the choices but is not documented in the help text (lines 132-137). Users running --help won't know what this option does.

Based on the changes in amlb/runners/container.py, the help text should describe that "clean" mode removes Docker images, generated scripts, and OpenML caches. Consider adding a bullet point like:
     help="Framework/platform setup mode. Available values are:"
     "\n• auto: setup is executed only if strictly necessary."
     "\n• skip: setup is skipped."
     "\n• force: setup is always executed before the benchmark."
     "\n• only: only setup is executed (no benchmark)."
+    "\n• clean: performs cleanup operations (removes Docker images, scripts, and caches)."
     "\n(default: '%(default)s')",

🧹 Nitpick comments (1)

amlb/utils/modules.py (1)
19-23: Simplify error logging.

The current implementation uses logging.error followed by logging.exception(e). Since logging.exception automatically captures exception information, you can simplify this to a single call without passing the exception object explicitly.

Apply this diff:
-    except subprocess.CalledProcessError as e:
-        log.error(
-            "Error when trying to install python modules %s.", module_or_requirements
-        )
-        log.exception(e)
+    except subprocess.CalledProcessError:
+        log.exception(
+            "Error when trying to install python modules %s.", module_or_requirements
+        )

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dfe8d21 and f6a8499.

📒 Files selected for processing (5)

amlb/benchmark.py (1 hunks)
amlb/resources.py (2 hunks)
amlb/runners/container.py (4 hunks)
amlb/utils/modules.py (1 hunks)
runbenchmark.py (1 hunks)

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-03-13T18:54:25.520Z

Learnt from: PGijsbers
Repo: openml/automlbenchmark PR: 700
File: amlb/data.py:131-132
Timestamp: 2025-03-13T18:54:25.520Z
Learning: The `clear_cache` function in the automlbenchmark library explicitly handles `cached_property` decorated attributes by removing them from the object's __dict__, but for other cache decorators like `cache`, it depends on the presence of a `cache_clear` method.

Applied to files:

amlb/resources.py

🪛 Ruff (0.14.5)

amlb/resources.py

174-174: Use of functools.lru_cache or functools.cache on methods can lead to memory leaks

(B019)

amlb/runners/container.py

116-116: Do not catch blind exception: Exception

(BLE001)

amlb/utils/modules.py

18-18: subprocess call: check for execution of untrusted input

(S603)

20-22: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

23-23: Redundant exception object included in logging.exception call

(TRY401)

🔇 Additional comments (9)

amlb/resources.py (2)

15-15: LGTM: Import change aligns with cache clearing requirements.

The change from cache to lru_cache is appropriate given the PR's focus on cache cleanup. lru_cache provides the cache_clear() method needed for cleanup operations.

Based on learnings, the clear_cache function in the codebase depends on the presence of a cache_clear method for cache decorators.

174-174: Acceptable use of lru_cache on method despite static analysis warning.

The static analysis tool flags this as potentially causing memory leaks (B019). However, since Resources is a singleton class (see __INSTANCE__ at line 329), only one instance will ever exist, which mitigates the memory leak concern. The change to lru_cache enables the cache_clear() method needed for the cleanup functionality introduced in this PR.

amlb/utils/modules.py (1)

11-18: LGTM: Subprocess-based pip invocation is more robust.

The refactoring from internal pip APIs to subprocess-based invocation is a best practice, as internal pip APIs are not guaranteed to be stable. The command construction is safe and uses sys.executable to ensure the correct Python environment.

The S603 static analysis warning about untrusted input can be safely dismissed here, as this is an internal utility function where module_or_requirements comes from controlled sources within the benchmarking framework.

amlb/benchmark.py (1)

68-68: LGTM: Clean enum addition.

The new clean setup mode is properly added to the SetupMode enum with sequential numbering, consistent with the existing pattern.

amlb/runners/container.py (5)

12-12: LGTM: Import additions support cleanup functionality.

The new imports (os and run_cmd) are appropriately added to support the cleanup operations introduced in this PR.

Also applies to: 21-21

83-83: LGTM: Setup mode tracking enables conditional cleanup.

Storing the setup mode allows the cleanup method to perform different actions based on the chosen mode, which is essential for the new clean functionality.

99-101: Consider the implications of forced Docker image removal.

The cleanup uses docker rmi -f with the force flag, which will remove the image even if containers are using it. While this is appropriate for a cleanup mode, ensure this behavior aligns with user expectations, especially if multiple benchmarking sessions could be running concurrently.

If concurrent benchmark runs are possible, verify that forced image removal won't disrupt other active sessions.

105-117: Appropriate error handling for cleanup operations.

The per-task error handling with warning logs is well-designed for cleanup operations. The broad Exception catch (flagged by static analysis) is acceptable here because:

Cleanup should be resilient and continue even if individual operations fail

Warnings are logged so failures aren't silent

The lazy import of openml only loads the library when needed

122-122: LGTM: Task definitions stored for cleanup.

Storing the task definitions enables the cleanup method to iterate over tasks and clean their OpenML caches. The comment correctly notes this also validates the tasks.

Add cache cleanup follow up

2ed57c8

[pre-commit.ci] auto fixes from pre-commit.com hooks

f6a8499

for more information, see https://pre-commit.ci

coderabbitai bot reviewed Nov 24, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

[Fix #491] : Add cache cleanup follow up #714

[Fix #491] : Add cache cleanup follow up #714

Uh oh!

Harshit28j commented Nov 24, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Nov 24, 2025 •

edited

Loading

Uh oh!

Harshit28j commented Nov 24, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

[Fix #491] : Add cache cleanup follow up #714

Are you sure you want to change the base?

[Fix #491] : Add cache cleanup follow up #714

Uh oh!

Conversation

Harshit28j commented Nov 24, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai bot commented Nov 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Poem

Pre-merge checks and finishing touches

Uh oh!

Harshit28j commented Nov 24, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Harshit28j commented Nov 24, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 24, 2025 •

edited

Loading