Build: Docker layer granularity optimization plan - 1 #425
yuhuan130 wants to merge 10 commits into flyteorg:main from
Signed-off-by: Alex <alexchien130@gmail.com>
src/flyte/_image.py
Outdated
categorized[category].append(pkg)

# Helper function to create a layer
def create_pip_layer(pkgs):
Image.from_debian_base(name="test").with_pip_packages("tensorflow").with_pip_packages("pytorch")
In this case, TensorFlow and PyTorch will be installed in different layers, right? Let's install them in the same layer.
Also, I think it might be better to add an _optimize method to ImageBuildEngine, so that we can do:
image = ImageBuildEngine._optimize(image)
result = await img_builder.build_image(image, dry_run=dry_run)
(flyte-sdk/src/flyte/_internal/imagebuild/image_builder.py, lines 209 to 210 in 55d1b63)
It will be cleaner, wdyt?
Great idea. I've applied your suggestion in my latest commit!
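For readers following this thread, here is a minimal sketch of the flow being suggested: a single optimization pass gathers heavy packages from every pip layer into one layer before the build runs. `PipLayer`, `Image`, `HEAVY`, and `optimize` are hypothetical stand-ins written for illustration; they are not the flyte-sdk classes or the PR's actual implementation.

```python
from dataclasses import dataclass, replace

# Hypothetical stand-ins for illustration; not the real flyte-sdk types.
HEAVY = frozenset({"tensorflow", "torch"})


@dataclass(frozen=True)
class PipLayer:
    packages: tuple[str, ...]


@dataclass(frozen=True)
class Image:
    layers: tuple[PipLayer, ...] = ()

    def with_pip_packages(self, *packages: str) -> "Image":
        # Each call appends a new layer, mirroring the chained builder style.
        return replace(self, layers=self.layers + (PipLayer(packages),))


def optimize(image: Image) -> Image:
    """Gather heavy packages from all layers into one layer placed first."""
    heavy: list[str] = []
    rest: list[PipLayer] = []
    for layer in image.layers:
        heavy.extend(p for p in layer.packages if p in HEAVY)
        light = tuple(p for p in layer.packages if p not in HEAVY)
        if light:
            rest.append(PipLayer(light))
    merged = (PipLayer(tuple(heavy)),) if heavy else ()
    return Image(layers=merged + tuple(rest))


image = Image().with_pip_packages("tensorflow").with_pip_packages("torch", "requests")
optimized = optimize(image)
```

Running the pass once, just before the build, keeps the builder methods themselves free of layer-ordering logic, which seems to be the appeal of the suggestion above.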
src/flyte/_image.py
Outdated
pre: bool = False,
extra_args: Optional[str] = None,
secret_mounts: Optional[SecretRequest] = None,
optimize_layers: bool = True,
Could we move it to clone()?
Image.from_debian_base().clone(optimize_layers=True)
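One way the flag could live on `clone()` is sketched below. This assumes an immutable image object whose `clone()` returns a modified copy; the `Image` class here is a hypothetical two-field stand-in, and the default value of `optimize_layers` is an assumption made for the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Image:
    # Hypothetical stand-in; the real Image class has many more fields.
    name: str
    optimize_layers: bool = False  # default chosen for this sketch only

    def clone(self, **overrides) -> "Image":
        # dataclasses.replace builds a new frozen instance with the overrides.
        return replace(self, **overrides)


base = Image(name="test")
tuned = base.clone(optimize_layers=True)
```

Keeping the option on `clone()` makes it a per-image setting rather than an argument repeated on every `with_pip_packages()` call.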
src/flyte/_image.py
Outdated
"heavy": ("tensorflow", "torch", "torchaudio", "torchvision", "scikit-learn"),
# -----------------[ MIDDLE ]----------------- #
# Layer 1: ~200MB | Rebuild cost: Med | Freq: Low
"core": ("numpy", "pandas", "pydantic", "requests", "httpx", "boto3", "fastapi", "uvicorn"),
I just tried to build these two images; the build times are almost the same. Do we really need this core layer?
image = (
    Image.from_debian_base(install_flyte=False)
    .with_apt_packages("vim", "wget")
    .with_pip_packages("pandas", "numpy")
)
image = (
    Image.from_debian_base(install_flyte=False)
    .with_apt_packages("vim", "wget")
    .with_pip_packages("pandas", "numpy", "ty")
)
I think the similar build times make sense here, since optimize_layers was not set to False. Both images reuse the cached apt and pandas/numpy layers, so the only real install is ty, which is small. Technically it'll save around 45 seconds to 1 minute.
Hmm, interesting. I tried running it locally once with and without optimize, and found that the python:3.12-slim-bookworm base actually pre-installs the wheels for pandas, numpy, etc., so it'll be fast for these two packages either way.
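The caching behavior discussed in this thread can be illustrated with a simplified model, assuming a layer is reused only when both its instruction and its parent layer are unchanged. The sketch below is not how Docker computes cache keys internally; `cache_keys` and `reused` are hypothetical helpers that only mimic the parent-chaining idea.

```python
import hashlib


def cache_keys(instructions: list[str]) -> list[str]:
    """Chain each layer's key to its parent's key (simplified cache model)."""
    keys, parent = [], ""
    for instruction in instructions:
        parent = hashlib.sha256((parent + instruction).encode()).hexdigest()
        keys.append(parent)
    return keys


def reused(old: list[str], new: list[str]) -> int:
    """Count leading layers whose keys match, i.e. cache hits."""
    hits = 0
    for a, b in zip(cache_keys(old), cache_keys(new)):
        if a != b:
            break
        hits += 1
    return hits


apt = "apt-get install vim wget"
single_before = [apt, "pip install pandas numpy"]
single_after = [apt, "pip install pandas numpy ty"]
split_before = [apt, "pip install pandas numpy"]
split_after = [apt, "pip install pandas numpy", "pip install ty"]

print(reused(single_before, single_after))  # 1
print(reused(split_before, split_after))  # 2
```

In the single-layer case, adding ty changes the pip instruction, so only the apt layer is a cache hit. In the split case, the pandas/numpy layer is also reused and only the small ty layer is built.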
examples/benchmark.py
Outdated
@env.task
async def main():
# Step 1: Collect heavy packages and build new layer list
all_heavy_packages: list[str] = []
template_layer: PipPackages | None = None
optimized_layers: list[Layer] = []
nit: could we call it original_layers or other_layers?
template_layer = PipPackages(
    packages=(),
    index_url=layer.index_url,
I think we should have heavy_layers: List, since each layer has different settings.
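A rough sketch of what a list of heavy layers might look like, assuming heavy packages are grouped by the settings of the layer they came from so that no setting is silently dropped. `PipPackages` here is a hypothetical stand-in carrying only two settings, `HEAVY` is an illustrative subset, and the index URL is just an example value.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

HEAVY = frozenset({"tensorflow", "torch", "torchvision"})  # illustrative subset


@dataclass(frozen=True)
class PipPackages:
    # Hypothetical stand-in with only two of the settings a layer may carry.
    packages: tuple[str, ...]
    index_url: Optional[str] = None
    extra_args: Optional[str] = None


def heavy_layers(layers: list[PipPackages]) -> list[PipPackages]:
    """Build one heavy layer per distinct (index_url, extra_args) combination."""
    grouped: dict[tuple, list[str]] = defaultdict(list)
    for layer in layers:
        key = (layer.index_url, layer.extra_args)
        grouped[key].extend(p for p in layer.packages if p in HEAVY)
    return [
        PipPackages(tuple(pkgs), index_url=url, extra_args=args)
        for (url, args), pkgs in grouped.items()
        if pkgs
    ]


CPU_INDEX = "https://download.pytorch.org/whl/cpu"  # example value only
layers = [
    PipPackages(("torch", "requests"), index_url=CPU_INDEX),
    PipPackages(("tensorflow",)),
    PipPackages(("torchvision",), index_url=CPU_INDEX),
]
result = heavy_layers(layers)
```

With a single template layer, tensorflow would inherit the index URL of whichever layer was seen first; grouping by settings avoids that.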
ImageBuilderType = typing.Literal["local", "remote"]

@staticmethod
def _optimize_image_layers(image: Image) -> Image:
Could we also add some unit tests for it? Thanks.
What This Does
Optimizes Docker build cache by separating heavy ML dependencies (torch, tensorflow) into their own layers and moving Flyte Python wheels to the bottom, so adding lightweight packages doesn't trigger full rebuilds.
Performance
Heavy Benchmark (torch, tensorflow, transformers)
Standard Benchmark (torch, numpy, pandas)
How It Works
Key Changes
heavy_deps.py: New centralized configuration file for heavy dependencies (tensorflow, torch, scikit-learn, etc.)
image_builder.py: Enhanced _optimize_image_layers() method with:
optimize_layers=True (enabled by default, can disable)
Files Changed
src/flyte/_internal/imagebuild/heavy_deps.py (new)
src/flyte/_internal/imagebuild/image_builder.py
src/flyte/_image.py
tests/flyte/test_image.py
examples/image_layer_optimize/ (3 benchmark files)
Total: 7 files (1 new, 6 modified)
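As an illustration of what a centralized heavy-dependency list might involve, the sketch below matches requirement strings against a fixed set after normalizing the project name. The contents of the PR's heavy_deps.py are not shown on this page, so `HEAVY_DEPENDENCIES`, `package_name`, and `is_heavy` are assumptions; the package names are taken from the "heavy" tuple quoted in the review.

```python
import re

# Illustrative set; the PR's heavy_deps.py may list different packages.
HEAVY_DEPENDENCIES = frozenset(
    {"tensorflow", "torch", "torchaudio", "torchvision", "scikit-learn"}
)


def package_name(spec: str) -> str:
    """Extract a normalized project name from a spec such as 'Torch[cuda]>=2.0'."""
    # Cut at the first extras bracket, version operator, marker, URL, or space.
    name = re.split(r"[\[<>=!~;@ ]", spec.strip(), maxsplit=1)[0]
    # PEP 503 style normalization: lowercase, runs of -, _, . become a single -.
    return re.sub(r"[-_.]+", "-", name).lower()


def is_heavy(spec: str) -> bool:
    """Return True when the requirement names a package in the heavy set."""
    return package_name(spec) in HEAVY_DEPENDENCIES
```

Normalizing before the lookup means pinned or differently spelled requirements such as `scikit_learn==1.4` are still recognized as heavy.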
Screenshots
Replaces #422 with cleaner git history