14 changes: 14 additions & 0 deletions image_processing/Dockerfile
@@ -0,0 +1,14 @@
FROM anyscale/ray:2.51.1-slim-py312-cu128

# C compiler for Triton’s runtime build step (vLLM V1 engine)
# https://github.com/vllm-project/vllm/issues/2997
RUN sudo apt-get update && \
sudo apt-get install -y --no-install-recommends build-essential

RUN curl -LsSf https://astral.sh/uv/install.sh | sh

RUN uv pip install --system huggingface_hub boto3

RUN uv pip install --system vllm==0.11.0

RUN uv pip install --system transformers==4.57.1
55 changes: 55 additions & 0 deletions image_processing/README.md
@@ -0,0 +1,55 @@
# Large-Scale Image Processing with Vision Language Models

This example demonstrates how to build a production-ready image processing pipeline that scales to billions of images using Ray Data and vLLM on Anyscale. We process the [ReLAION-2B dataset](https://huggingface.co/datasets/laion/relaion2B-en-research-safe), which contains over 2 billion image URLs with associated metadata.

## What This Pipeline Does

The pipeline performs three main stages on each image:

1. **Parallel Image Download**: Asynchronously downloads images from URLs using aiohttp with 1,000 concurrent connections, handling timeouts and invalid responses gracefully.

2. **Image Preprocessing**: Validates, resizes, and standardizes images to 128×128 JPEG format in RGB color space using PIL, filtering out corrupted or invalid images.

3. **Vision Model Inference**: Runs the Qwen2.5-VL-3B-Instruct vision-language model using vLLM to generate captions or analyze image content, scaling across up to 64 GPU replicas based on workload.
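Stages 1 and 2 can be sketched as below. This is a minimal illustration, not the code in `process_images.py`: the 5-second timeout, 1,000-connection cap, and 128×128 JPEG target come from the description above, while the function names and everything else are assumptions.

```python
import asyncio
import io
from typing import List, Optional

import aiohttp
from PIL import Image

TARGET_SIZE = (128, 128)   # stage 2 target resolution
TIMEOUT_S = 5.0            # per-image download timeout
MAX_CONNECTIONS = 1000     # concurrent connection cap

async def download_one(session: aiohttp.ClientSession, url: str) -> Optional[bytes]:
    """Fetch one image; return None on timeout or HTTP error instead of raising."""
    try:
        async with session.get(
            url, timeout=aiohttp.ClientTimeout(total=TIMEOUT_S)
        ) as resp:
            resp.raise_for_status()
            return await resp.read()
    except Exception:
        return None

async def download_batch(urls: List[str]) -> List[Optional[bytes]]:
    """Download a batch of URLs concurrently through one shared connection pool."""
    connector = aiohttp.TCPConnector(limit=MAX_CONNECTIONS)
    async with aiohttp.ClientSession(connector=connector) as session:
        return await asyncio.gather(*(download_one(session, u) for u in urls))

def preprocess_image(raw: Optional[bytes]) -> Optional[bytes]:
    """Validate, convert to RGB, resize to 128x128, and re-encode as JPEG.

    Returns None for missing or corrupted inputs so callers can filter them out.
    """
    if raw is None:
        return None
    try:
        img = Image.open(io.BytesIO(raw))
        img.load()  # force a full decode; raises on truncated/corrupt files
    except Exception:
        return None
    buf = io.BytesIO()
    img.convert("RGB").resize(TARGET_SIZE).save(buf, format="JPEG")
    return buf.getvalue()
```

Returning `None` instead of raising keeps one bad URL from failing a whole batch; downstream stages simply drop the `None` rows.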

The entire pipeline is orchestrated by Ray Data, which handles distributed execution, fault tolerance, and resource management across your cluster.

## Key Features

- **Massive Scale**: Processes 2B+ images efficiently with automatic resource scaling
- **High Throughput**: Concurrent downloads (1,000 connections) and batched inference (8 images per batch, 16 concurrent batches per GPU)
- **Fault Tolerant**: Gracefully handles network failures, invalid images, and transient errors
- **Cost Optimized**: Automatic GPU autoscaling (up to 64 replicas) based on workload demand
- **Production Ready**: Timestamped outputs, configurable memory limits, and structured error handling

## How to Run

First, make sure you have the [Anyscale CLI](https://docs.anyscale.com/get-started/install-anyscale-cli) installed.

You'll need a HuggingFace token to access the ReLAION-2B dataset. Get one at [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens).

Submit the job:

```bash
anyscale job submit -f job.yaml --env HF_TOKEN=$HF_TOKEN
```

Or use the convenience script:

```bash
./run.sh
```

Results will be written to `/mnt/shared_storage/process_images_output/{timestamp}/` in Parquet format.

## Configuration

The pipeline is configured for high-throughput processing:

- **Compute**: Up to 768 CPUs and 64 GPUs (g5.12xlarge workers, 4 NVIDIA A10G GPUs each) with autoscaling
- **Vision Model**: Qwen2.5-VL-3B-Instruct served with vLLM, one engine per GPU
- **Download**: 1,000 concurrent connections, 5-second timeout per image
- **Batch Processing**: 50 images per download batch, 8 images per inference batch
- **Output**: 100,000 rows per Parquet file for efficient storage

You can adjust these settings in `process_images.py` and `job.yaml` to match your requirements.
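The knobs listed above might surface in `process_images.py` as module-level constants along these lines. The names here are hypothetical; only the values come from this README:

```python
# Hypothetical constant names; check process_images.py for the actual identifiers.
MAX_CONCURRENT_DOWNLOADS = 1000   # aiohttp connection cap
DOWNLOAD_TIMEOUT_S = 5            # per-image download timeout
DOWNLOAD_BATCH_SIZE = 50          # rows per download batch
INFERENCE_BATCH_SIZE = 8          # images per vLLM inference batch
MAX_GPU_REPLICAS = 64             # upper bound for GPU autoscaling
ROWS_PER_OUTPUT_FILE = 100_000    # Parquet file sizing
```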
42 changes: 42 additions & 0 deletions image_processing/job.yaml
@@ -0,0 +1,42 @@
# View the docs https://docs.anyscale.com/reference/job-api#jobconfig.
name: process-images

# When empty, use the default image. This can be an Anyscale-provided base image
# like anyscale/ray:2.43.0-slim-py312-cu125, a user-provided base image (provided
# that it meets certain specs), or you can build new images using the Anyscale
# image builder at https://console.anyscale.com/v2/container-images.
# image_uri: # anyscale/ray:2.43.0-slim-py312-cu125
containerfile: ./Dockerfile

# When empty, Anyscale will auto-select the instance types. You can also specify
# minimum and maximum resources.
compute_config:
  # Pin worker nodes to g5.12xlarge (4x NVIDIA A10G) so the vision workload
  # always lands on GPU instances.
  worker_nodes:
  - instance_type: g5.12xlarge
    min_nodes: 0
    max_nodes: 16
  max_resources:
    CPU: 768
    GPU: 64

# Path to a local directory or a remote URI to a .zip file (S3, GS, HTTP) that
# will be the working directory for the job. The files in the directory will be
# automatically uploaded to the job environment in Anyscale.
working_dir: .

# When empty, this uses the default Anyscale Cloud in your organization.
cloud:

env_vars:
  RAY_DEFAULT_OBJECT_STORE_MEMORY_PROPORTION: "0.5"

# The script to run in your job. You can also do "uv run main.py" if you have a
# pyproject.toml file in your working_dir.
entrypoint: python process_images.py

# If there is an error, do not retry.
max_retries: 0

# Kill the job after 2 hours to control costs.
timeout_s: 7200