ComfyUI · AEON DGX Spark · Flux 2 + LTX 2.3 + ACE-Step (BF16)

Bleeding-edge ComfyUI distribution purpose-built for the NVIDIA DGX Spark (GB10 / Blackwell / sm_121a). Ships with Flux 2 Dev, LTX 2.3 22B, and ACE-Step v1.5 XL Turbo pre-staged at full BF16 quality plus NVFP4 hardware-accelerated alternates and abliterated text-encoder paths. CUDA 13.0.2 + PyTorch cu130 + SageAttention v3 compiled for sm_121a + NVFP4 (CUTLASS) hardware GEMMs.

docker pull ghcr.io/aeon-7/comfyui-aeon-spark:latest          # auto-downloads weights using your HF_TOKEN
docker pull ghcr.io/aeon-7/comfyui-aeon-spark:slim            # no auto-download — pick models via UI

🚀 Fastest path: 5 steps, ~50 minutes

Run these in order on a DGX Spark host. No prior knowledge needed.

Step 1: Get a HuggingFace token (5 min, free)

Open https://huggingface.co/settings/tokens in a browser. Click "New token" → name it comfyui-aeon-spark → scope = Read → Create token → copy it. The token starts with hf_; save it for Step 3.

Step 2: Accept 3 gated model licenses (3 click-throughs, free)

While logged in to HuggingFace, open the page of each gated repo (FLUX.2-dev, FLUX.2-klein-base-9b-fp8, FLUX.2-small-decoder) and click "Agree and access".

Without this, the downloader will fail with 403 Forbidden on those repos.

Step 3: Clone + run the interactive installer

git clone https://github.com/AEON-7/comfyui-aeon-spark.git
cd comfyui-aeon-spark
./setup.sh

When prompted: paste the HF token from Step 1. Choose :latest (default) for auto-download. The script writes .env (chmod 600), launches docker compose up, and starts the model download.
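If the interactive script isn't an option on your host, the .env step can be reproduced by hand. The sketch below assumes HF_TOKEN is the only required key (as in the manual compose file later in this README). write_env is a hypothetical helper, not something setup.sh exposes.

```python
import os
import stat
from pathlib import Path


def write_env(path: str, hf_token: str) -> Path:
    """Write HF_TOKEN to an env file readable only by its owner (mode 600)."""
    if not hf_token.startswith("hf_"):
        raise ValueError("HuggingFace tokens start with 'hf_'")
    env_path = Path(path)
    # Create the file with 0o600 from the start so the token is never
    # world-readable, even briefly.
    fd = os.open(env_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(f"HF_TOKEN={hf_token}\n")
    # Re-apply the mode in case the file already existed with looser permissions.
    os.chmod(env_path, stat.S_IRUSR | stat.S_IWUSR)
    return env_path
```

Call it as write_env(".env", token) from the repo directory, then start the stack with docker compose up -d.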

Step 4: Wait for first-start download (~45 min, 285 GB)

The container streams its progress to its log:

docker logs -f comfyui-spark

When you see Launching ComfyUI on port 8188 followed by no errors, the stack is up.

Step 5: Open ComfyUI

Browser: http://<host-ip>:8188 (use localhost if running on your local machine).

That's it. Generate an image, generate a video, generate music. Default workflows are pre-loaded under "Workflows" in the left sidebar.

🆘 Troubleshooting — symptom → exact fix

If something goes wrong, find the symptom in the table below and run the fix. Most problems have one-line solutions.

Symptom	Likely cause	Exact fix
`setup.sh: command not found` or `Permission denied`	Script not executable	`chmod +x setup.sh sync.sh && ./setup.sh`
`docker: command not found`	Docker Engine not installed	`curl -fsSL https://get.docker.com | sh && sudo systemctl enable --now docker`
`Error response from daemon: could not select device driver "nvidia"`	NVIDIA Container Toolkit missing	`sudo apt install -y nvidia-container-toolkit && sudo systemctl restart docker`
`403 Forbidden` while downloading	HF gated models not accepted	Redo Step 2 above. Click "Agree and access" on each gated repo while logged in as the same HF user whose token you're using.
`401 Unauthorized` while downloading	HF token wrong, expired, or missing Read scope	Regenerate the token at https://huggingface.co/settings/tokens with Read scope, then `nano .env`, paste the new token, and run `docker compose up -d` (a plain `restart` does not re-read `.env`)
ComfyUI unreachable on `:8188` from another machine	Firewall blocking port	`sudo ufw allow 8188/tcp` (or open in cloud provider's security group)
`OOM` / out-of-memory during generation	Too many models loaded simultaneously	Click "Free memory" in ComfyUI's right sidebar. If it persists, `docker compose restart`.
`disk full` mid-download	Less than 350 GB free on workspace volume	`df -h` to confirm, then either free space or set `COMFY_WORKSPACE` in `.env` to a path on a larger disk and `docker compose down && docker compose up -d`
Model download stalls at 0% for >5 minutes	HF Hub rate-limited or your network blocked	`docker logs comfyui-spark | tail` to see the exact URL, then retry with `docker compose restart`
Want to update / get new workflows / new models	You already deployed once	Run `./sync.sh`. Do NOT re-clone or re-run `setup.sh` — they would re-download everything. `sync.sh` only fetches the delta.
Want to start completely fresh (different host, etc.)	Need clean state	`docker compose down -v && rm -rf workspace && ./setup.sh` (note: this deletes the workspace, including your downloaded models, which will be re-fetched)

For deeper agent-level troubleshooting, see §6 in AGENTS.md.
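The `disk full` row is the easiest one to rule out before starting. Here is a minimal preflight sketch that uses the 350 GB figure from the table above; has_enough_space is a hypothetical helper, and the decimal-GB conversion is an assumption.

```python
import shutil

REQUIRED_FREE_GB = 350  # threshold quoted in the troubleshooting table


def has_enough_space(path: str = ".", required_gb: float = REQUIRED_FREE_GB) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` free."""
    free_gb = shutil.disk_usage(path).free / 1e9  # decimal gigabytes
    return free_gb >= required_gb


if __name__ == "__main__":
    if has_enough_space("."):
        print("enough free space for the first-start download")
    else:
        print("under 350 GB free: point COMFY_WORKSPACE at a larger disk")
```

Run it from the directory that will hold workspace/, since that is the filesystem the weights land on.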

Tag matrix

Tag	Image size	What's inside	When to use
`latest` / `full` / `bf16-flux2-ltx2.3` / `cu130-sm121a`	17 GB	code + downloader; on first start the downloader pulls ~285 GB of weights into your workspace volume using your HF_TOKEN	default — you have an HF account, you just want it to work
`slim` / `base`	17 GB	code only, no auto-download	when you want to pick every model individually via the in-UI Manager, or when you want full control / fine-grained license consent

Both variants ship the same code, custom nodes, and workflows. The difference is one runs the bundled downloader on first start; the other waits for you to install models via the UI.

No image variant ever ships pre-embedded weights. That keeps every model's license cleanly the responsibility of the user pulling the file from HuggingFace under their own account. We never act as a redistributor of model weights.

License notes (read before commercial use)

Model	Where it lives	License	Notes
FLUX.2-dev	black-forest-labs/FLUX.2-dev	FLUX.2 [dev] Non-Commercial	research / non-commercial only by default
FLUX.2-klein-base-9b-fp8	black-forest-labs/FLUX.2-klein-base-9b-fp8	BFL Klein, gated	must "Agree and access" on HF before HF_TOKEN can download it
Mistral-Small-3 (Flux 2 text encoder)	Comfy-Org/flux2-dev	Mistral Research License	research use
Gemma-3 (LTX 2.3 text encoder)	Comfy-Org/ltx-2	Gemma Terms of Use	attribution + restrictions
LTX 2.3	Lightricks/LTX-2.3	Lightricks Open Weights	mostly permissive
ACE-Step v1.5	Comfy-Org/ace_step_1.5_ComfyUI_files	ACE-Step	see model card
Qwen 0.6B / 4B / 3-8B	Comfy-Org repacks of Qwen/Qwen3-*	Apache 2.0 / Qwen RL	mostly permissive
huihui-ai abliterated weights	huihui-ai	inherits parent license	derivatives


Quickstart

Easiest: interactive setup (recommended)

git clone https://github.com/AEON-7/comfyui-aeon-spark.git
cd comfyui-aeon-spark
./setup.sh

The script walks you through getting an HF token, accepting the gated-model licenses, picking your image variant, and launching the stack. It hides the token as you paste it (no echo to scrollback) and writes a chmod 600 .env. Skip ahead to What's bundled once it finishes.

Already deployed? Sync new workflows + models without redeploy

cd comfyui-aeon-spark
./sync.sh

Pulls the latest image, refreshes the in-repo workflows and downloader script, shows a diff of what's new (workflows, model entries), and only fetches the delta. Idempotent — files already on disk are skipped, your Manager-installed nodes are preserved, your saved workflows untouched.

Useful flags:

./sync.sh --yes — non-interactive (for agents / cron)
./sync.sh --no-models — refresh workflows & scripts but skip model downloads (saves bandwidth)
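The "only fetches the delta" behaviour comes down to comparing a manifest against what is already on disk. The sketch below illustrates that idea only. The manifest shape (relative path → size in bytes) and the size comparison are assumptions, not the actual logic in sync.sh or download_models.py.

```python
from pathlib import Path


def files_to_fetch(manifest: dict[str, int], models_dir: str) -> list[str]:
    """Return manifest entries that are missing or have the wrong size on disk.

    `manifest` maps a path relative to `models_dir` to its expected size in bytes
    (a hypothetical format chosen for this example).
    """
    root = Path(models_dir)
    pending = []
    for rel_path, expected_size in sorted(manifest.items()):
        target = root / rel_path
        # Skip anything already present with the expected size.
        if not target.is_file() or target.stat().st_size != expected_size:
            pending.append(rel_path)
    return pending
```

Running it twice against an unchanged directory returns the same list, which is the property that makes repeated syncs safe.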

If you'd rather do the initial deploy manually, the same steps in long form:

1. Get a HuggingFace token (5 min, free)

Required for :latest (auto-download). Optional for :slim.

Sign up / sign in at huggingface.co.
Go to Settings → Access Tokens.
Click "+ Create new token" → name it (e.g. dgx-spark) → Token type: Read → Create.
Copy the token. It looks like hf_AbCd1234....

2. Accept gated-model licenses (3 click-throughs)

A few of the bundled-by-default models are gated by their authors. Open each link, sign in, and click "Agree and access repository":

✅ FLUX.2-dev — required for workflow 01 (Flux 2 t2i)
✅ FLUX.2-klein-base-9b-fp8 — required for workflow 08 (Klein 9B)
✅ FLUX.2-small-decoder — Flux 2 VAE used by canonical templates

(Mistral, Gemma, LTX 2.3, Qwen, and ACE-Step are not gated, so your token can pull them right away.)

3. Launch

mkdir -p ~/comfyui-spark/workspace && cd ~/comfyui-spark
cat > .env <<'EOF'
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
EOF
cat > docker-compose.yml <<'EOF'
services:
  comfyui:
    image: ghcr.io/aeon-7/comfyui-aeon-spark:latest
    container_name: comfyui-spark
    runtime: nvidia
    deploy: { resources: { reservations: { devices: [{ driver: nvidia, count: all, capabilities: [gpu] }] } } }
    ports: ["8188:8188"]
    environment:
      HF_TOKEN: "${HF_TOKEN:-}"
    volumes: ["./workspace:/workspace/ComfyUI"]
    shm_size: "32gb"
    ipc: host
    ulimits: { memlock: -1, stack: 67108864 }
    restart: unless-stopped
EOF
docker compose up -d && docker compose logs -f comfyui

First start downloads ~285 GB of models. At ~95 MB/s expect ~50 minutes; look for download summary: 35 ok, 0 failed then Launching ComfyUI on port 8188. (If you skipped accepting the BFL Klein license you'll see 34 ok, 1 failed — that's expected; workflow 08 needs Klein, others don't.)
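The ~50 minute figure is plain arithmetic, and the summary line is easy to check from a script. The sketch below assumes decimal units and the summary wording quoted above; the surrounding log text may differ in practice, and both function names are hypothetical.

```python
import re
from typing import Optional


def estimate_minutes(total_gb: float, mb_per_s: float) -> float:
    """Download time in minutes, using decimal units (1 GB = 1000 MB)."""
    return total_gb * 1000 / mb_per_s / 60


SUMMARY_RE = re.compile(r"download summary:\s*(\d+) ok,\s*(\d+) failed")


def parse_summary(log_text: str) -> Optional[tuple[int, int]]:
    """Return (ok, failed) from the last summary line in the log, or None."""
    matches = SUMMARY_RE.findall(log_text)
    if not matches:
        return None
    ok, failed = matches[-1]
    return int(ok), int(failed)
```

285 GB at 95 MB/s works out to 3000 seconds, i.e. 50 minutes. Feed parse_summary the output of docker compose logs comfyui to tell the 35/0 and 34/1 cases apart.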

Then open http://<spark-host>:8188.

Or — `:slim` mode (no auto-download, you pick everything)

sed -i 's|comfyui-aeon-spark:latest|comfyui-aeon-spark:slim|' docker-compose.yml
docker compose up -d

ComfyUI starts in seconds with zero models on disk. Open the Manager in the top bar, click "Install Missing Models" when you load a workflow, or open the Asset Browser to install any specific model. Every download goes server-side into ./workspace/models/<directory>/ — never to your browser. Set HF_TOKEN if you plan to install gated models.

Adding more gated models later

If you load a community workflow that needs a gated model:

Open the model's HF page → click Agree and access
The same HF_TOKEN you set up earlier already works (no re-login needed)
Click Install in the ComfyUI UI

If you need to expand HF_TOKEN's permissions (e.g., the model is in an org you need access to), regenerate it on the tokens page and update .env, then docker compose up -d to pick it up.

Why this image exists / target system

Primary target: NVIDIA DGX Spark (GB10)

DGX Spark is the desktop / workstation Grace-Blackwell platform NVIDIA ships with the GB10 SoC — Grace ARM CPU + Blackwell GPU on a coherent unified-memory fabric. Spec at a glance:

Spec	Value
GPU	GB10 Blackwell
Compute capability	sm_121 / sm_121a (GB10-specific Blackwell variant)
Tensor cores	5th-gen with native NVFP4 support
Architecture	ARM64 (Grace) + Blackwell (GPU), coherent unified memory
Memory	128 GB LPDDR5X unified across CPU+GPU
Driver / CUDA	NVIDIA 580.x / CUDA 13.0
OS	Ubuntu 24.04 (DGX OS)

This is a different compute capability from every other Blackwell part shipping today. NVCC support for sm_121 first landed in CUDA 13.0 — neither CUDA 12.8 (max sm_120) nor any prior toolchain can emit code for it. Everything in this image was specifically built to take advantage of that:

Concern	Stock setup	This image
CUDA toolchain	CUDA 12.x (max sm_120)	CUDA 13.0.2 — first toolchain that emits sm_121 SASS
PyTorch	x86 / cu128 / no sm_121 PTX	2.9.1+cu130 ARM64, sm_120 SASS + compute_120 PTX (forward-JITs to sm_121 on first kernel call, then cached)
Attention	xformers / FA3 (no sm_121)	SageAttention v3 compiled from source for `sm_121a` + `sm_121` — no JIT cost, no fallback to slow SDPA
Memory model	discrete-GPU defaults	Grace-Blackwell unified-memory tuned — pinned pages off, async offload on, expandable segments
Triton	torch.compile crashes on sm_121	Triton present, torch.compile explicitly disabled so dynamo doesn't trip
NVFP4	not exposed	CUTLASS NVFP4 GEMMs via CUDA 13 — `*_fp4_mixed` weights take the accelerated path automatically
Manager	manual ltdrdata install	Both ltdrdata custom node + the new `comfyui-manager` pip pkg with `--enable-manager`
Models	bring your own	35 artifacts auto-pulled on first start (33 named files + 2 abliterated full-LLM snapshots): Flux 2, LTX 2.3, ACE-Step + abliterated swap-ins

What "optimized for DGX Spark" actually means here

The Spark unified-memory fabric is not like a discrete GPU + system RAM. There's one physical pool, addressable from both CPU and GPU, coherent at cacheline granularity. That changes which optimizations help and which actively hurt:

Pinned host memory hurts. Pinning host pages on a discrete GPU enables zero-copy DMA. On Grace-Blackwell, the pages are already GPU-addressable, and pinning forces an unnecessary buffer-management path. We disable it (--disable-pinned-memory).
--gpu-only hurts. It tries to keep weights on a "GPU side" that, on unified memory, doesn't exist as a separate place. We don't use it.
VRAM utilization caps at 0.88. Pushing past it triggers thrashing on the unified pool. We default to --reserve-vram 2.0 to leave OS scratch within the cap.
torch.compile / Inductor / dynamo are off. Triton 3.5/3.6 don't yet emit working SASS for sm_121a. Code paths that go through torch.compile JIT-fail or generate broken kernels. SageAttention covers the throughput we'd otherwise want from torch.compile.
NVFP4 happens at the GEMM level, not via a flag. When you load mistral_3_small_flux2_fp4_mixed.safetensors instead of the BF16 variant, the model's matmul ops dispatch to CUDA 13's CUTLASS NVFP4 GEMMs, which on sm_121a use 5th-gen tensor-core FP4 paths. No Marlin involved — Marlin is a Hopper SXM5 codepath that mis-fires on GB10.

This image bakes in those choices so a user typing docker compose up -d ends up on the optimal path without having to read every Spark-specific gotcha.

Hardware compatibility matrix

This image was built for sm_121a, but the compute_120 PTX shipped in the PyTorch wheel can be JIT-compiled to SASS for compatible Blackwell parts, and SageAttention's sm80/sm89 fallbacks cover earlier arches. Practical behavior across NVIDIA platforms:

Platform	Compute cap	OS / Arch	Will it run?	Performance vs Spark	Notes
DGX Spark (GB10)	sm_121a	ARM64 / Ubuntu 24.04	✅ Native target	100% (reference)	Everything pre-compiled for this. SageAttention v3 hits sm_121a SASS directly, NVFP4 via CUTLASS, BF16 free.
DGX Station (anticipated GB10/GB100)	sm_121 / sm_100	ARM64	✅	~95-105%	Same generation Blackwell datacenter, near-identical paths. May want recompiled SageAttention if exact arch differs.
Jetson Thor (T5000)	sm_101 (Blackwell)	ARM64 / L4T	⚠️ Probably needs rebuild	~70-80%	ARM64 + Blackwell, but L4T toolchain quirks; SageAttention rebuild recommended for sm_101. Memory budget tighter (64 GB).
GB200 NVL (Blackwell datacenter)	sm_100a	ARM64 (Grace)	✅ with caveats	~150-200%+	Way more memory (192/384 GB HBM3e), much more compute. SageAttention's sm89 fallback works; recompile for sm_100a unlocks the full path.
B100 / B200 PCIe	sm_100a	x86 / ARM	✅ with caveats	~150-200%+	Same as above. Image is ARM64; build an x86 variant or run via QEMU/multi-arch buildx.
RTX PRO 6000 Blackwell (workstation)	sm_120	x86	✅ with rebuild	~80-100%	Same Blackwell family, sm_120 not sm_121. PTX→SASS JIT works at first run. SageAttention rebuild recommended. Image is ARM64 — pull the x86 variant or build locally.
RTX 5090 / 5080 (consumer Blackwell)	sm_120	x86	⚠️ Needs x86 rebuild	~85-95%	Same compute family, x86 ABI. Rebuild image with `--platform linux/amd64`. SageAttention v3 has sm_120 wheels.
H100 / H200 (Hopper)	sm_90	x86 / ARM	⚠️ Works, suboptimal	~60-80%	SageAttention v3 hits the proper sm_90 path; CUTLASS FP4 has a Hopper variant but it's slower than Blackwell. NVFP4 weights still dispatch to working kernels but not the 5th-gen tensor cores. Rebuild for x86.
L40S / RTX 6000 Ada (Ada Lovelace)	sm_89	x86	⚠️ Works, no NVFP4	~50-70%	SageAttention's sm89 path works for attention. NVFP4 weights fall through to BF16 path. Use BF16 variants of all models. Rebuild for x86.
A100 / A30 (Ampere)	sm_80	x86	⚠️ Works, no NVFP4, no FP8	~30-50%	SageAttention sm80 path works. No FP8/FP4 hardware support. Stick to BF16 weights, plenty of VRAM (40/80 GB) makes that fine. Rebuild for x86.
RTX 4090 / Ada workstation	sm_89	x86	⚠️ VRAM-limited	varies	Workflow files load fine; some Flux 2 / LTX 2.3 models won't fit in 24 GB without aggressive offload. Use the FP8/FP4 variants. Rebuild for x86.
RTX 3090 / Ampere workstation	sm_86	x86	⚠️ VRAM-limited, slower	varies	Similar to 4090 but no FP8 path. Strictly BF16 + offload + GGUF. Rebuild for x86.

TL;DR cross-platform

Other Grace-Blackwell systems (DGX Station, GB200, future Spark variants): pulls and runs out of the box, often faster than Spark.
Consumer Blackwell (RTX 5090/5080): great fit, just needs an x86 rebuild: docker buildx build --platform linux/amd64 -t comfyui-aeon-spark:x86 . (the trailing dot is the build context).
Hopper / Ada / Ampere: works but progressively suboptimal — the NVFP4 hardware path is what makes Spark special here, and only Blackwell has it. Use BF16 variants on these and accept that you're not getting the 5th-gen tensor-core acceleration.
AMD / Intel / Apple Silicon: not supported. The image assumes CUDA 13 and a Blackwell-class compute capability.
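To see where a given host falls in the matrix, map its compute capability to the expected behaviour. The sketch below only paraphrases the table above; expected_path is a hypothetical helper. With PyTorch installed, the tuple can be read with torch.cuda.get_device_capability(0).

```python
def expected_path(capability: tuple[int, int]) -> str:
    """Summarize what the compatibility matrix says for a (major, minor) capability."""
    major, minor = capability
    if (major, minor) == (12, 1):
        return "native target: precompiled sm_121a kernels, NVFP4 via CUTLASS"
    if major in (10, 12):
        return "Blackwell: runs, SageAttention rebuild recommended for this arch"
    if major == 9:
        return "Hopper: works, suboptimal; no 5th-gen tensor-core FP4"
    if (major, minor) == (8, 9):
        return "Ada: works, no NVFP4 hardware path"
    if major == 8:
        return "Ampere: works, no NVFP4 or FP8; use BF16 weights"
    return "not covered by the compatibility matrix"


if __name__ == "__main__":
    # (12, 1) is what a DGX Spark reports.
    print(expected_path((12, 1)))
```

The x86 rebuild requirement is independent of compute capability, so check the host architecture separately.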

What's bundled

Runtime stack

Component	Version / source
Base	`nvidia/cuda:13.0.2-devel-ubuntu24.04` (ARM64)
Python	3.12.3
PyTorch	2.9.1+cu130
Triton	3.5.1 (kept available; `torch.compile` disabled by env)
SageAttention	v3 main, compiled with `-gencode arch=compute_121a,code=sm_121a`
ComfyUI	latest `master` (0.20.1 at build time)
ComfyUI-Manager	both ltdrdata custom node + the new `comfyui-manager` pip pkg, `--enable-manager` set by default
Diffusers	0.37.1
Transformers	5.7.0
HuggingFace Hub	1.12.0 + `hf-transfer` enabled
GGUF runtime	`gguf` >= 0.13 + sentencepiece + protobuf
Total backend nodes registered	~1728 (Comfy core + 16 bundled custom-node packs)

Bundled services

The compose stack ships two services:

comfyui — main UI on :8188
ollama — LLM sidecar, auto-pulls gemma3:4b (~3 GB, swap via OLLAMA_PRELOAD_MODEL). Used by workflow 09 (AceStep audio) for prompt expansion. Reachable from comfyui as http://ollama:11434.

ollama:11434 isn't exposed to the host by default — it's an internal-only service. If you want to use it from other clients on your LAN, add ports: ["11434:11434"] to the ollama service in docker-compose.yml.

Server-side model downloads (not browser downloads!)

This image ships two independent paths that both route model downloads to the server:

Manager → Install Missing Models / Asset Browser — uses the comfyui-manager pip pkg's /v2/manager/queue/batch endpoint. Downloads land in ./workspace/models/<directory>/ on the server.
Workflow Overview → Errors → Missing Models → Download all / Download — core ComfyUI 0.20's new built-in panel. By default, core ComfyUI fires window.open(url) here, which downloads to the client machine. That's the wrong behavior for remote-accessed Sparks.
- This image bundles aeon-server-side-downloads (a JS-only custom-node pack) that intercepts those clicks and re-routes them through Manager's queue API. After the intercept, you'll see a toast: "Queueing N file(s) for server-side download…". The file lands in your workspace volume on the server.
- Falls back transparently to the browser download if Manager isn't reachable, so it never breaks worse than upstream.

--enable-assets and --enable-manager are on by default. Sources of model URLs (read in this order):

properties.models[] arrays on workflow loader nodes (canonical Comfy templates have these wired)
download_models.py runs at first start to pre-fetch the bundled set (:latest only)
ComfyUI Manager's catalog (browse → install for any community model)

Bundled ComfyUI custom node packs

Pack	Why
ComfyUI-Manager (ltdrdata)	classic node/model manager
ComfyUI-LTXVideo (Lightricks)	94 official LTX-2 nodes
ComfyUI-GGUF (city96)	GGUF text encoders + DiTs
ComfyUI_essentials (cubiq)	image utilities
rgthree-comfy	workflow ergonomics (48 nodes)
ComfyUI-Custom-Scripts (pythongosssss)	favorites, autocomplete, etc.
ComfyUI-KJNodes (kijai)	huge collection incl. GetNode/SetNode virtual links
ComfyUI-Frame-Interpolation (Fannovel16)	RIFE / FILM video upscaling
ComfyUI-Crystools	on-canvas perf monitor
ComfyUI-Easy-Use (yolain)	simplified flux/ltx/sd flows
ComfyUI-RES4LYF (ClownsharkBatwing)	advanced samplers including the `ClownSampler_Beta` family — required by Lightricks's distilled LTX workflow
ComfyUI-VideoHelperSuite (Kosinkadink)	video I/O nodes
ComfyUI-Ollama (stavsap)	Ollama LLM-prompting nodes (used by ACE-Step Ancient_Sufi workflow)
ComfyUI-Detail-Daemon (Jonseed)	`MultiplySigmas`, `LyingSigmaSampler`, etc.
aeon-server-side-downloads (in-tree)	JS-only extension that intercepts the new ComfyUI 0.20+ "Workflow Overview → Missing Models → Download" buttons and routes the download server-side via Manager's queue API, so files land in your workspace volume on the server, not in your browser on the client machine. Critical for remote-accessed Sparks.
ComfyUI-PromptRelay (kijai)	Timeline-based per-second prompt control for video — change descriptions throughout the sequence (used by `10_ltx2.3_prompt_relay`).

Models auto-downloaded on first start (~285 GB)

Flux 2 Dev (Black Forest Labs / Comfy-Org pre-split)

DiT (flux2_dev_fp8mixed.safetensors, 35.5 GB)
Two VAEs (flux2-vae.safetensors, full_encoder_small_decoder.safetensors)
Mistral-3 Small text encoder in BF16 (35.6 GB) and NVFP4-mixed (12.3 GB)
Two Turbo LoRAs (canonical + alt filename)

LTX 2.3 22B (Lightricks)

BF16 transformer-only DiT (42 GB) and FP8 transformer-only DiT (23.5 GB)
FP8 fused checkpoint (29 GB) — used by Lightricks's canonical workflows
Text projection layer, video VAE, audio VAE, tiny preview VAE
Dynamic distilled LoRA + canonical 384-rank distilled LoRA
Spatial upscaler x2 v1.1 + temporal upscaler x2 v1.0

Gemma-3 (LTX 2.3 text encoder, Comfy-Org split)

Gemma-3 12B IT in BF16 (24.4 GB) and NVFP4-mixed (9.4 GB)

ACE-Step v1.5 (Ancient_Sufi audio-generation workflow)

ACE-Step XL Turbo DiT BF16 (9.97 GB)
Qwen 0.6B + Qwen 4B text encoders
1D audio VAE

Abliterated text-encoder paths

Two abliterated LoRAs for the Gemma encoder (heretic + alt)
Two full HF-format snapshots for direct swap-in: huihui-ai Mistral-Small-3.2 24B abliterated + huihui-ai Gemma-3 12B IT abliterated
Set SKIP_ABLITERATED=1 to skip the ~70 GB snapshots and only fetch the smaller LoRA path.

Default workflows seeded into `user/default/workflows/`

File	What it does
`01_flux2_text_to_image.json`	Comfy canonical Flux 2 Dev t2i (subgraph workflow)
`02_ltx2.3_T2V_I2V_distilled.json`	Lightricks's official LTX-2.3 single-stage distilled T2V/I2V (uses `ClownSampler_Beta`, abliterated Gemma LoRA, FP8 checkpoint, distilled-lora-384, both upscalers)
`03_ltx2.3_T2V_two_stage.json`	Lightricks's two-stage T2V (cleaner motion)
`04_ltx2.3_image_to_video.json`	Comfy canonical LTX-2.3 I2V
`05_ltx2.3_first_last_frame_to_video.json`	Comfy canonical LTX-2.3 first-frame/last-frame-to-video
`07_ltx2.3_id_lora.json`	Comfy canonical LTX-2.3 with identity-LoRA wiring
`08_flux2_klein_9b_text_to_image.json`	Flux 2 Klein 9B variant t2i
`09_acestep_ancient_sufi_xl.json`	ACE-Step v1.5 XL Turbo audio with Ollama prompt-expansion
`10_ltx2.3_prompt_relay.json`	LTX 2.3 22B distilled-1.1 fp8 + Kijai's ComfyUI-PromptRelay — timeline-based per-second prompt control for video

Optimization story

Compile-time work that's already been done

SageAttention v3 compiled inside the image with explicit -gencode=arch=compute_121a,code=sm_121a -gencode=arch=compute_121,code=sm_121. This produces Blackwell-datacenter SASS for _qattn_sm80, _qattn_sm89, and _fused. Zero JIT cost on first generation.
PyTorch 2.9.1+cu130 ships sm_120 SASS plus compute_120 PTX. On Spark the PTX gets JIT-compiled to sm_121a SASS the first time a kernel runs, then cached in ~/.nv/ComputeCache. Forward-compat is the path NVIDIA recommends for pre-release silicon.
CUDA 13.0.2 toolchain in the build image is the first NVCC release that emits sm_121 — CUDA 12.x literally cannot.
All 16 custom-node requirements.txt resolved at build time, so you don't pay the dependency-resolve tax on every container start.

Runtime tuning that ships by default

Knob	Setting	Why
`--use-sage-attention`	on	fastest sm_121a attention path
`--bf16-unet --bf16-vae --bf16-text-enc`	on	Spark's 128 GB unified pool means BF16 is free; NVFP4 weights still take their hardware path automatically when loaded
`--disable-pinned-memory`	on	Grace-Blackwell coherent fabric performs worse with pinned host pages
`--reserve-vram 2.0`	2 GB	leaves OS scratch on the unified pool — Spark caps utilization at 0.88
`--enable-manager`	on	wires the new in-frontend Manager dialog to the `comfyui-manager` pip pkg
`--enable-cors-header`	on	external clients (mobile UIs, automation) can hit the API
`TORCH_COMPILE_DISABLE=1`	on	Triton doesn't yet emit working SASS for sm_121a
`CUDA_MODULE_LOADING=EAGER`	on	avoids the lazy-load stall ComfyUI hits on first model swap
`PYTORCH_ALLOC_CONF=expandable_segments:True`	on	reduces fragmentation when juggling 35 GB DiTs
`CUDA_DEVICE_MAX_COPY_CONNECTIONS=4`	tuned	matches the GB10 copy-engine count
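If you run ComfyUI outside this image and want the same knobs, the table translates into an environment and an argument list. The sketch below is an approximation: the README doesn't show the exact invocation in entrypoint.sh, so the python main.py launcher, --listen 0.0.0.0, and the build_launch helper are assumptions.

```python
import os
from typing import Optional

# Environment variables from the runtime tuning table.
SPARK_ENV = {
    "TORCH_COMPILE_DISABLE": "1",
    "CUDA_MODULE_LOADING": "EAGER",
    "PYTORCH_ALLOC_CONF": "expandable_segments:True",
    "CUDA_DEVICE_MAX_COPY_CONNECTIONS": "4",
}

# Command-line flags from the same table.
SPARK_FLAGS = [
    "--use-sage-attention",
    "--bf16-unet", "--bf16-vae", "--bf16-text-enc",
    "--disable-pinned-memory",
    "--reserve-vram", "2.0",
    "--enable-manager",
    "--enable-cors-header",
]


def build_launch(port: int = 8188, base_env: Optional[dict] = None):
    """Return (argv, env) for a ComfyUI launch with the Spark defaults applied."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(SPARK_ENV)
    # Launcher and listen address are assumptions, not taken from entrypoint.sh.
    argv = ["python", "main.py", "--listen", "0.0.0.0", "--port", str(port), *SPARK_FLAGS]
    return argv, env
```

Pass the result to subprocess.run(argv, env=env) from the ComfyUI checkout. Note that --gpu-only is deliberately absent.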

Why no FlashAttention

FA2 / FA3 / FA4 don't ship sm_121 kernels (FA4 only does sm_100; FA2/3 stop at sm_90). SageAttention v3 covers the same surface plus quantized variants the FA family doesn't have. The image deliberately omits FlashAttention rather than waste size on a wheel that would silently fall back to PyTorch SDPA at runtime.

Why NVFP4 is automatic

There's no "enable NVFP4" flag. The weights themselves are FP4: when ComfyUI's CLIPLoader (or UNETLoader) loads *_fp4_mixed.safetensors, the matmul ops dispatch to CUDA 13's CUTLASS NVFP4 GEMMs, which on sm_121a use the 5th-gen tensor-core FP4 path.

To switch any workflow from BF16 (best quality) to NVFP4 (max throughput), swap one widget on the loader from e.g. mistral_3_small_flux2_bf16.safetensors → mistral_3_small_flux2_fp4_mixed.safetensors.
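For scripted runs, the same swap can be applied to a workflow before it is submitted. The sketch below assumes an API-format workflow (node id → {"class_type", "inputs"}). The UI-format files under user/default/workflows/ use a different layout, and the node id and input key in the usage note are made up for illustration.

```python
import copy

# BF16 filename -> NVFP4 filename, using the pair named in the text above.
BF16_TO_NVFP4 = {
    "mistral_3_small_flux2_bf16.safetensors": "mistral_3_small_flux2_fp4_mixed.safetensors",
}


def swap_encoders(workflow: dict, mapping: dict = BF16_TO_NVFP4) -> dict:
    """Return a copy of an API-format workflow with mapped filenames replaced."""
    patched = copy.deepcopy(workflow)
    for node in patched.values():
        inputs = node.get("inputs", {}) if isinstance(node, dict) else {}
        for key, value in inputs.items():
            # Only exact filename matches are swapped; links and numbers are left alone.
            if isinstance(value, str) and value in mapping:
                inputs[key] = mapping[value]
    return patched
```

For example, a node such as {"1": {"class_type": "CLIPLoader", "inputs": {"clip_name": "mistral_3_small_flux2_bf16.safetensors"}}} comes back pointing at the fp4_mixed file, while the input dict you passed in is left unchanged.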

Persistent volume layout

workspace/                           ← single host-mounted volume
├── models/                          ← 285 GB of pre-staged weights
│   ├── diffusion_models/            ← Flux 2 + LTX 2.3 + ACE-Step DiTs
│   ├── checkpoints/                 ← LTX 2.3 FP8 fused checkpoint
│   ├── text_encoders/               ← Mistral, Gemma, Qwen
│   │   └── abliterated/             ← + huihui-ai full HF dirs
│   ├── vae/  loras/  latent_upscale_models/
│   └── ... all standard ComfyUI subdirs
├── custom_nodes/                    ← 16 bundled + anything Manager adds
├── output/                          ← generated images, videos, audio
├── input/                           ← reference inputs
├── user/default/workflows/          ← 9 pre-seeded workflows
└── temp/                            ← scratch

Wipe the container, rebuild the image, mount the same workspace/, and everything boots in seconds with the same models, settings, Manager-installed nodes, and saved workflows.

Tuning cheat sheet

Goal	Switch to	Why
Maximum quality	leave defaults — BF16 path is default	Spark unified memory is plentiful
Maximum throughput	swap CLIPLoader's encoder file from `_bf16.safetensors` → `_fp4_mixed.safetensors`	takes the CUTLASS NVFP4 GEMM path on sm_121a 5th-gen tensor cores
Fewer-step Flux 2	drop in `Flux2TurboComfyv2.safetensors` LoRA, set steps to 4–8	Turbo LoRA is pre-staged
Fast LTX 2.3	use `02_ltx2.3_T2V_I2V_distilled` as-is — loads `ltx-2.3-22b-distilled-lora-384` at 8 steps	bundled
No abliteration (LTX 2.3)	bypass the `LoraLoader` for `gemma-3-12b-it-abliterated_*` in workflow 02	one click
Audio generation	open `09_acestep_ancient_sufi_xl`	ACE-Step + Ollama prompt expansion

What's not included (and why)

xformers: no sm_121 wheel exists; deliberately skipped on ARM64.
FlashAttention 2/3/4: no sm_121 support yet; SageAttention v3 covers the same surface.
bitsandbytes: depends on FA-style kernels that are not available on sm_121; replace with GGUF (already bundled) or NVFP4 weights (already bundled).
TensorRT engines: TRT engines aren't portable across compute capabilities, and building them inside the container would defeat the cold-start-ready goal. Run TRT engine builds yourself if you need them.
Frontend node editor extras (3D / animation suites): install via Manager so they live in your volume, not the image.

Adding more workflows

Drop the .json into ~/comfyui-spark/workspace/user/default/workflows/ — no rebuild, no restart. The UI auto-discovers it on the next browser refresh.

To bundle a workflow as a default for future fresh starts of this image, fork this repo, drop the file into workflows/, and rebuild — the incremental rebuild touches only one COPY layer (~5–15 seconds).

Adding more custom nodes

Use the in-UI Manager (button in the top bar). Installs land in workspace/custom_nodes/ and survive container recreations.

Adding more models

Drop files into the appropriate workspace/models/<subdir>/. ComfyUI's loaders auto-rescan — refresh the loader's dropdown in the UI.

Updating ComfyUI

docker compose pull             # grab the latest :latest tag
docker compose up -d            # recreate; volume keeps everything

Sharing the GPU with vLLM

Spark has a single GPU. Stop one before starting the other:

docker stop vllm-aeon-ultimate-v2 && docker compose up -d   # ComfyUI
# or
docker compose down && docker start vllm-aeon-ultimate-v2   # back to vLLM

Repo layout

├── Dockerfile           # multi-stage build with SageAttention compile
├── docker-compose.yml   # tuned for Grace-Blackwell unified memory
├── entrypoint.sh        # workspace bootstrap + model downloader + launch
├── download_models.py   # 35-artifact resumable downloader
├── workflows/           # 9 UI-format .json files baked into the image
├── workflows/api/       # API-format .json workflows (for scripted generation)
│   ├── ltx_t2v_pure.json   # pure text-to-video (no image conditioning)
│   ├── ltx_i2v_api.json   # image-to-video with chaining support
│   └── ltx_t2v_api_fixed.json  # pre-corrected API workflows
├── prompts/             # curated LTX prompt library
│   └── ltx_spiraling_library.txt  # production-tested prompt examples
├── skills/              # AI agent skill files for movie production
│   ├── ltx-movie-studio/    # end-to-end chain production pipeline
│   ├── ltx-scenarist/       # prompt expansion + LTX prompt craft
│   ├── ltx-director/        # orchestration + shot sequencing
│   ├── ltx-cameraman/       # generation execution wrapper
│   └── comfyui-spark-ltx/   # ComfyUI API operator skill
├── .env.example         # HF_TOKEN + tuning flags
├── README.md            # this file
├── AGENTS.md            # deployment guide for AI agents (Claude, Copilot, etc.)
├── WRITEUP.md           # extended writeup (more detail than README)
└── QUICKSTART.md        # 3-command run + troubleshooting

Deploying via an AI agent

If you're handing this to an AI agent (Claude, Copilot, Cursor, etc.) to deploy on a Spark you have SSH access to, point it at AGENTS.md. It's structured top-to-bottom with pre-flight checks, single-block deployment commands, post-deploy validation, exact-fix matrices for common failures, hard "do not" guardrails, and a standard report-back template.

🎬 AI Movie Studio — LTX Video Production on DGX Spark

This section documents the AI movie studio skills package built on top of the AEON-7 ComfyUI container — a complete pipeline for producing multi-segment, temporally-chained AI videos using LTX Video 2.3.

Bottom line: LTX 2.3 generates ~5 seconds per segment. For longer videos, you chain segments together by using the last frame of each segment as the seed image for the next. The result is a coherent, continuous movie.

The Problem This Solves

LTX 2.3 is a powerful video generation model, but it has real constraints:

Constraint	Impact
~5 seconds per generation	Can't generate a 60-second scene in one shot
No scene memory	Each segment is independent — lighting, character, camera can shift
Static first frame	Each segment's first ~1.3s shows the seed image before moving
Complex physics failures	Characters pass through thin barriers
Audio is silent	LTX generates video only

The movie studio skills package works around these through segment chaining, careful prompt engineering, and post-production assembly.
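A quick way to size a production is to count segments up front. This minimal sketch uses the ~5 second figure from the table. It does not model any overlap or trimming between segments, and segments_needed is a hypothetical helper.

```python
import math


def segments_needed(target_seconds: float, segment_seconds: float = 5.0) -> int:
    """Number of chained segments needed to cover `target_seconds` of video."""
    if target_seconds <= 0 or segment_seconds <= 0:
        raise ValueError("durations must be positive")
    return math.ceil(target_seconds / segment_seconds)


if __name__ == "__main__":
    # A 60-second scene at ~5 s per generation.
    print(segments_needed(60))
```

A 60-second scene therefore means 12 generations, each seeded from the one before it.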

What We Built — and What It Does

The 5-Skill Suite

The package is organized like a film crew, with separate skills for each role:

Human Producer
     │
     ▼
ltx-director        ── orchestrates the whole production, writes shotlist
     │
     ├──▶ ltx-scenarist      ── expands "beach party scene" into full LTX prompt
     │
     └──▶ ltx-cameraman ────── executes generation via comfyui-spark-ltx
                                     │
                                     ▼
                              ComfyUI / LTX 2.3
                                     │
                                     ▼
                              .mp4 clip file

ltx-movie-studio — The master skill. End-to-end production from concept to finished movie. Calls the other skills automatically. Use this if you want the full pipeline.

ltx-scenarist — Expands simple scene descriptions into full LTX prompts. Teaches the 6-element prompt structure: shot scale, scene, action, characters, camera movement, audio. Includes physical emotion cues, lighting reference, and camera language.

ltx-director — Takes a shotlist and orchestrates generation. Manages chaining logic (extract last frame → use as next seed), shot ordering, and continuity.

ltx-cameraman — Thin wrapper that delegates to comfyui-spark-ltx for actual ComfyUI API calls.

comfyui-spark-ltx — The operator skill. Submits workflows to ComfyUI, polls for completion, copies outputs. Supports pure T2V (no image), I2V (with seed image), and chained modes.

Quick Start — Generate Your First Video

1. Verify ComfyUI is Running

import requests
r = requests.get("http://localhost:8188/system_stats", timeout=5)
print(r.json())
# Expected: {"version": "0.20.1", "devices": [{"name": "NVIDIA GB10"}]}

2. Generate a Clip

import json
import subprocess
import time
import uuid

import requests

HOST = "http://localhost:8188"
WF_PATH = "/path/to/repo/workflows/api/ltx_t2v_pure.json"  # pure T2V
OUTPUT_DIR = "/path/to/outputs"

with open(WF_PATH) as f:
    wf = json.load(f)

# Node 2483 takes the prompt text; node 2612 takes the negative prompt
wf["2483"]["inputs"]["text"] = (
    "Wide shot, cinematic -- a breathtaking tropical beach at golden hour. "
    "Crystal clear turquoise water gently laps against pristine white sand. "
    "Palm trees sway in a soft breeze. Warm golden sunlight bathes everything. "
    "The audio: rhythmic waves, distant laughter, seagulls calling."
)
wf["2612"]["inputs"]["text"] = "blurry, low quality, distorted, deformed, ugly, bad anatomy"

# Queue the workflow
client_id = str(uuid.uuid4())
r = requests.post(f"{HOST}/prompt", json={"prompt": wf, "client_id": client_id}, timeout=30)
r.raise_for_status()
prompt_id = r.json()["prompt_id"]

# Poll every 10 s for up to 20 min (a clip takes ~2-5 min on GB10)
for _ in range(120):
    time.sleep(10)
    r = requests.get(f"{HOST}/history/{prompt_id}", timeout=10)
    history = r.json() if r.status_code == 200 else {}
    if history.get(prompt_id, {}).get("status", {}).get("status_str") == "success":
        print("Done!")
        break
else:
    raise TimeoutError("Generation did not report success within 20 minutes")

# Copy from container (adjust the filename if the output counter differs)
subprocess.run([
    "docker", "cp",
    "comfyui-spark:/workspace/ComfyUI/output/output_00001_.mp4",
    f"{OUTPUT_DIR}/my_clip.mp4"
], check=True)
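The copy step hard-codes output_00001_.mp4, which only holds for the first clip written to a fresh output directory. A minimal sketch of reading the filename from the /history response instead is shown here. It assumes the history entry carries an "outputs" mapping whose values hold lists of file descriptors with "filename" and "subfolder" keys; the exact list name varies by save node, so the helper scans every list. find_output_files is a hypothetical helper, not part of the repo.

```python
from typing import Any


def find_output_files(history_entry: dict[str, Any], suffix: str = ".mp4") -> list[str]:
    """Collect output paths (relative to the output dir) that end with suffix."""
    found = []
    for node_output in history_entry.get("outputs", {}).values():
        if not isinstance(node_output, dict):
            continue
        # Scan every list under the node, whatever its key is called.
        for items in node_output.values():
            if not isinstance(items, list):
                continue
            for item in items:
                if isinstance(item, dict) and str(item.get("filename", "")).endswith(suffix):
                    subfolder = item.get("subfolder", "")
                    name = item["filename"]
                    found.append(f"{subfolder}/{name}" if subfolder else name)
    return found


# Hand-written example entry (assumed shape, not captured server output):
sample = {
    "status": {"status_str": "success"},
    "outputs": {
        "9001": {"images": [{"filename": "output_00001_.mp4", "subfolder": "", "type": "output"}]}
    },
}
print(find_output_files(sample))
```

In the polling loop, pass r.json()[prompt_id] to this helper and append the returned name to comfyui-spark:/workspace/ComfyUI/output/ for docker cp.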

3. Or — Use the Hermes Agent Skills Directly

If you're running this via Hermes Agent (or another agent framework that supports skills), simply say:

"Make me a 30-second tropical beach movie"

The ltx-movie-studio skill handles everything: shotlist, generation, chaining, trim, concat, and audio.

API Workflow Files

Three API-ready workflows are provided in workflows/api/:

File	Use When
`ltx_t2v_pure.json`	Opening shot only — pure text-to-video, no image conditioning. Both I2V nodes are bypassed.
`ltx_i2v_api.json`	All chained segments — feed a seed image in. Pre-fixed for API use (see below).
`ltx_t2v_api_fixed.json`	Reference copy with all API fixes documented inline.

Required API Fixes (already applied to the provided workflows)

When using the UI-format workflows via the ComfyUI API, three nodes need explicit fixes:

1. LoadImage node (ID 2004) — The image widget must be set to an actual filename present in the container's /workspace/ComfyUI/input/ directory:

wf["2004"]["inputs"]["image"] = "my_seed_image.png"

2. COMFY_DYNAMICCOMBO_V3 / ResizeImageMaskNode (ID 4010) — This node is broken when called via API. Pre-replaced in the provided workflows with a standard ImageScale node (ID 9990, 1536×1536, lanczos).

3. LTXVPreprocess (ID 3336) — Must include the img_compression widget set to 3:

wf["3336"]["inputs"]["img_compression"] = 3
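The two widget fixes can be wrapped in one helper so every chained segment gets them. This is a sketch only: apply_api_fixes is a hypothetical name, the node IDs (2004, 3336) are the ones listed in this section, and the node 4010 to ImageScale replacement is not covered because it changes the graph rather than a widget value.

```python
import copy


def apply_api_fixes(wf: dict, seed_filename: str, img_compression: int = 3) -> dict:
    """Return a copy of wf with the LoadImage and LTXVPreprocess widgets set."""
    for node_id in ("2004", "3336"):
        if node_id not in wf:
            raise KeyError(f"workflow has no node {node_id}; is this the I2V workflow?")
    fixed = copy.deepcopy(wf)  # leave the caller's workflow untouched
    fixed["2004"]["inputs"]["image"] = seed_filename
    fixed["3336"]["inputs"]["img_compression"] = img_compression
    return fixed


# Stub workflow with only the two relevant nodes (real files contain many more):
stub = {"2004": {"inputs": {"image": ""}}, "3336": {"inputs": {}}}
print(apply_api_fixes(stub, "segment_01.png"))
```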

The Chaining Protocol (Critical for Coherent Videos)

LTX has no scene memory. Each generation is independent. To make a coherent multi-segment video, we chain segments: the last frame of segment N becomes the seed image for segment N+1.

S01: T2V (no seed) ──▶ video_s01.mp4
      │
      │ ffmpeg -y -sseof -0.1 -i video_s01.mp4 -frames:v 1 -q:v 2 /tmp/frame_s01.png
      │ docker cp /tmp/frame_s01.png comfyui-spark:/workspace/ComfyUI/input/segment_01.png
      ▼
S02: I2V (seed=segment_01.png, strength=1.0) ──▶ video_s02.mp4
      │
      │ ffmpeg -y -sseof -0.1 -i video_s02.mp4 -frames:v 1 -q:v 2 /tmp/frame_s02.png
      │ docker cp /tmp/frame_s02.png comfyui-spark:/workspace/ComfyUI/input/segment_02.png
      ▼
S03: I2V (seed=segment_02.png, strength=1.0) ──▶ video_s03.mp4
      ... repeat ...

Chaining Parameters

Parameter	Value	Why
`strength` on I2V node	1.0	Maximum continuity — seed image fully determines first frame
ImageScale size	1536×1536	Matches latent aspect ratio
Frame extract timing	`-sseof -0.1`	0.1s before end to get the last real frame
Container input path	`/workspace/ComfyUI/input/`	Where seed images must live
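The per-segment hand-off can be scripted by building the two commands from the segment number. The sketch below only constructs the argument lists and runs nothing. chain_commands is a hypothetical helper; the file naming (frame_sNN.png, segment_NN.png) follows the diagram above.

```python
def chain_commands(segment: int, video_path: str,
                   container: str = "comfyui-spark",
                   input_dir: str = "/workspace/ComfyUI/input",
                   tmp_dir: str = "/tmp"):
    """Build the frame-extract and docker-cp commands that seed the next segment."""
    tag = f"{segment:02d}"
    frame = f"{tmp_dir}/frame_s{tag}.png"
    seed_name = f"segment_{tag}.png"
    # Grab one frame from 0.1 s before the end of the finished clip.
    extract = ["ffmpeg", "-y", "-sseof", "-0.1", "-i", video_path,
               "-frames:v", "1", "-q:v", "2", frame]
    # Copy it into the container's input directory under the seed name.
    copy_in = ["docker", "cp", frame, f"{container}:{input_dir}/{seed_name}"]
    return extract, copy_in, seed_name


extract, copy_in, seed_name = chain_commands(1, "video_s01.mp4")
print(" ".join(extract))
print(" ".join(copy_in))
print(seed_name)
```

Pass each list to subprocess.run(cmd, check=True), then set wf["2004"]["inputs"]["image"] = seed_name before queueing the next I2V segment.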

Post-Production Pipeline

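After all segments are generated, trim each one, concatenate the trimmed clips (the later steps expect the result as concat_trimmed.mp4), then crop, add audio, and mux: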

Step 1: Trim Static First Frames ⚠️

Every LTX segment has ~1.3 seconds of static first frame — the seed image lingers before motion begins. You must remove this from ALL segments (including S01):

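# Note: with -c copy, ffmpeg can only cut on keyframes, so the cut may land
# earlier than 1.3s. Drop -c copy (re-encode) if the trim must be frame-accurate.
for i in 01 02 03 04 05 06 07 08 09 10; do
  ffmpeg -y -ss 1.3 -i beach_s${i}.mp4 -c copy beach_s${i}_trimmed.mp4
done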

After trim: 5.0s → 3.7s usable per segment. 10 segments ≈ 37 seconds.
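Step 2 reads concat_trimmed.mp4, so the trimmed clips have to be joined first. One common way is ffmpeg's concat demuxer, sketched below. It assumes all trimmed clips share the same codec and parameters, which should hold when they come from the same workflow. The helper names and concat_list.txt are illustrative, not files in the repo.

```python
def concat_list_text(clips: list[str]) -> str:
    """Render the list file the concat demuxer reads: one file '<path>' line per clip."""
    lines = []
    for clip in clips:
        escaped = clip.replace("'", "'\\''")  # escape single quotes inside paths
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(list_path: str, output: str = "concat_trimmed.mp4") -> list[str]:
    """Build the ffmpeg call that joins the listed clips without re-encoding."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_path, "-c", "copy", output]


clips = [f"beach_s{i:02d}_trimmed.mp4" for i in range(1, 11)]
print(concat_list_text(clips), end="")
print(" ".join(concat_command("concat_list.txt")))
```

Write the rendered text to concat_list.txt next to the clips, then run the command with subprocess.run(cmd, check=True).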

Step 2: Crop to 16:9

LTX outputs 1920×1088. Crop the extra 8 pixels:

ffmpeg -y -i concat_trimmed.mp4 \
  -vf "crop=1920:1080:0:4" \
  -c:v libx264 -preset medium -crf 18 \
  video_1080p.mp4
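The crop offset of 4 comes from centering: (1088 - 1080) / 2 = 4 rows removed from the top and 4 from the bottom. A small sketch of deriving the crop filter string for other sizes follows; center_crop_filter is a hypothetical helper.

```python
def center_crop_filter(src_w: int, src_h: int, dst_w: int, dst_h: int) -> str:
    """Return an ffmpeg crop filter that keeps the centered dst_w x dst_h region."""
    if dst_w > src_w or dst_h > src_h:
        raise ValueError("target size must not exceed source size")
    x = (src_w - dst_w) // 2
    y = (src_h - dst_h) // 2
    return f"crop={dst_w}:{dst_h}:{x}:{y}"


print(center_crop_filter(1920, 1088, 1920, 1080))  # crop=1920:1080:0:4
```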

Step 3: Audio

Option A — MiniMax Music API (recommended, cleanest)

Generate a full music track via the MiniMax API:

# Standard library only (urllib); no extra packages needed
python3 << 'EOF'
import json, codecs, urllib.request, os

# Get key from ~/.hermes/.env (your agent's environment)
key = None
with open(os.path.expanduser("~/.hermes/.env")) as f:
    for line in f:
        if line.startswith("MINIMAX_API_KEY="):
            key = line.strip().split("=", 1)[1].strip()
            break
if not key:
    raise SystemExit("MINIMAX_API_KEY not found in ~/.hermes/.env")

url = "https://api.minimax.io/v1/music_generation"
payload = {
    "model": "music-2.6",  # NOT music-2.6-free (unsupported)
    "prompt": "Reggae tropical beach bar Bob Marley style upbeat steel drums bass guitar happy vibes",
    "is_instrumental": True,
    "stream": True,
    "audio_setting": {"sample_rate": 44100, "bitrate": 256000, "format": "mp3"}
}

req = urllib.request.Request(url, data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json", "Authorization": f"Bearer {key}"},
    method="POST")

with urllib.request.urlopen(req, timeout=300) as resp:
    for chunk in resp:
        text = chunk.decode("utf-8", errors="replace")
        for line in text.split("\n"):
            if line.startswith("data: "):
                obj = json.loads(line[6:])
                if obj.get("data", {}).get("audio"):
                    audio_hex = obj["data"]["audio"]
                    audio_bytes = codecs.decode(audio_hex, "hex")
                    with open("/tmp/reggae_music.mp3", "wb") as f:
                        f.write(audio_bytes)
                    print("Saved /tmp/reggae_music.mp3")
                    break
EOF

Key notes:

Model must be music-2.6 — music-2.6-free returns {"error": {"message": "not supported on your current plan"}}
Generation takes ~120–240 seconds even with streaming mode
Generate a track 60–180s long (longer than your movie), then trim + fade
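The trim + fade mentioned above can be done with ffmpeg's afade filter. The sketch below builds the command for a given movie length. The fade lengths and the mixed_audio.aac output name (chosen to match the mux step) are assumptions, and trim_fade_command is a hypothetical helper.

```python
def trim_fade_command(src: str, dst: str, duration: float,
                      fade_in: float = 1.0, fade_out: float = 2.0) -> list[str]:
    """Build an ffmpeg call that cuts src to duration seconds with fade in/out."""
    if duration <= fade_in + fade_out:
        raise ValueError("duration must be longer than the two fades combined")
    fade_out_start = duration - fade_out
    afilter = (f"afade=t=in:st=0:d={fade_in:g},"
               f"afade=t=out:st={fade_out_start:g}:d={fade_out:g}")
    return ["ffmpeg", "-y", "-i", src, "-t", f"{duration:g}",
            "-af", afilter, "-c:a", "aac", "-b:a", "192k", dst]


cmd = trim_fade_command("/tmp/reggae_music.mp3", "mixed_audio.aac", 37.0)
print(" ".join(cmd))
```

Set duration to the length of the finished video so the fade-out ends with the last frame.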

Option B — Synthesize Ambient Sound (requires scipy)

If you want layered scene audio (waves, birds, crowd), use the hermes venv Python which has scipy:

# Run via: /home/a/.hermes/hermes-agent/venv/bin/python
# execute_code sandbox does NOT have scipy
from scipy.signal import butter, filtfilt

⚠️ Known audio issue: Without scipy, the crude RC filter approximation used in execute_code produces audible white noise and ticking artifacts. Use MiniMax music only, or use the hermes venv Python with scipy.

Step 4: Mux and Deliver

# Mux the audio track onto the video
ffmpeg -y -i video_1080p.mp4 -i mixed_audio.aac \
  -c:v copy -c:a aac -b:a 192k -shortest output_final.mp4

# Compress for Telegram/sharing
ffmpeg -y -i output_final.mp4 \
  -c:v libx264 -preset fast -crf 22 -b:v 2M \
  -c:a aac -b:a 128k \
  output_compressed.mp4

Known Problems — Help Wanted

We're documenting these openly because they'd benefit from community expertise:

🔴 Problem 1: Static Seed Image Lingers at Segment Start

What happens: Every LTX segment shows ~1.3 seconds of the seed image as a static first frame before motion begins. We remove this with ffmpeg -ss 1.3, but that wastes ~26% of each segment.

What's been tried:

Reducing I2V strength — does NOT remove the image influence (the pipeline is structurally I2V; only bypass=True on the conditioning nodes removes it, but then you get no image guidance at all)
Using pure T2V for all segments — eliminates the seed image issue, but loses visual continuity between segments

Looking for help with:

Finding a way to suppress the static first-frame without losing I2V continuity benefits
Understanding whether this is a ComfyUI-LTXVideo node behavior or inherent to the LTX model
Alternative chaining strategies that don't waste 1.3s per segment

🔴 Problem 2: Audio Synthesis Produces White Noise and Ticking

What happens: Synthesized ambient audio (ocean waves, birds, crowd noise) using numpy-only filtering in the execute_code sandbox produces audible white noise and a regular ticking sound.

What's been tried:

RC lowpass filter approximation — produces the ticking artifact
Various filter parameters — artifact persists

Looking for help with:

A clean ambient sound synthesis approach that works in the constrained environment
The MiniMax music API workaround is functional but requires an API key and adds 2–4 minutes of generation time
Ideally: a scipy-equivalent filter accessible from the sandbox, or a better ambient synthesis approach
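One scipy-free candidate is a second-order (biquad) low-pass built from the widely published Audio EQ Cookbook coefficients, which needs only the standard library. This is a sketch and has not been tested against the ticking artifact described above. With q = 1/sqrt(2) it has a Butterworth response, but it is a single causal pass, not the zero-phase result of scipy's filtfilt. biquad_lowpass is a hypothetical helper.

```python
import math
import random


def biquad_lowpass(samples, cutoff_hz: float, sample_rate: int = 44100,
                   q: float = 1 / math.sqrt(2)) -> list[float]:
    """Filter samples with a 2nd-order low-pass (Audio EQ Cookbook coefficients)."""
    w0 = 2 * math.pi * cutoff_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    a0 = 1 + alpha
    # Coefficients normalized by a0.
    b0 = (1 - cos_w0) / 2 / a0
    b1 = (1 - cos_w0) / a0
    b2 = b0
    a1 = -2 * cos_w0 / a0
    a2 = (1 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        # Direct form I difference equation.
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


# One second of white noise, low-passed at 500 Hz as a crude wave-wash base.
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(44100)]
wash = biquad_lowpass(noise, cutoff_hz=500)
print(len(wash))
```

Because filter state carries across samples, process each sound in one call; filtering in separate chunks with fresh state each time introduces discontinuities at the chunk boundaries.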

🟡 Problem 3: Narrow / Compressed People in Some Segments

What happens: Some segments show people that look unnaturally thin or horizontally compressed. This appears to be related to the 1920×1088 (non-16:9) output aspect ratio — the latent space is 960×544, which decodes to 1920×1088.

What's been tried:

Cropping to 1920×1080 helps slightly but doesn't fix the compression in the latent itself

Looking for help with:

Understanding whether this is a latent decoding issue or a model generation issue
Whether adjusting latent dimensions could fix the aspect ratio

LTX Prompting Guide

See skills/ltx-scenarist/SKILL.md for the full guide. Key points:

6 elements every LTX prompt must include:

Shot scale — EWS, WS, MS, CU, ECU
Scene — specific location, time of day, lighting, atmosphere
Action — specific physical action in present tense, simple physics
Characters — appearance + physical emotion cues (NOT "sad" — write "shoulders slump, eyes cast down")
Camera — movement type relative to subject, described in natural language
Audio — ambient sounds, music, dialogue

Golden rules:

Physical cues over emotion labels ("her eyes narrow" not "she's suspicious")
Simple single-threaded physics (no complex multi-object collision)
Natural camera language ("camera follows her" not "dolly 2ft right at 30deg/sec")
Be detailed — more description = better output
1–3 characters per shot
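A prompt can be assembled from the six elements programmatically, which makes it harder to forget one. The sketch below mirrors the layout of the sample beach prompt from the Quick Start ("<shot scale>, cinematic -- ... The audio: ..."). That layout is an assumption drawn from the sample; the authoritative format is in skills/ltx-scenarist/SKILL.md. ShotPrompt is a hypothetical class.

```python
from dataclasses import dataclass, fields


def _sentence(text: str) -> str:
    """Strip, capitalize the first letter, and end with exactly one period."""
    cleaned = text.strip().rstrip(".")
    return cleaned[:1].upper() + cleaned[1:] + "."


@dataclass
class ShotPrompt:
    shot_scale: str   # e.g. "Wide shot", "Close-up"
    scene: str        # location, time of day, lighting
    action: str       # present-tense physical action
    characters: str   # appearance + physical emotion cues
    camera: str       # movement in natural language
    audio: str        # ambient sounds, music, dialogue

    def render(self) -> str:
        empty = [f.name for f in fields(self) if not getattr(self, f.name).strip()]
        if empty:
            raise ValueError(f"missing prompt elements: {', '.join(empty)}")
        body = " ".join(_sentence(p) for p in
                        (self.scene, self.action, self.characters, self.camera))
        audio = self.audio.strip().rstrip(".")
        return f"{self.shot_scale.strip()}, cinematic -- {body} The audio: {audio}."


prompt = ShotPrompt(
    shot_scale="Wide shot",
    scene="a tropical beach at golden hour",
    action="waves lap against white sand",
    characters="a woman in a linen dress, shoulders relaxed, walks barefoot",
    camera="camera follows her along the shoreline",
    audio="rhythmic waves, seagulls calling",
).render()
print(prompt)
```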

Segment Length Compensation

To get more usable content after the 1.3s trim, generate more latent frames:

Latent Frames	Output (~24fps)	After 1.3s Trim	Node 3059 `"length"`
121 (default)	~5.0s	~3.7s	`"length": 121`
161 (+40)	~6.7s	~5.4s	`"length": 161`
201 (+80)	~8.4s	~7.1s	`"length": 201`

Edit workflow node 3059 (EmptyLTXVLatentVideo) to change "length" from 121.
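The table values follow from usable seconds = length / 24 - 1.3. A short sketch of that arithmetic, plus the number of segments needed for a target runtime, is below. The 24 fps and 1.3 s figures are the approximate values quoted in this section; the function names are illustrative.

```python
import math

FPS = 24.0          # approximate output frame rate quoted above
TRIM_SECONDS = 1.3  # static lead-in removed from each segment


def usable_seconds(length: int, fps: float = FPS, trim: float = TRIM_SECONDS) -> float:
    """Seconds of footage left in one segment after trimming the static start."""
    return length / fps - trim


def segments_needed(target_seconds: float, length: int = 121,
                    fps: float = FPS, trim: float = TRIM_SECONDS) -> int:
    """Smallest number of segments whose usable footage covers target_seconds."""
    per_segment = usable_seconds(length, fps, trim)
    if per_segment <= 0:
        raise ValueError("segment is shorter than the trim")
    return math.ceil(target_seconds / per_segment)


for length in (121, 161, 201):
    print(length, round(usable_seconds(length), 2))
print(segments_needed(37))
```

Longer segments mean fewer chain hand-offs for the same runtime, at the cost of longer generation time per clip.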

Calling the Skills from Hermes Agent

If you have a Hermes Agent setup, copy the skills/ directory to ~/.hermes/skills/creative/:

cp -r /path/to/repo/skills/* ~/.hermes/skills/creative/

Then tell the agent:

"I want to make a 45-second tropical beach movie with a beach bar scene and a bonfire"

The agent will load ltx-movie-studio, create a shotlist, expand each prompt via ltx-scenarist, generate clips via comfyui-spark-ltx, chain them, assemble the movie, and compose audio.

Contributing

This is experimental. The LTX model, ComfyUI nodes, and our techniques are all evolving. Contributions welcome:

Prompt engineering — better prompts, new scene types, camera movement recipes
Post-production — cleaner audio synthesis, ffmpeg workflows, music generation
ComfyUI node expertise — better understanding of the I2V conditioning mechanics
LTX model tricks — longer segments, better motion quality, physics fixes

Open an issue or PR. Please read the LTX Video documentation and the ComfyUI-LTXVideo node reference before proposing changes to the workflow structure.

License

MIT. The AEON-7 ComfyUI distribution and movie studio skills are open source.

Model licenses remain with their authors:

LTX 2.3 — Lightricks Open Weights
FLUX.2-dev — FLUX.2 Non-Commercial
MiniMax Music API — subject to MiniMax terms of service

Bundled custom-node packs and model weights retain their respective upstream licenses (Apache 2.0, MIT, FLUX Non-Commercial, etc.). The Flux 2 Dev model is under the Black Forest Labs Non-Commercial license; review it before commercial use.

Build / push reference (only if you're forking)

The published image at ghcr.io/aeon-7/comfyui-aeon-spark is the canonical artifact and is what docker compose pull grabs. If you want to fork and publish your own variant under a different namespace:

git clone https://github.com/AEON-7/comfyui-aeon-spark.git
cd comfyui-aeon-spark

# docker compose build tags the local image as
# ghcr.io/aeon-7/comfyui-aeon-spark:latest (per docker-compose.yml).
# Re-tag and push under your own namespace:
docker compose build              # ~3 min on Spark with ccache hot
docker tag ghcr.io/aeon-7/comfyui-aeon-spark:latest \
           ghcr.io/<your-namespace>/comfyui-aeon-spark:custom
docker push ghcr.io/<your-namespace>/comfyui-aeon-spark:custom

For an x86 fork (RTX 5090/5080 consumer Blackwell):

DOCKER_BUILDKIT=1 docker buildx build --platform linux/amd64 \
  --build-arg TORCH_CUDA_ARCH_LIST="12.0" \
  -t ghcr.io/<your-namespace>/comfyui-aeon-spark:cu130-x86 .

Built and maintained for the DGX Spark AI workstation. Pairs naturally with vllm-aeon-ultimate for LLM serving on the same hardware.

☕ Support the work

If this release has been useful, tips are deeply appreciated — they go directly toward more compute, more models, and more open releases.

₿ Bitcoin (BTC): bc1q09xmzn00q4z3c5raene0f3pzn9d9pvawfm0py4
Ξ Ethereum (ETH): 0x1512667F6D61454ad531d2E45C0a5d1fd82D0500
◎ Solana (SOL): DgQsjHdAnT5PNLQTNpJdpLS3tYGpVcsHQCkpoiAKsw8t
ⓜ Monero (XMR): 836XrSKw4R76vNi3QPJ5Fa9ugcyvE2cWmKSPv3AhpTNNKvqP8v5ba9JRL4Vh7UnFNjDz3E2GXZDVVenu3rkZaNdUFhjAvgd

Ethereum L2s (Base, Arbitrum, Optimism, Polygon, etc.) and EVM-compatible tokens can be sent to the same Ethereum address.
