Fast image generation with FLUX.1 Schnell — Black Forest Labs' distilled model that generates high-quality images in just 4 steps, roughly 10x faster than SDXL. Deployed via ComfyUI's workflow-graph runtime.
GPU: 1x A10G or L4 (24GB VRAM) · Cold start: ~120s · API: Native ComfyUI (not OpenAI-compatible)
- Convox rack v3.24.6+ with GPU-capable nodes (`g5.xlarge` recommended)
- A HuggingFace access token (optional: avoids download rate limits; FLUX.1 Schnell is Apache 2.0, not gated)
git clone https://github.com/convox-examples/inference-examples.git
cd inference-examples/flux-schnell
convox apps create flux
convox env set HUGGING_FACE_HUB_TOKEN=hf_your_token_here -a flux
convox deploy -a flux

The first deploy builds the Docker image and downloads FLUX.1 Schnell weights (~12GB). Subsequent deploys use cached layers.
convox services -a flux

SERVICE  DOMAIN                              PORTS
api      api.flux.org-abc123.convox.cloud    443:8188
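
With a cold start of ~120s, the endpoint may not answer as soon as the service is listed. The following is a minimal sketch of a readiness wait, assuming the `/system_stats` route returns HTTP 200 once ComfyUI is up. `wait_until_ready` and its default attempt count and delay are hypothetical, not part of Convox or ComfyUI.

```shell
# Hypothetical helper: block until ComfyUI answers on /system_stats.
# Args: endpoint host, max attempts (default 40), delay in seconds (default 5).
wait_until_ready() {
  local endpoint="$1" attempts="${2:-40}" delay="${3:-5}"
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP errors while the container is still booting
    if curl -sf -o /dev/null "https://$endpoint/system_stats"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "ComfyUI not ready after $attempts attempts" >&2
  return 1
}

# Usage: wait_until_ready "$ENDPOINT" && echo "ready"
```

The defaults (40 attempts, 5s apart) give roughly 200s of headroom over the stated cold start; adjust them if your image pulls are slower.
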
Queue a generation (API format):
ENDPOINT=$(convox services -a flux | awk '$1 == "api" {print $2}')
# Queue a FLUX generation (4 steps — fast!)
PROMPT_ID=$(jq '{prompt: .}' workflow-api.json | \
curl -s "https://$ENDPOINT/prompt" \
-H "Content-Type: application/json" \
-d @- | jq -r '.prompt_id')
echo "Queued: $PROMPT_ID"
# Poll for completion (~5-10s for FLUX Schnell at 1024x1024)
sleep 10
curl -s "https://$ENDPOINT/history/$PROMPT_ID" | jq --arg id "$PROMPT_ID" '.[$id].status'

Custom prompt:
# Modify the workflow text and submit
jq '.["6"].inputs.text = "a cyberpunk cityscape at night, neon lights, rain" | {prompt: .}' workflow-api.json | \
curl -s "https://$ENDPOINT/prompt" \
-H "Content-Type: application/json" \
-d @- | jq .

Browse the UI:
Open https://<endpoint>/ in your browser to access the full ComfyUI graph editor.
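
The fixed `sleep 10` in the polling step can be replaced with a loop that returns as soon as the result is available. This is a sketch under one assumption: that `/history/{prompt_id}` returns an empty JSON object (`{}`) until the prompt has finished. `wait_for_prompt` and its defaults are hypothetical names chosen for illustration.

```shell
# Hypothetical helper: poll /history/{prompt_id} until an entry appears.
# Args: endpoint host, prompt id, max attempts (default 30), delay in seconds (default 2).
wait_for_prompt() {
  local endpoint="$1" prompt_id="$2" attempts="${3:-30}" delay="${4:-2}"
  local body i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f yields an empty body on HTTP errors, so they are retried as well
    body=$(curl -sf "https://$endpoint/history/$prompt_id")
    # Assumption: an unfinished prompt yields "{}"; anything else is the history entry
    if [ -n "$body" ] && [ "$body" != "{}" ]; then
      printf '%s\n' "$body"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage: wait_for_prompt "$ENDPOINT" "$PROMPT_ID" | jq --arg id "$PROMPT_ID" '.[$id].status'
```

The function prints the raw history JSON on success and returns 1 if the prompt has not finished within the attempt budget, so it composes with `jq` the same way the one-shot `curl` does.
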
| Endpoint | Method | Purpose |
|---|---|---|
| `/prompt` | POST | Queue a workflow for execution |
| `/history/{prompt_id}` | GET | Check generation status + get outputs |
| `/view` | GET | Retrieve generated images |
| `/system_stats` | GET | GPU utilization and queue info |
| `/` | GET | ComfyUI web interface |

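
To pull a finished image from `/view`, the file is identified by query parameters. The sketch below assumes the usual ComfyUI parameter names (`filename`, `subfolder`, `type`), with values taken from the image entries in the `/history` response. `fetch_image` is a hypothetical helper and the filename in the usage line is illustrative only.

```shell
# Hypothetical helper: download one generated image via /view.
# Args: endpoint host, filename, output path (default: filename),
#       subfolder (default: empty), type (default: "output").
fetch_image() {
  local endpoint="$1" filename="$2" out="${3:-$2}" subfolder="${4:-}" type="${5:-output}"
  # -G sends the --data-urlencode pairs as a URL-encoded query string on a GET request
  curl -sf -G "https://$endpoint/view" \
    --data-urlencode "filename=$filename" \
    --data-urlencode "subfolder=$subfolder" \
    --data-urlencode "type=$type" \
    -o "$out"
}

# Usage (illustrative filename; real names come from the /history response):
#   fetch_image "$ENDPOINT" "ComfyUI_00001_.png" image.png
```

Letting `curl` do the URL encoding avoids broken requests when a filename or subfolder contains spaces or other reserved characters.
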
| | FLUX.1 Schnell | SDXL 1.0 |
|---|---|---|
| Steps | 4 | 25-30 |
| Time per image (A10G) | ~5s | ~30s |
| Parameters | 12B | 3.5B |
| VRAM | ~20GB | ~12GB |
| Quality | High | High |
| License | Apache 2.0 | CreativeML Open RAIL++-M |

FLUX Schnell's distillation means 4-step inference produces quality comparable to SDXL at 25 steps.
ComfyUI uses a node-graph JSON format. Export from the ComfyUI desktop app using "Save (API Format)" — this is different from the standard save format. An example workflow-api.json is included in this directory.
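
A quick pre-flight check can catch a workflow saved in the wrong format before it is submitted. This is a heuristic sketch, assuming API-format exports describe each node with a `class_type` key while UI-format saves use top-level `nodes` and `links` arrays instead. `is_api_format` is a hypothetical helper.

```shell
# Hypothetical pre-flight check: succeed only if the file looks like an
# API-format export (node entries carry a "class_type" key).
is_api_format() {
  grep -q '"class_type"' "$1"
}

# Usage: is_api_format workflow-api.json || echo "Re-export with Save (API Format)" >&2
```

The check only looks for the key name, so it does not validate the graph itself; ComfyUI still performs its own validation when the workflow is queued.
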
The included workflow uses:
- `UNETLoader` for FLUX Schnell weights (FP8 quantized)
- `DualCLIPLoader` for CLIP-L + T5-XXL text encoders
- `VAELoader` for the FLUX autoencoder
- 4-step Euler sampling with `cfg=1.0` (FLUX Schnell is guidance-distilled)
| Instance | GPU | VRAM | Use Case |
|---|---|---|---|
| `g5.xlarge` | 1x A10G | 24 GB | Default (FLUX needs ~20GB VRAM) |
| `g6.xlarge` | 1x L4 | 24 GB | Alternative |

FLUX.1 Schnell requires ~20GB VRAM for the 12B parameter model. A T4 (16GB) is not sufficient.
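
The weight portion of that figure follows from simple arithmetic: parameter count times bytes per parameter. The sketch below uses decimal gigabytes, and `weights_gb` is a hypothetical helper. The rest of the ~20GB presumably goes to the text encoders, VAE, and activations; that split is an assumption, not a measurement.

```shell
# Weight footprint in GB (decimal): parameters in billions x bytes per parameter.
weights_gb() {
  echo $(( $1 * $2 ))
}

weights_gb 12 1   # FP8 (1 byte/param): prints 12, matching the ~12GB download
weights_gb 12 2   # BF16 (2 bytes/param): prints 24, more than a 24GB card can spare
```

This is why the FP8-quantized weights matter here: at 2 bytes per parameter the transformer alone would fill a 24GB GPU.
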
The default config keeps one replica warm and allows scaling up to a maximum of 2 replicas. FLUX generates an image in ~5s, so a single replica handles moderate throughput.
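
As a rough capacity estimate, the ~5s figure translates into images per minute as follows. This sketch assumes each replica processes one image at a time with no queueing overhead or cold starts; `images_per_minute` is a hypothetical helper.

```shell
# Rough throughput estimate: 60 seconds x replicas / seconds per image.
images_per_minute() {
  local secs_per_image="$1" replicas="${2:-1}"
  echo $(( 60 * replicas / secs_per_image ))
}

images_per_minute 5     # one warm replica: prints 12
images_per_minute 5 2   # both replicas: prints 24
```

Treat these as upper bounds: larger resolutions or longer workflows will raise the per-image time.
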
convox budget set flux --monthly-cap-usd 100 --at-cap-action alert-only
convox cost -a flux

Enable GPU telemetry in Rack Settings to surface per-app GPU utilization, memory, temperature, and inference throughput in the Console GPU Dashboard.
| Symptom | Cause | Fix |
|---|---|---|
| `429` during build | HuggingFace download rate limit | Set `HUGGING_FACE_HUB_TOKEN` to avoid throttling (model is public) |
| OOM during generation | FLUX at 1024x1024 uses ~20GB | Use g5.xlarge (24GB); T4 is too small |
| Workflow JSON rejected | Exported in UI format, not API format | Re-export using Save (API Format) in ComfyUI |
| Slow first generation | Model loading into GPU on first prompt | Normal; subsequent generations are fast (~5s) |