Stable Audio MLX

Text-to-audio generation using Stability AI's stable-audio-open-small model, optimized for Apple Silicon using the MLX framework.

Features

CLI Audio Generation (generate.py) - Generate audio from text prompts via command line
Interactive Sampler UI (sampler.py) - PyQt6-based keyboard sampler for real-time playback

Setup

1. Create Virtual Environment

python -m venv .venv
source .venv/bin/activate

2. Install Dependencies

pip install -r requirements.txt
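After installing, you can run a stdlib-only check to confirm that the two frameworks this README names (MLX and PyQt6) are importable. This is a sketch. The import names `mlx` and `PyQt6` are assumptions about what requirements.txt installs, so edit the tuple to match that file.

```python
import importlib.util

# Assumption: import names inferred from the frameworks this README mentions.
# Edit this tuple to match what requirements.txt actually installs.
REQUIRED_MODULES = ("mlx", "PyQt6")


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of top-level module names that cannot be found."""
    missing = []
    for name in names:
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    gaps = missing_modules()
    if gaps:
        print("Missing:", ", ".join(gaps))
    else:
        print("All required modules found.")
```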

3. Get Model Access

The model requires accepting the license on Hugging Face:

Visit stable-audio-open-1.0 and accept the license (name + email required)
Visit stable-audio-open-small and accept the license

4. Log in to Hugging Face

huggingface-cli login

Enter your Hugging Face access token when prompted.
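To confirm that the login worked before starting the download, you can look for the token where huggingface_hub normally stores it. The sketch below relies on huggingface_hub's default locations, and those are assumptions: an `HF_TOKEN` environment variable counts as logged in; otherwise the token file is `$HF_HOME/token`, with `HF_HOME` defaulting to `~/.cache/huggingface`. Other huggingface_hub versions may store tokens differently, so a negative result is a hint, not proof.

```python
import os
from pathlib import Path


def hf_token_available(env=None):
    """Best-effort check for a Hugging Face token.

    Assumption: huggingface_hub's default token locations (HF_TOKEN env var,
    else $HF_HOME/token with HF_HOME defaulting to ~/.cache/huggingface).
    """
    env = os.environ if env is None else env
    # An explicit HF_TOKEN environment variable takes precedence.
    if env.get("HF_TOKEN"):
        return True
    # Otherwise look for the token file written by `huggingface-cli login`.
    cache_root = env.get("XDG_CACHE_HOME") or os.path.join(str(Path.home()), ".cache")
    hf_home = env.get("HF_HOME") or os.path.join(cache_root, "huggingface")
    token_file = Path(hf_home) / "token"
    return token_file.is_file() and token_file.read_text().strip() != ""


if __name__ == "__main__":
    if hf_token_available():
        print("Token found")
    else:
        print("No token found; run huggingface-cli login")
```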

5. Download and Convert Model

python src/conversion/convert.py

This will:

Download model.safetensors and model_config.json from Hugging Face
Download T5 text encoder weights
Convert weights to MLX format (model/stable_audio_small.npz)
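An `.npz` file is an ordinary ZIP archive with one `<name>.npy` entry per array. That means you can confirm the conversion produced a readable file by listing its array names with the standard library alone, without loading MLX or NumPy. The default path is the one given above. The parameter names inside the file are whatever convert.py wrote, and this README does not document them.

```python
import os
import sys
import zipfile


def list_npz_arrays(path):
    """Return the array names stored in an .npz archive (entries are '<name>.npy')."""
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
    return [n[: -len(".npy")] if n.endswith(".npy") else n for n in names]


if __name__ == "__main__":
    npz_path = sys.argv[1] if len(sys.argv) > 1 else "model/stable_audio_small.npz"
    if not os.path.isfile(npz_path):
        print(f"{npz_path} not found; run the conversion step first")
    else:
        arrays = list_npz_arrays(npz_path)
        print(f"{len(arrays)} arrays in {npz_path}")
        for name in arrays[:10]:  # show only the first few names
            print(" ", name)
```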

Usage

CLI Audio Generation

Generate audio from a text prompt:

python generate.py --prompt "warm arpeggios on house beats 120BPM with drums"

Options

Option	Default	Description
`--prompt`	(required)	Text description of the audio to generate
`--negative-prompt`	`""`	Negative prompt for CFG guidance
`--seconds`	`5.0`	Audio duration in seconds
`--steps`	`8`	Inference steps (8-30 recommended)
`--cfg-scale`	`6.0`	Classifier-free guidance scale
`--seed`	random	Random seed for reproducibility
`--sampler`	`euler`	Sampler method: `euler` (faster) or `rk4` (higher quality)
`--cpu`	false	Force CPU inference
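Assuming `euler` and `rk4` are the standard Euler and classic fourth-order Runge-Kutta integrators, the speed/quality trade-off for `--sampler` works as follows. Euler evaluates the derivative once per step. RK4 evaluates it four times per step but has a much smaller error per step. The toy example below shows this on dy/dt = y, whose exact value at t = 1 is e. It illustrates only the two update rules, not how generate.py schedules steps or calls the model.

```python
import math


def euler_step(f, t, y, h):
    # Euler: one derivative evaluation per step.
    return y + h * f(t, y)


def rk4_step(f, t, y, h):
    # Classic Runge-Kutta: four derivative evaluations per step.
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def integrate(step, f, y0, t0, t1, steps):
    """Advance y from t0 to t1 using a fixed number of equal steps."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        y = step(f, t, y, h)
        t += h
    return y


if __name__ == "__main__":
    f = lambda t, y: y  # dy/dt = y, so y(1) = e when y(0) = 1
    for n in (8, 20):
        e_err = abs(integrate(euler_step, f, 1.0, 0.0, 1.0, n) - math.e)
        r_err = abs(integrate(rk4_step, f, 1.0, 0.0, 1.0, n) - math.e)
        print(f"steps={n:2d}  euler error={e_err:.2e}  rk4 error={r_err:.2e}")
```

If each derivative evaluation corresponds to one model call, the same step count costs roughly four times as much with RK4 as with Euler.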

Examples

# Basic generation
python generate.py --prompt "ambient pad with reverb"

# Longer duration with more steps
python generate.py --prompt "techno kick drum loop" --seconds 10 --steps 30

# Reproducible output with seed
python generate.py --prompt "jazz piano chords" --seed 42

# Higher quality with RK4 sampler
python generate.py --prompt "orchestral strings" --sampler rk4 --steps 20
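To render several prompts with the same settings, a small driver can call generate.py once per prompt. The sketch below uses only flags from the Options table. The prompt list and seed are illustrative. It assumes you run it from the repository root with the virtual environment active, and by default it only prints the commands (dry run).

```python
import shlex
import subprocess
import sys


def build_command(prompt, seconds=5.0, steps=8, seed=None, sampler="euler"):
    """Build an argv list for generate.py using only the documented options."""
    cmd = [sys.executable, "generate.py", "--prompt", prompt,
           "--seconds", str(seconds), "--steps", str(steps), "--sampler", sampler]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    return cmd


def run_batch(prompts, dry_run=True, **options):
    """Print each command; execute it only when dry_run is False."""
    for prompt in prompts:
        cmd = build_command(prompt, **options)
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Illustrative prompts; dry run prints commands instead of generating audio.
    run_batch(["ambient pad with reverb", "jazz piano chords"], seed=42, steps=20)
```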

Output files are saved as {prompt}_seed_{seed}.wav (44.1 kHz stereo WAV).
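The standard-library `wave` module can read a generated file's sample rate, channel count, and duration from its header. This sketch assumes the file is written as integer PCM, because `wave` cannot read floating-point WAV files. The filename in the comment is illustrative: this README does not describe how generate.py sanitizes the prompt when building the name.

```python
import sys
import wave


def describe_wav(path):
    """Return (sample_rate, channels, duration_seconds) from a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()
    return rate, channels, frames / rate


if __name__ == "__main__":
    # Usage (filename illustrative): python check_wav.py "<prompt>_seed_42.wav"
    for path in sys.argv[1:]:
        rate, channels, seconds = describe_wav(path)
        print(f"{path}: {rate} Hz, {channels} ch, {seconds:.2f} s")
```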

Interactive Sampler UI

Launch the keyboard sampler interface:

python sampler.py

Controls

Set BPM - Adjust the tempo slider (60-200 BPM)
Set Duration - Choose audio length (2-10 seconds)
Enter Prompt - Describe the sound you want
Click Generate - Wait for the audio to finish generating
Play with Keyboard - Hold keys to play, release to stop
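Tempo and duration are related by simple arithmetic. One beat lasts 60 / BPM seconds, so a clip holds seconds * BPM / 60 beats. You can use the helpers below to choose a duration that contains a whole number of bars (assuming 4/4 time). This README does not say how sampler.py uses the BPM value, for example whether it is added to the prompt, so treat these helpers only as a planning aid.

```python
def beats_in_clip(seconds, bpm):
    """Number of beats in a clip of the given length at the given tempo."""
    return seconds * bpm / 60.0


def seconds_for_bars(bars, bpm, beats_per_bar=4):
    """Clip length needed for a whole number of bars (4/4 time assumed by default)."""
    return bars * beats_per_bar * 60.0 / bpm


if __name__ == "__main__":
    print(beats_in_clip(5.0, 120))   # 10.0 beats
    print(seconds_for_bars(2, 120))  # 4.0 seconds for two bars of 4/4
```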

Keyboard Layout

Key	Note	Key	Note
`a`	C4	`w`	C#4
`s`	D4	`e`	D#4
`d`	E4
`f`	F4	`t`	F#4
`g`	G4	`y`	G#4
`h`	A4	`u`	A#4
`j`	B4
`k`	C5
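The layout follows the familiar "piano row" arrangement: the home row plays the white keys and the row above plays the black keys. Together they cover 13 chromatic notes. The dictionary below restates the table as MIDI note numbers (C4 = 60). It is a sketch for reference, not the mapping object sampler.py actually uses.

```python
# Computer-keyboard key -> MIDI note number, restating the layout table (C4 = 60).
# Illustrative restatement; not taken from sampler.py.
KEY_TO_MIDI = {
    "a": 60, "w": 61, "s": 62, "e": 63, "d": 64,
    "f": 65, "t": 66, "g": 67, "y": 68, "h": 69,
    "u": 70, "j": 71, "k": 72,
}


def semitones_above_c4(key):
    """Semitone offset of a key relative to C4 (0 for 'a', 12 for 'k')."""
    return KEY_TO_MIDI[key] - 60
```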

MIDI Input

Connect a MIDI keyboard and select it from the MIDI Input dropdown. Notes C4-C5 (MIDI notes 60-72) map to the same notes as the computer keyboard. Click Refresh to rescan for newly connected devices.
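The note numbers use the convention in which middle C (C4) is MIDI note 60. Under that convention the octave is `note // 12 - 1` and the pitch class is `note % 12`. The converter below makes the mapping explicit. The function names are hypothetical.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def midi_note_name(note):
    """Convert a MIDI note number to a name such as 'C4' (middle C = 60 = C4)."""
    return f"{NOTE_NAMES[note % 12]}{note // 12 - 1}"


def in_sampler_range(note):
    """True for notes 60-72 (C4-C5), the range this README says is mapped."""
    return 60 <= note <= 72
```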

Keyboard Modes

Position Mode - Each key plays from a different position in the sample
Pitch Mode - Each key plays from the start with chromatic pitch shifting
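Here is one way to model the two modes, with the 13 keys (C4 to C5) indexed 0 to 12. In Position Mode, each key could start playback at an evenly spaced offset into the clip. In Pitch Mode, each key starts at the beginning and plays back at a rate of 2^(n/12) for n semitones above C4, the equal-temperament ratio. The even spacing and the resampling-style pitch shift are illustrative assumptions. sampler.py may divide the sample or shift the pitch differently.

```python
NUM_KEYS = 13  # C4 through C5 inclusive


def position_mode_offset(key_index, num_samples, num_keys=NUM_KEYS):
    """Start sample for a key, ASSUMING the clip is split into equal slices."""
    return (key_index * num_samples) // num_keys


def pitch_mode_rate(semitones):
    """Equal-tempered playback-rate ratio for a shift of `semitones` above C4."""
    return 2.0 ** (semitones / 12.0)


if __name__ == "__main__":
    clip_len = 5 * 44100  # a 5-second clip at 44.1 kHz
    for idx in (0, 6, 12):
        print(idx, position_mode_offset(idx, clip_len), round(pitch_mode_rate(idx), 4))
```

With simple resampling, a rate of 2.0 raises the note by an octave and also halves its length, which is why the higher keys sound shorter in this model.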

Requirements

macOS with Apple Silicon (M1/M2/M3)
Python 3.10+
~4-6GB RAM for inference
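You can check the platform requirements up front with the standard library. The sketch below reports whether the machine is macOS on Apple Silicon and whether Python is 3.10 or newer. On Apple Silicon, `platform.machine()` returns `arm64` only when Python itself is a native arm64 build rather than running under Rosetta. The check does not measure available memory.

```python
import platform
import sys


def requirement_problems(system=None, machine=None, version=None):
    """Return a list of unmet requirements; arguments default to the running interpreter."""
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    version = sys.version_info[:2] if version is None else version
    problems = []
    if system != "Darwin":
        problems.append(f"macOS required (found {system})")
    if machine != "arm64":
        problems.append(f"Apple Silicon (arm64) required (found {machine})")
    if tuple(version) < (3, 10):
        problems.append(f"Python 3.10+ required (found {version[0]}.{version[1]})")
    return problems


if __name__ == "__main__":
    issues = requirement_problems()
    print("\n".join(issues) if issues else "Environment looks compatible.")
```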

License

Model weights are subject to Stability AI's license.
