Make sure you have the following installed:
- Node.js (includes npm)
- Python 3.8+
- pip (Python package manager)
- TypeScript compiler
Install the TypeScript compiler globally with:

```shell
npm install -g typescript
```

And the required Python packages with:

```shell
pip install -r requirements.txt
```
torch is included in requirements.txt, but you may want to install torch separately from pytorch.org with the right CUDA version for your system.
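For example, pytorch.org publishes per-CUDA-version package indexes; at the time of writing, a CUDA 11.8 build can typically be installed like this (check pytorch.org for the exact command matching your CUDA version):

```shell
# Illustrative only — the index URL depends on your CUDA version.
pip install torch --index-url https://download.pytorch.org/whl/cu118
```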
The pipeline runs in several stages:

- Create labels using the labeller script (`data/preprocessed/labels/<diff>`)
- Run the spectrogram pipeline:
  - Build 3 log-mel spectrograms for each frame, one per window size
  - Create labels for each frame using the labels from step 1
  - Extract windows from the spectrograms based on the labels
  - Export the dataset in batches to `data/preprocessed/exports/<my_data>`
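As a rough illustration of the multi-window spectrogram step, here is a numpy-only sketch. The actual pipeline builds log-mel spectrograms (see `data/src/spectrogram_utils.py`); the window sizes, hop length, and `stft_mag` helper below are illustrative assumptions, not the repo's code:

```python
import numpy as np

def stft_mag(y, n_fft, hop):
    # Slide a Hann window over the signal and take the magnitude
    # of the real FFT of each frame.
    window = np.hanning(n_fft)
    frames = [np.abs(np.fft.rfft(y[s:s + n_fft] * window))
              for s in range(0, len(y) - n_fft + 1, hop)]
    return np.array(frames)

sr = 22050
y = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # 1-second test tone

# One spectrogram per window size, sharing a hop length so the
# frames line up across resolutions; log-compress the magnitudes.
hop = 256
specs = [np.log1p(stft_mag(y, n_fft, hop)) for n_fft in (512, 1024, 2048)]
```

Short windows give better time resolution for sharp onsets, long windows better frequency resolution, which is why one frame gets all three views.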
A more detailed explanation can be found here (WIP).
- Various track sets can be found at TJA Portal.
- Track folders can be nested, but make sure that any folder containing a `.tja` file also has an audio file.
- Most audio types should be supported. See `data/src/spectrogram_utils.py` for supported audio types.
Supported flags:

| Flag | Type | Description |
|---|---|---|
| `-d` | Required | Course difficulty. See supported difficulties. |
| `-f` | Required | Output directory name under `<data>/preprocessed/exports/` |
| `-n` | Required | Note types (comma-separated, e.g. `don,ka`). See `data/src/spectrogram_utils.py` for supported note types. |
| `-b` | Optional | Batch size; songs per dataset file (default: 50) |
| `-c` | Optional | Clears the labels directory for the specified difficulty. |
| `-r` | Optional | Fraction of total samples that are background, as a decimal (default: 0.5, i.e. 50%). |
| `-H` | Optional | Hard-negative radius in frames. Negatives are sampled within this many frames of a note event (default: 60, ~0.7 s). Set to -1 to disable. |
| `-W` | Optional | Onset-weight radius in frames. Background frames within this radius of a note onset get linearly reduced loss weight (weight = dist / radius). Positive frames always get weight 1.0 (default: 4). Set to 0 to disable. |
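The `-W` weighting described above can be sketched as follows. This is an illustrative reimplementation, not the pipeline's actual code, and `onset_weights` is a hypothetical helper name:

```python
import numpy as np

def onset_weights(labels, radius=4):
    # labels: 1 for frames containing a note onset, 0 for background.
    # Positive frames always get weight 1.0; background frames within
    # `radius` frames of an onset get weight dist / radius.
    labels = np.asarray(labels)
    weights = np.ones(len(labels), dtype=float)
    onsets = np.flatnonzero(labels)
    if radius <= 0 or onsets.size == 0:
        return weights
    idx = np.arange(len(labels))
    # Distance from every frame to its nearest onset.
    dist = np.abs(idx[:, None] - onsets[None, :]).min(axis=1)
    near_bg = (labels == 0) & (dist < radius)
    weights[near_bg] = dist[near_bg] / radius
    return weights
```

The effect is that background frames right next to an onset, which are acoustically almost identical to the onset itself, contribute little to the loss instead of confusing the model.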
Example:

```shell
./data/src/build_dataset.sh -d easy -f my_dataset -n don,ka -b 50 -r 0.33
```

Example:
```python
import numpy as np

data = np.load("../preprocessed/exports/my_dataset/batch_1.npz")
X, y, W = data["X"], data["y"], data["W"]
print(X.shape)  # Spectrogram windows
print(y.shape)  # Spectrogram window labels
print(W.shape)  # Onset weights
```

Note that there are multiple batch files per dataset. Load them in individually while training.
Trains a CNN on the preprocessed .npz batch files produced by the data pipeline.
```shell
python model/training.py \
    --data_dir data/preprocessed/exports/my_dataset \
    --out models/my_model.pth
```

| Argument | Required | Default | Description |
|---|---|---|---|
| `--data_dir` | Yes | — | Directory containing `batch_*.npz` files and `metadata.json` |
| `--out` | Yes | — | Path to save the trained model `.pth` file |
| `--epochs` | No | 100 | Number of training epochs |
| `--lr` | No | 0.001 | Learning rate |
| `--batch_size` | No | 256 | Mini-batch size |
| `--split_prop` | No | 0.1 | Fraction of data held out for validation |
| `--dropout` | No | 0.5 | Dropout rate on fully connected layers |
| `--seed` | No | 1 | Random seed |
| `--patience` | No | 10 | Early stopping patience in epochs |
| `--class_weights` | No | off | Weight cross-entropy loss by inverse class frequency to counter class imbalance |
| `--onset_weights` | No | off | Use per-sample onset weights from the dataset during training |
Runs a trained model on an audio file and outputs a playable .tja chart.
```shell
python model/inference.py \
    --audio path/to/song.mp3 \
    --bpm 140 \
    --model models/my_model.pth \
    --out path/to/output.tja
```

| Argument | Required | Default | Description |
|---|---|---|---|
| `--audio` | Yes | — | Path to input audio file |
| `--bpm` | Yes | — | BPM of the song. Songs with mid-song BPM changes will produce inaccurate charts. |
| `--model` | Yes | — | Path to trained model `.pth` file |
| `--out` | Yes | — | Path to write the output `.tja` file |
| `--title` | No | `"Untitled"` | Song title written into the TJA header |
| `--offset` | No | 0.0 | Seconds of silence before the music starts in the audio file |
| `--threshold` | No | 0.5 | Minimum model confidence to count as a note (0–1). Increase to reduce false positives; decrease to catch more notes. |