taiko

Prerequisites

Make sure you have the following installed:

  • Node.js (includes npm)
  • Python 3.8+
  • pip (Python package manager)
  • TypeScript compiler

Install the TypeScript compiler globally with:

npm install -g typescript

And required Python packages with:

pip install -r requirements.txt

torch is included in requirements.txt, but depending on your system you may want to install it separately from pytorch.org to get a build matching your CUDA version.

Data pipeline

Overview

The pipeline runs in several stages:

  1. Create labels using the labeller script (data/preprocessed/labels/<diff>)
  2. Run the spectrogram pipeline:
    1. Build three log-mel spectrograms per track, one for each of three window sizes
    2. Label each frame using the labels from step 1
    3. Extract windows from the spectrograms based on those labels
  3. Export the dataset in batches to data/preprocessed/exports/<my_data>

A more detailed explanation can be found here (WIP).
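Step 2.2 above (turning onset times into per-frame labels) can be sketched in plain numpy. This is illustrative only: the sr and hop values below are placeholder settings, and it assumes the label files from step 1 reduce to a list of onset times in seconds for one note type.

```python
import numpy as np

def frame_labels(onset_times, n_frames, sr=22050, hop=512):
    """Mark each spectrogram frame that contains a note onset.

    Sketch under assumptions: sr/hop are placeholders, not the
    pipeline's actual settings.
    """
    labels = np.zeros(n_frames, dtype=np.int64)
    # Convert onset times (seconds) to frame indices.
    frames = np.round(np.asarray(onset_times) * sr / hop).astype(int)
    # Drop onsets that fall outside the spectrogram.
    frames = frames[(frames >= 0) & (frames < n_frames)]
    labels[frames] = 1
    return labels
```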

Usage

1. Add your songs into data/tracks/

  • Various track sets can be found at TJA Portal.
  • Track folders can be nested; just make sure that any folder containing a .tja file also contains its audio file.
  • Most audio types should be supported. See data/src/spectrogram_utils.py for supported audio types.

2. Run the dataset builder script

Supported flags:

| Flag | Type | Description |
| --- | --- | --- |
| -d | Required | Course difficulty. See supported difficulties. |
| -f | Required | Output directory name under <data>/preprocessed/exports/ |
| -n | Required | Note types, comma-separated (e.g. don,ka). See data/src/spectrogram_utils.py for supported note types. |
| -b | Optional | Batch size: songs per dataset file (default: 50) |
| -c | Optional | Clears the labels directory for the specified difficulty |
| -r | Optional | Fraction of total samples that are background, as a decimal (default: 0.5, i.e. 50%) |
| -H | Optional | Hard-negative radius in frames. Negatives are sampled within this many frames of a note event (default: 60, ~0.7 s). Set to -1 to disable. |
| -W | Optional | Onset-weight radius in frames. Background frames within this radius of a note onset get linearly reduced loss weight (weight = dist / radius); positive frames always get weight 1.0 (default: 4). Set to 0 to disable. |

Example:

./data/src/build_dataset.sh -d easy -f my_dataset -n don,ka -b 50 -r 0.33
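The onset-weight rule described for -W (weight = dist / radius for nearby background frames, 1.0 for positives) can be illustrated with a small numpy sketch. The function below is hypothetical, not the builder's actual code:

```python
import numpy as np

def onset_weights(labels, radius=4):
    """Per-frame loss weights following the -W rule (illustrative sketch):
    positive frames get 1.0; background frames within `radius` frames of
    an onset get dist / radius; all other background frames get 1.0."""
    n = len(labels)
    weights = np.ones(n, dtype=np.float32)
    onsets = np.flatnonzero(labels)
    if radius <= 0 or onsets.size == 0:
        return weights  # radius 0 disables down-weighting, as in the flag
    idx = np.arange(n)
    # Distance from every frame to its nearest onset.
    dist = np.min(np.abs(idx[:, None] - onsets[None, :]), axis=1)
    near_bg = (labels == 0) & (dist < radius)
    weights[near_bg] = dist[near_bg] / radius
    return weights
```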

3. Import .npz file for each batch

Example:

import numpy as np

data = np.load(file="../preprocessed/exports/my_dataset/batch_1.npz")
X, y, W = data["X"], data["y"], data["W"]

print(X.shape) # Spectrogram windows
print(y.shape) # Spectrogram window labels
print(W.shape) # Onset weights

Note that there are multiple batch files per dataset; load them individually while training.
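One way to load batches one at a time is a small generator. This is a sketch, not part of the repo; it only assumes the batch_*.npz naming shown above:

```python
import glob
import numpy as np

def iter_batches(export_dir):
    """Yield (X, y, W) from each batch_*.npz in order, one file at a
    time, so the whole dataset never has to fit in memory."""
    for path in sorted(glob.glob(f"{export_dir}/batch_*.npz")):
        with np.load(path) as data:
            yield data["X"], data["y"], data["W"]
```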

Model training

Trains a CNN on the preprocessed .npz batch files produced by the data pipeline.

Usage

python model/training.py \
  --data_dir data/preprocessed/exports/my_dataset \
  --out models/my_model.pth

Arguments

| Argument | Required | Default | Description |
| --- | --- | --- | --- |
| --data_dir | Yes | | Directory containing batch_*.npz files and metadata.json |
| --out | Yes | | Path to save the trained model .pth file |
| --epochs | No | 100 | Number of training epochs |
| --lr | No | 0.001 | Learning rate |
| --batch_size | No | 256 | Mini-batch size |
| --split_prop | No | 0.1 | Fraction of data held out for validation |
| --dropout | No | 0.5 | Dropout rate on fully connected layers |
| --seed | No | 1 | Random seed |
| --patience | No | 10 | Early-stopping patience in epochs |
| --class_weights | No | off | Weight the cross-entropy loss by inverse class frequency to counter class imbalance |
| --onset_weights | No | off | Use per-sample onset weights from the dataset during training |

Inference

Runs a trained model on an audio file and outputs a playable .tja chart.

Usage

python model/inference.py \
  --audio path/to/song.mp3 \
  --bpm 140 \
  --model models/my_model.pth \
  --out path/to/output.tja

Arguments

| Argument | Required | Default | Description |
| --- | --- | --- | --- |
| --audio | Yes | | Path to the input audio file |
| --bpm | Yes | | BPM of the song. Songs with mid-song BPM changes will produce inaccurate charts. |
| --model | Yes | | Path to the trained model .pth file |
| --out | Yes | | Path to write the output .tja file |
| --title | No | "Untitled" | Song title written into the TJA header |
| --offset | No | 0.0 | Seconds of silence before the music starts in the audio file |
| --threshold | No | 0.5 | Minimum model confidence to count as a note (0–1). Increase to reduce false positives; decrease to catch more notes. |
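To make the --threshold behavior concrete, here is a hypothetical sketch of turning per-frame note probabilities into onset frames. The local-peak check is an added assumption (it suppresses runs of adjacent above-threshold frames); the repo's actual decoding logic may differ:

```python
import numpy as np

def pick_onsets(probs, threshold=0.5):
    """Return frame indices treated as note onsets (illustrative sketch).

    A frame counts if it clears the confidence threshold AND is a local
    peak relative to its neighbors, so a burst of high-confidence frames
    around one hit yields a single note.
    """
    probs = np.asarray(probs, dtype=np.float32)
    padded = np.pad(probs, 1, mode="constant")  # zero-pad the ends
    # Compare each frame to its left (>=) and right (>) neighbors.
    is_peak = (probs >= padded[:-2]) & (probs > padded[2:])
    return np.flatnonzero((probs >= threshold) & is_peak)
```

Raising the threshold keeps only high-confidence peaks (fewer false positives); lowering it admits weaker peaks (more notes caught).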

About

taiko audio transcriber
