The heaviest task we want to perform on the input data frame consists of appending audio snippets.
This involves opening a .wav file for each row.
Some of these rows point to the same .wav file, so we'll make sure each file is opened only once.
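One way to guarantee a single read per file is to memoize the loader on the key. The sketch below is not corpusparser's implementation: `load_audio` is a hypothetical stand-in that returns dummy samples, and `row_keys` is an illustrative list, so the example runs without any .wav files.

```python
from functools import lru_cache

# Records every simulated disk read so the effect of the cache is visible.
load_calls = []


@lru_cache(maxsize=None)
def load_audio(key):
    """Hypothetical loader: pretend to read the .wav file behind `key`."""
    load_calls.append(key)
    return (0.0, 0.0, 0.0)  # dummy samples standing in for real audio


# Illustrative keys, one entry per row of an imaginary data frame.
row_keys = [
    "/public-dutch/dutch-01",
    "/public-spanish/spanish-01",
    "/public-dutch/dutch-01",  # same file requested a second time
]

# Three rows are served, but only two simulated reads take place.
audio_per_row = [load_audio(k) for k in row_keys]
```

Grouping the rows by key before loading would achieve the same effect; the cache is simply the smallest change to a row-by-row loop.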
from pandas import read_csv
df = read_csv("tests/test_public.csv")
df

| | start_time | end_time | participant | utterance | key | language | uid |
|---|---|---|---|---|---|---|---|
| 0 | 2.0 | 2.9 | pablo | unit test | /public-dutch/dutch-01 | dutch | dutch-public-000-000-000001 |
| 1 | 1.2 | 2.5 | pablo | prueba de audio | /public-spanish/spanish-01 | spanish | spanish-public-000-000-000001 |
| 2 | 1.9 | 2.9 | pablo | los tests | /public-spanish/spanish-02 | spanish | spanish-public-000-000-000002 |
| 3 | 1.9 | 2.9 | none | nothing | /missing_file | klingon | klingon-000-000-000001 |
| 4 | 10.2 | 12.5 | pablo | out of bounds | /public-spanish/spanish-wrong | spanish | spanish-public-000-000-000003 |
Please note that `start_time` and `end_time` are expressed in seconds.
The following adapter helps us convert our notation (keys) into the one librosa expects (filenames).
from corpusparser.auxs import filename_from_key
key = "/public-dutch/dutch-01"
filename_from_key(key)

'data/public-dutch/dutch-01.wav'
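Judging only from the output above, the adapter seems to prefix a data folder and append the .wav extension. The following is a minimal sketch of that mapping, not the actual code of `filename_from_key`; the name `sketch_filename_from_key` and the `root` and `extension` parameters are assumptions made for illustration.

```python
def sketch_filename_from_key(key, root="data", extension=".wav"):
    """Hypothetical re-implementation: prefix the data folder, append the extension."""
    # Keys already start with "/", so no extra separator is inserted.
    return f"{root}{key}{extension}"


filename = sketch_filename_from_key("/public-dutch/dutch-01")
```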
from corpusparser.parsers import *
audio_from_key(key)

array([0. , 0. , 0. , ..., 0.00112915, 0.00177002,
0.00216675], dtype=float32)

samplerate_from_key(key)

24000
df[df["key"] == key].reset_index()

| | index | start_time | end_time | participant | utterance | key | language | uid |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2.0 | 2.9 | pablo | unit test | /public-dutch/dutch-01 | dutch | dutch-public-000-000-000001 |
snippet = subset_audio_from_key(df, key, row=0)
snippet

/home/pablo/code/ffmpeg-test/corpusparser/parsers.py:51: FutureWarning: Calling float on a single element Series is deprecated and will raise a TypeError in the future. Use float(ser.iloc[0]) instead
start_i = floor(start_time * rate)
/home/pablo/code/ffmpeg-test/corpusparser/parsers.py:52: FutureWarning: Calling float on a single element Series is deprecated and will raise a TypeError in the future. Use float(ser.iloc[0]) instead
end_i = ceil(end_time * rate)
array([-1.5258789e-04, -6.1035156e-05, 1.2207031e-04, ...,
1.5258789e-04, 2.1362305e-04, 1.2207031e-04], dtype=float32)
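The warnings above reveal how the snippet boundaries are computed: the start index is `floor(start_time * rate)` and the end index is `ceil(end_time * rate)`. Here is a small sketch of that arithmetic on a toy signal; `sketch_subset` is a hypothetical helper, not the body of `subset_audio_from_key`, and it takes plain numbers rather than a data frame. It uses a list, although the same slicing applies to a NumPy array.

```python
from math import ceil, floor


def sketch_subset(audio, rate, start_time, end_time):
    """Return the samples between start_time and end_time (both in seconds)."""
    start_i = floor(start_time * rate)  # first sample index, rounded down
    end_i = ceil(end_time * rate)       # end index (exclusive), rounded up
    return audio[start_i:end_i]


# Toy signal: 2 seconds at 10 samples per second, each sample equal to its index.
toy_rate = 10
toy_audio = list(range(20))

inside = sketch_subset(toy_audio, toy_rate, start_time=0.5, end_time=1.0)
# A window that starts past the end of the signal yields an empty result,
# because Python slicing does not raise on out-of-range indices.
outside = sketch_subset(toy_audio, toy_rate, start_time=3.0, end_time=4.0)
```

Rounding the start down and the end up means the snippet never drops a sample that overlaps the requested window.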
df = extend_dataframe(df)

/home/pablo/code/ffmpeg-test/corpusparser/parsers.py:22: UserWarning: PySoundFile failed. Trying audioread instead.
audio, rate = librosa.core.load(filename_from_key(key), sr=sr, **kwargs) # sr=None uses the native sampling rate
/home/pablo/miniconda3/envs/ffmpeg-test/lib/python3.12/site-packages/librosa/core/audio.py:184: FutureWarning: librosa.core.audio.__audioread_load
Deprecated as of librosa version 0.10.0.
It will be removed in librosa version 1.0.
y, sr_native = __audioread_load(path, offset, duration, dtype)
/home/pablo/code/ffmpeg-test/corpusparser/parsers.py:25: UserWarning: Something went wrong with key: /missing_file
warnings.warn(f"Something went wrong with key: {key}")
/home/pablo/code/ffmpeg-test/corpusparser/parsers.py:32: FutureWarning: PySoundFile failed. Trying audioread instead.
Audioread support is deprecated in librosa 0.10.0 and will be removed in version 1.0.
sr = librosa.get_samplerate(filename_from_key(key), **kwargs)
/home/pablo/code/ffmpeg-test/corpusparser/parsers.py:34: UserWarning: Something went wrong with key: /missing_file
warnings.warn(f"Something went wrong with key: {key}")
df

| | start_time | end_time | participant | utterance | key | language | uid | audio | rate |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2.0 | 2.9 | pablo | unit test | /public-dutch/dutch-01 | dutch | dutch-public-000-000-000001 | [-0.00015258789, -6.1035156e-05, 0.00012207031... | 24000 |
| 1 | 1.2 | 2.5 | pablo | prueba de audio | /public-spanish/spanish-01 | spanish | spanish-public-000-000-000001 | [-0.0009460449, -0.00076293945, -0.00076293945... | 16000 |
| 2 | 1.9 | 2.9 | pablo | los tests | /public-spanish/spanish-02 | spanish | spanish-public-000-000-000002 | [0.0066223145, 0.007019043, 0.0073547363, 0.00... | 16000 |
| 3 | 1.9 | 2.9 | none | nothing | /missing_file | klingon | klingon-000-000-000001 | [] | 0 |
| 4 | 10.2 | 12.5 | pablo | out of bounds | /public-spanish/spanish-wrong | spanish | spanish-public-000-000-000003 | [] | 16000 |
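In the table above, the row whose file is missing ends up with an empty `audio` and a `rate` of 0, and a warning is emitted instead of an exception. A minimal sketch of that fallback pattern follows. `safe_load` and `fake_loader` are hypothetical names introduced here, and the real package may organise this differently; only the warning text is taken from the output above.

```python
import warnings


def safe_load(key, loader):
    """Call loader(key); on failure, warn and return an empty signal with rate 0."""
    try:
        return loader(key)
    except Exception:
        warnings.warn(f"Something went wrong with key: {key}")
        return [], 0


def fake_loader(key):
    """Hypothetical loader that only knows a single key."""
    if key != "/public-dutch/dutch-01":
        raise FileNotFoundError(key)
    return [0.0, 0.1, 0.2], 24000


# Capture warnings so the example can inspect them instead of printing them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ok_audio, ok_rate = safe_load("/public-dutch/dutch-01", fake_loader)
    missing_audio, missing_rate = safe_load("/missing_file", fake_loader)
```

Returning placeholder values keeps the data frame rectangular, so one bad row does not abort the processing of the rest.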
from corpusparser.listeners import *
listen_audio_from_key(df, key = key, row = 0)

/home/pablo/code/ffmpeg-test/corpusparser/parsers.py:51: FutureWarning: Calling float on a single element Series is deprecated and will raise a TypeError in the future. Use float(ser.iloc[0]) instead
start_i = floor(start_time * rate)
/home/pablo/code/ffmpeg-test/corpusparser/parsers.py:52: FutureWarning: Calling float on a single element Series is deprecated and will raise a TypeError in the future. Use float(ser.iloc[0]) instead
end_i = ceil(end_time * rate)
listen_snippet_from_df(df, row = 0)