This repo collects multimodal datasets and processes them in a consistent manner. Please let me know if you have interesting datasets you'd like to see processed.
- MELD: This dataset is from the TV series Friends. It has visual, audio, and text modalities.
- IEMOCAP: This dataset includes dyadic conversations between two people, with 10 actors in total. It has visual, audio, and text modalities.
- CarLani: This dataset is a conversation between the robot Leolani and the human Carl.
Every dataset has its own directory, which looks like the tree below. If a dataset does not have all three modalities (i.e. visual, audio, and text), then obviously the corresponding directories will be empty.
```
DATASET
├── raw-videos
│   ├── train / val / test
├── raw-audios
│   ├── train / val / test
├── raw-texts
│   ├── train / val / test
├── face-videos
│   ├── train / val / test
├── face-features
│   ├── train / val / test
├── visual-features
│   ├── train / val / test
├── audio-features
│   ├── train / val / test
├── text-features
│   ├── train / val / test
├── foo.json
├── bar.json
└── README.txt
```
Every data sample belongs to one of the train, val, or test splits. Beware that the numbers are not always the same across modalities. For example, there might be a video, but if its transcription is not available or if audio extraction fails, then that sample won't have the text or audio modality.
- `DATASET` is the name of the dataset.
- `raw-videos` contains the raw, unprocessed videos.
- `raw-audios` contains the raw, unprocessed audios.
- `raw-texts` contains the raw, unprocessed texts.
- `face-videos` contains the face videos made from the facial features.
- `face-features` contains the detected faces and their features, extracted using https://github.com/tae898/face. The features are age, bounding box, face detection probability, gender, five landmarks, and a 512-dimensional ArcFace embedding vector.
- `visual-features` are other visual features (e.g. the COCO 80 objects) that are not facial features.
- `audio-features` contains audio features (e.g. spectrograms, encoded embeddings, etc.).
- `text-features` contains text features (e.g. word embeddings, BERT-like model features, etc.).
- `*.json` are important data-specific text files (e.g. labels, etc.).
- `README.txt` briefly explains the dataset and gives you the metadata.
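As a quick orientation, here is a minimal sketch (my addition, not part of the repo's scripts; `count_samples` and the `"MELD"` argument are purely illustrative) that walks this structure and counts the samples per modality and split. Missing or empty modality directories simply show up as zeros, which is how the per-modality mismatches described above become visible:

```python
import os

# Directory names follow the layout shown in the tree above.
MODALITIES = [
    "raw-videos", "raw-audios", "raw-texts",
    "face-videos", "face-features",
    "visual-features", "audio-features", "text-features",
]
SPLITS = ["train", "val", "test"]


def count_samples(dataset_dir: str) -> dict:
    """Count files per (modality, split) for one dataset directory."""
    counts = {}
    for modality in MODALITIES:
        for split in SPLITS:
            split_dir = os.path.join(dataset_dir, modality, split)
            # A modality the dataset lacks just counts as 0.
            num = len(os.listdir(split_dir)) if os.path.isdir(split_dir) else 0
            counts[(modality, split)] = num
    return counts


for (modality, split), num in count_samples("MELD").items():
    print(f"{modality}/{split}: {num}")
```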
There are several processing levels, described in the sections below.
Use Python >= 3.8.
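If you want your own scripts to fail fast on an older interpreter, a one-line guard (my addition, not part of the repo):

```python
import sys

# Abort immediately if the interpreter is too old for this repo.
assert sys.version_info >= (3, 8), "This repo requires Python >= 3.8"
```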
This section is about extracting the original datasets into the above-mentioned structure (i.e. `raw-videos`, `raw-audios`, and `raw-texts`).
- Since I don't have a license to all of the datasets, you should contact the dataset authors and download them yourself.
- Install the python requirements with `pip install -r requirements-extract-dataset.txt`. I highly recommend running this in a virtual environment.
- Move each archive into its corresponding directory:
  - Put `MELD.Raw.tar.gz` in `MELD/`
  - Put `IEMOCAP_full_release.tar.gz` in `IEMOCAP/`
  - Put `CarLani.zip` in `CarLani/`
- In this current directory, where `README.md` is located, run `python extract-dataset.py --dataset DATASET`. Replace `DATASET` with your desired dataset (e.g. `python extract-dataset.py --dataset MELD`). A small sketch that automates this over all three datasets follows this list.
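Here is that sketch (my addition): it just calls the documented CLI once per dataset, assuming the archives are already in place and that you run it from this directory:

```python
import subprocess

# Run the extraction script for each dataset in sequence.
for dataset in ["MELD", "IEMOCAP", "CarLani"]:
    subprocess.run(
        ["python", "extract-dataset.py", "--dataset", dataset],
        check=True,  # stop early if one extraction fails
    )
```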
In the next sections, we will go through extracting features and annotating with EMISSOR. Go ahead if you want to do it yourself; otherwise, you can just download them from the links below.
- MELD: in the current repo root directory, unzip what you downloaded into the directory `./MELD/`.
- IEMOCAP: in the current repo root directory, unzip what you downloaded into the directory `./IEMOCAP/`.
- CarLani: in the current repo root directory, unzip what you downloaded into the directory `./CarLani/`.
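The unzip step can also be scripted with the standard library. A minimal sketch, assuming the downloads are plain `.zip` files; the archive file names below are placeholders for whatever you actually downloaded:

```python
import zipfile

# Placeholder archive names; substitute the files you downloaded.
ARCHIVES = [
    ("MELD-processed.zip", "./MELD/"),
    ("IEMOCAP-processed.zip", "./IEMOCAP/"),
    ("CarLani-processed.zip", "./CarLani/"),
]

for archive, target in ARCHIVES:
    # extractall creates the target directories as needed.
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)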
This section is about extracting the features (e.g. `face-features`).
- For facial features, you will pull three docker images.
- For visual features, you currently have to build the extractor yourself. I'll soon make it into a docker container.
- For audio features, you currently have to build the extractor yourself. I'll soon make it into a docker container.
- For text features, you currently have to build the extractor yourself. I'll soon make it into a docker container.
- Install the python requirements with `pip install -r requirements-extract-features.txt`. I highly recommend running this in a virtual environment.
- In this current directory, where `README.md` is located, run `python extract-features.py --dataset DATASET --face-features --face-videos --visual-features --audio-features --text-features --run-on-gpu --num-jobs NUM_JOBS`. Replace `DATASET` with your desired dataset, and only add the boolean flags (i.e. `--face-features`, `--face-videos`, `--visual-features`, `--audio-features`, `--text-features`) for the features you want to extract. For example, if you only want to extract face features and audio features from the MELD dataset, the command should be `python extract-features.py --dataset MELD --face-features --audio-features`. If you want to run in parallel, you can add the GPU flag `--run-on-gpu` and even add more workers with `--num-jobs NUM_JOBS`. Running on GPU requires an NVIDIA GPU, and you should build the GPU images for this; read https://github.com/tae898/face for more information. A sketch for peeking at the extracted features follows this list.
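Once extraction finishes, you may want to inspect an output file. This sketch rests on a loudly stated assumption: that features are stored as pickle files, one per clip. The actual on-disk format may differ, so check each dataset's `README.txt` first; the path below is a placeholder:

```python
import pickle

# ASSUMPTION: one pickle file per clip. Verify against README.txt.
path = "MELD/face-features/train/example.pkl"  # placeholder path

with open(path, "rb") as f:
    features = pickle.load(f)

# Per the description above, each detected face should carry: age,
# bounding box, detection probability, gender, five landmarks, and a
# 512-dimensional ArcFace embedding.
print(type(features))
```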
This step is optional. The processed datasets can also be annotated in the EMISSOR annotation format. Using the EMISSOR annotation tool, users can visualize the data or even annotate it themselves.
- Install the python requirements with `pip install -r requirements-annotate-emissor.txt`. I highly recommend running this in a virtual environment.
- In this current directory, where `README.md` is located, run `python annotate-emissor.py --dataset DATASET --num-jobs NUM_JOBS`. Note that the script can only annotate features that have already been extracted; see the sketch after this list.
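A minimal sketch (my addition) that annotates all three datasets with one worker per CPU core, assuming `extract-features.py` has already been run for each of them:

```python
import os
import subprocess

# Use one worker per CPU core; fall back to 1 if the count is unknown.
num_jobs = os.cpu_count() or 1

for dataset in ["MELD", "IEMOCAP", "CarLani"]:
    subprocess.run(
        ["python", "annotate-emissor.py", "--dataset", dataset,
         "--num-jobs", str(num_jobs)],
        check=True,
    )
```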
You can also download the annotations from the link below.
- MELD: in the current repo root directory, unzip what you downloaded into the directory `./MELD/`.
- IEMOCAP: in the current repo root directory, unzip what you downloaded into the directory `./IEMOCAP/`.
- CarLani: in the current repo root directory, unzip what you downloaded into the directory `./CarLani/`.
The best way to find and solve your problem is to check the GitHub issues tab. If you can't find what you're looking for, feel free to raise an issue. We are pretty responsive.
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Run `make style && make quality` in the root repo directory, to ensure code quality
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
If you have any questions or interesting datasets, please let me know.