First, install OpusCleaner on your system.
Then clone this repository and install the additional requirements (at this
point, this is only urwid, beyond what you already need for a working
OpusCleaner installation).
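
Before going further, you can check that both requirements are importable. The helper below is a sketch, not part of clianer: check_modules is a hypothetical name, and it assumes the app runs under python3 and that OpusCleaner is importable as the module opuscleaner. If urwid turns out to be missing, it can usually be installed with python3 -m pip install urwid.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of clianer): succeeds only if every
# Python module named on the command line can be imported by python3.
# Assumption: the app runs under python3.
check_modules() {
  local module
  for module in "$@"; do
    # Pass the module name as an argument instead of splicing it into the code.
    if ! python3 -c 'import importlib, sys; importlib.import_module(sys.argv[1])' "$module" 2>/dev/null; then
      echo "missing Python module: $module" >&2
      return 1
    fi
  done
}
```

For example, check_modules opuscleaner urwid reports the first module it cannot import and returns a non-zero status.
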
Set the DATA_PATH (and optionally SAMPLE_SIZE) environment variables, which
OpusCleaner uses as usual. Then run the app with ./main.py.
For example:

    export DATA_PATH='/home/helcl/hplt/translation-models/en-cs/*.*.gz'
    export SAMPLE_SIZE=100
    cd path/to/clianer/
    ./main.py

Most of the controls are listed in the bottom bar of the app frame. However, some additional controls depend on which part of the application has focus. Use the Left and Right arrow keys to move focus between the filter view and the dataset view.

The following global controls work regardless of whether the focus is in the filter view or in the dataset view:
- F2 opens up a new dataset
- F3 adds a new filter
- F6 shows a clean version of the data in the dataset view
- F7 assigns categories to the current dataset
- F10, q exit the application
- Down, Up move within the focused window (PgUp and PgDn also work)

When the filter view has focus:

- F4 edits the selected filter
- F5 imports a filter pipeline from a different dataset (careful: this overwrites the current pipeline)
- F8 removes the selected filter
- w, s move the selected filter up or down
- d marks the filter for diffing
- r resets diffing

When the dataset view has focus:

- F4 shows the diff (select which filter steps to diff in the filter view)
- F5 shows a clean version of the data

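
Because DATA_PATH is a shell-style glob (like the one in the example above), a typo in the pattern is easy to miss. The sketch below is a hypothetical helper, not part of clianer or OpusCleaner: check_data_path checks that SAMPLE_SIZE, if set, contains only digits, then prints every file the DATA_PATH pattern matches and fails when there are none. It assumes bash and dataset paths without spaces.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of clianer): validates SAMPLE_SIZE
# and prints every file matched by the DATA_PATH glob, one per line.
check_data_path() {
  # SAMPLE_SIZE is optional; if it is set, it should contain only digits.
  case "${SAMPLE_SIZE:-}" in
    "") ;;
    *[!0-9]*)
      echo "SAMPLE_SIZE must be a whole number, got: $SAMPLE_SIZE" >&2
      return 1
      ;;
  esac

  if [ -z "${DATA_PATH:-}" ]; then
    echo "DATA_PATH is not set" >&2
    return 1
  fi

  local f matches=0
  # $DATA_PATH is left unquoted on purpose so the shell expands the glob.
  # This also splits on whitespace, so paths containing spaces are not supported.
  for f in $DATA_PATH; do
    # Without nullglob, a pattern that matches nothing expands to itself.
    [ -e "$f" ] || continue
    echo "$f"
    matches=$((matches + 1))
  done

  if [ "$matches" -eq 0 ]; then
    echo "no files match DATA_PATH=$DATA_PATH" >&2
    return 1
  fi
}
```

For example, running check_data_path && ./main.py from the clianer directory only starts the app when the pattern matches at least one file.
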
This project has received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement No 101070350 and from UK Research and Innovation (UKRI) under the UK government’s Horizon Europe funding guarantee grant number 10052546. The contents of this publication are the sole responsibility of the HPLT consortium and do not necessarily reflect the opinion of the European Union.
