This repository contains data on RiPP precursor peptides. The dataset summarizes knowledge about experimentally validated (true-positive) RiPP precursor peptides, including their cleavage sites.
This dataset was used to train the decRiPPter2 classifier, but it can be used by other projects as well.
The dataset is based on data from MIBiG (Minimum Information about a Biosynthetic Gene Cluster), which were cleaned, gap-filled, and structured according to a JSON Schema that is also available in this repository.
The aim of this dataset is to make data on RiPP precursor peptides freely available, FAIR, and sustainably maintained.
Please consider contributing RiPP precursor data and growing the data repository!
For more information, see the DecRiPPter2 Organization page.
RiPPs (ribosomally synthesized and post-translationally modified peptides) are metabolites with potent biological activities. Their biosynthesis starts from a precursor peptide, which is modified by a number of tailoring enzymes and eventually cleaved to release the mature core peptide. RiPP classes are defined by the biosynthetic logic of their tailoring enzymes and are therefore rarely homologous to one another. This makes rule-based discovery of new RiPP classes challenging and favours machine learning-based approaches.
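To illustrate the final cleavage step, the sketch below splits a precursor sequence into a leader peptide and a core peptide at a single cleavage site. It is a simplified illustration under stated assumptions: the cleavage site is given as the 0-based index of the first core residue, follower peptides and multiple core peptides are ignored, and the sequence is a made-up toy example rather than a real precursor. How this dataset actually encodes cleavage sites is defined by its JSON Schema and may differ.

```python
from typing import NamedTuple


class CleavedPrecursor(NamedTuple):
    """Leader and core parts of a precursor peptide after cleavage."""
    leader: str
    core: str


def cleave_precursor(sequence: str, cleavage_site: int) -> CleavedPrecursor:
    """Split a precursor at a single cleavage site.

    Assumption (illustration only): `cleavage_site` is the 0-based index of
    the first core residue, so the bond between residues cleavage_site - 1
    and cleavage_site is the one that is broken.
    """
    if not 0 < cleavage_site < len(sequence):
        raise ValueError("cleavage site must lie strictly inside the sequence")
    return CleavedPrecursor(sequence[:cleavage_site], sequence[cleavage_site:])


# Hypothetical toy sequence, NOT a real RiPP precursor from this dataset.
toy = "MKKLSEEQLGGCTCS"
parts = cleave_precursor(toy, 11)
print(parts.leader)  # MKKLSEEQLGG
print(parts.core)    # CTCS
```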
This dataset reports RiPP precursor peptide data in a structured, machine- and human-readable format. Most importantly, the provenance of the sequences is documented by providing a reference to the original publication.
Each data entry describes a RiPP BGC, containing:
- ≥ 1 entry for a precursor peptide
- ≥ 1 literature reference
- (optional) database cross-references to the BGC
- (optional) the compound name of the mature RiPP product(s)
- (optional) the RiPP class (controlled vocabulary)
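As a rough picture of how such an entry might look once serialized, the sketch below builds one entry as a Python dictionary, checks the two minimum requirements listed above (at least one precursor and at least one literature reference), and prints it as JSON. All key names and values are hypothetical placeholders chosen for illustration; the authoritative field names and structure are defined by the JSON Schema in this repository.

```python
import json

# Hypothetical entry: key names and values are illustrative placeholders,
# NOT the field names defined by the repository's JSON Schema.
entry = {
    "precursors": [
        {"name": "precursor_A", "sequence": "MKKLSEEQLGGCTCS"},
    ],
    "references": ["placeholder-reference"],
    "cross_references": [],  # optional: database cross-references to the BGC
    "compound_names": [],    # optional: mature RiPP product name(s)
    "ripp_class": None,      # optional: controlled vocabulary
}


def meets_minimum_requirements(entry: dict) -> bool:
    """Return True if the entry has >= 1 precursor and >= 1 reference."""
    return (
        len(entry.get("precursors", [])) >= 1
        and len(entry.get("references", [])) >= 1
    )


print(meets_minimum_requirements(entry))  # True
print(json.dumps(entry, indent=2))
```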
Thank you for considering contributing to this dataset! We always welcome experimental data on precursor peptides. Please note the following requirements:
- RiPP precursors must be experimentally validated (no predictions)
- At least one literature reference must be provided
For the technical aspects of contributing, see CONTRIBUTING.
decrippter2_data can validate the structure of its data files against the provided JSON Schema.
Data validation is triggered automatically for every pull request via GitHub Actions. To run it manually, follow these steps:
- Install `hatch` using one of the methods described in the Hatch installation documentation
- Download or clone this repository
- Run `hatch -v env create`. This will download and install the appropriate Python version and any required packages
- Run the data validation on a single file or multiple files using `hatch run d2_validate -i [input1.json input2.json ... inputN.json]`
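For readers unfamiliar with schema validation, the sketch below mimics the general idea behind this step using only the Python standard library: each input JSON file is checked against a schema and any errors are reported. It is not the actual `d2_validate` command and does not use the repository's schema. It implements only four JSON Schema keywords (`type`, `required`, `properties`, `minItems`) against a hypothetical, heavily simplified schema; for real validation, use `hatch run d2_validate`.

```python
import json
import sys
from pathlib import Path

# Hypothetical, heavily simplified schema: the real schema in this
# repository has different (and many more) properties and keywords.
SCHEMA = {
    "type": "object",
    "required": ["precursors", "references"],
    "properties": {
        "precursors": {"type": "array", "minItems": 1},
        "references": {"type": "array", "minItems": 1},
    },
}

JSON_TYPES = {"object": dict, "array": list, "string": str}


def check(instance, schema, path="$"):
    """Return error messages for a tiny subset of JSON Schema keywords
    (type, required, properties, minItems)."""
    expected = schema.get("type")
    if expected and not isinstance(instance, JSON_TYPES[expected]):
        return [f"{path}: expected {expected}"]
    errors = []
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required property '{key}'")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(check(instance[key], subschema, f"{path}.{key}"))
    if isinstance(instance, list) and len(instance) < schema.get("minItems", 0):
        errors.append(f"{path}: expected at least {schema['minItems']} item(s)")
    return errors


def validate_files(paths):
    """Validate each JSON file; return a dict mapping path -> list of errors."""
    return {p: check(json.loads(Path(p).read_text()), SCHEMA) for p in paths}


def main(paths):
    """Print a result per file; return exit code 1 if any file is invalid."""
    results = validate_files(paths)
    for p, errs in results.items():
        print(p, "OK" if not errs else errs)
    return 1 if any(results.values()) else 0


if __name__ == "__main__" and sys.argv[1:]:
    sys.exit(main(sys.argv[1:]))
```

Saved as a hypothetical `mini_validate.py`, it would be run as `python mini_validate.py input1.json input2.json`. Collecting all errors per file, rather than stopping at the first one, mirrors how full JSON Schema validators are typically used to report every problem in a submission at once.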
To set up a development environment and run the test suite, follow these steps:
- Install `hatch` using one of the methods described in the Hatch installation documentation
- Download or clone this repository
- Run `hatch -v env create`. This will download and install the appropriate Python version and any required packages
- Run `hatch run pre-commit install`. This will set up `pre-commit`
- Run the tests with `hatch run pytest`
- If necessary, remove the environment with `hatch env remove dev`