Community SDRF annotations for public proteomics datasets (ProteomeXchange and related accessions).
The SDRF specification lives in bigbio/proteomics-sample-metadata.
License: Apache 2.0 · Contributing: CONTRIBUTING.md · Agent context: llms.txt, AGENTS.md
| Resource | URL |
|---|---|
| Specification | https://github.com/bigbio/proteomics-sample-metadata/blob/master/sdrf-proteomics/README.adoc |
| Public site | https://sdrf.quantms.org/ |
| Templates | https://github.com/bigbio/sdrf-templates |
| Validator CLI (parse_sdrf) | https://github.com/bigbio/sdrf-pipelines |
| Agentic toolkit | https://github.com/bigbio/sdrf-skills |
Files follow the pattern datasets/{ACCESSION}/{ACCESSION}.sdrf.tsv:
datasets/PXD000070/PXD000070.sdrf.tsv
datasets/MSV000078494/MSV000078494.sdrf.tsv
Additional .sdrf.tsv files may appear in the same folder when a project requires split designs.
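The layout above can be checked mechanically. The following is a minimal sketch, not part of this repository: the helper names `primary_sdrf_path` and `is_primary_sdrf` are hypothetical, and the accession pattern (uppercase letters followed by digits, as in PXD000070 or MSV000078494) is an assumption inferred from the two examples rather than a rule stated by the specification.

```python
import re
from pathlib import PurePosixPath

# Assumption: an accession is an uppercase prefix followed by digits,
# inferred from the examples PXD000070 and MSV000078494.
ACCESSION_RE = re.compile(r"^[A-Z]+\d+$")


def primary_sdrf_path(accession: str) -> PurePosixPath:
    """Build datasets/{ACCESSION}/{ACCESSION}.sdrf.tsv for an accession."""
    if not ACCESSION_RE.match(accession):
        raise ValueError(f"unexpected accession format: {accession!r}")
    return PurePosixPath("datasets") / accession / f"{accession}.sdrf.tsv"


def is_primary_sdrf(path: str) -> bool:
    """Return True if path is the primary SDRF file of a dataset folder."""
    p = PurePosixPath(path)
    if len(p.parts) != 3 or p.parts[0] != "datasets":
        return False
    accession = p.parts[1]
    return bool(ACCESSION_RE.match(accession)) and p.name == f"{accession}.sdrf.tsv"
```

Additional split-design files in the same folder are still legitimate; `is_primary_sdrf` only identifies the file that carries the accession's own name.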
Work-in-progress annotations live under sandbox/.
Move a folder to datasets/ and open a PR once it passes parse_sdrf validate-sdrf.
CI only validates datasets/; sandbox/ is exempt so drafts don't block merges.
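To preview which files a change would expose to CI, the datasets/ versus sandbox/ split can be sketched as below. This is an illustration only: `in_ci_scope` and `split_by_scope` are hypothetical helpers, and filtering on the `.sdrf.tsv` suffix is an assumption about what the workflow picks up, not a description of the actual workflow file.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Tuple


def in_ci_scope(path: str) -> bool:
    """True for SDRF files under datasets/; sandbox/ drafts are exempt."""
    p = PurePosixPath(path)
    # Assumption: only files ending in .sdrf.tsv are passed to the validator.
    return len(p.parts) > 1 and p.parts[0] == "datasets" and p.name.endswith(".sdrf.tsv")


def split_by_scope(paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Partition changed paths into (validated by CI, skipped by CI)."""
    validated: List[str] = []
    skipped: List[str] = []
    for path in paths:
        (validated if in_ci_scope(path) else skipped).append(path)
    return validated, skipped
```

Feeding it the output of `git diff --name-only` gives a quick view of which drafts are still exempt and which files must already pass validation.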
Open a pull request to add or improve annotated SDRF files. See CONTRIBUTING.md for layout rules and review etiquette.
Use sdrf-skills as the primary toolkit. Key rules:
- Anchor every row in public evidence (PX page, submitted metadata, publication). Don't invent sample names or file names.
- Keep PRs small — one accession or a closely related batch.
- Run validation locally (parse_sdrf validate-sdrf) before opening a PR.
- Declare assistance in the PR description so reviewers can calibrate review depth.
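The local validation step can be wrapped in a small script. This is a sketch under stated assumptions: the `--sdrf_file` option is what I understand the sdrf-pipelines CLI to accept, but confirm it with `parse_sdrf validate-sdrf --help` for your installed version; `build_validate_command` and `validate` are hypothetical helper names.

```python
import shutil
import subprocess
from typing import List


def build_validate_command(sdrf_file: str) -> List[str]:
    """Assemble the validator invocation for one SDRF file."""
    # Assumption: the file is passed via --sdrf_file; check the CLI help.
    return ["parse_sdrf", "validate-sdrf", "--sdrf_file", str(sdrf_file)]


def validate(sdrf_file: str) -> int:
    """Run the validator and return its exit code."""
    if shutil.which("parse_sdrf") is None:
        raise RuntimeError("parse_sdrf not found on PATH; install sdrf-pipelines first")
    # check=False so the caller decides how to react to a failed validation.
    return subprocess.run(build_validate_command(sdrf_file), check=False).returncode
```

For example, `validate("datasets/PXD000070/PXD000070.sdrf.tsv")` would run the check for a single dataset before the PR is opened.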
For agent-specific instructions see AGENTS.md.
GitHub Actions runs parse_sdrf validate-sdrf on every PR and push touching datasets/**.
The validator is installed from the main branch of bigbio/sdrf-pipelines.
Re-run all checks manually via workflow_dispatch in the Actions tab.
- Dai C, Füllgrabe A, Pfeuffer J, Solovyeva EM, Deng J, Moreno P, Kamatchinathan S, Kundu DJ, George N, Fexova S, Grüning B, Föll MC, Griss J, Vaudel M, Audain E, Locard-Paulet M, Turewicz M, Eisenacher M, Uszkoreit J, Van Den Bossche T, Schwämmle V, Webel H, Schulze S, Bouyssié D, Jayaram S, Duggineni VK, Samaras P, Wilhelm M, Choi M, Wang M, Kohlbacher O, Brazma A, Papatheodorou I, Bandeira N, Deutsch EW, Vizcaíno JA, Bai M, Sachsenberg T, Levitsky LI, Perez-Riverol Y. A proteomics sample metadata representation for multiomics integration and big data analysis. Nat Commun. 2021 Oct 6;12(1):5854. doi: 10.1038/s41467-021-26111-3. PMID: 34615866; PMCID: PMC8494749.
- Perez-Riverol, Yasset, European Bioinformatics Community for Mass Spectrometry. "Towards a sample metadata standard in public proteomics repositories." Journal of Proteome Research (2020).