Skip to content

vvsbiocode/sdrf-annotated-datasets

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

87 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

sdrf-annotated-datasets

License: Apache 2.0 llms.txt Datasets Sandbox Validate SDRF datasets

Community SDRF annotations for public proteomics datasets (ProteomeXchange and related accessions). The SDRF specification lives in bigbio/proteomics-sample-metadata.

License: Apache 2.0 · Contributing: CONTRIBUTING.md · Agent context: llms.txt, AGENTS.md

Key links

Resource URL
Specification https://github.com/bigbio/proteomics-sample-metadata/blob/master/sdrf-proteomics/README.adoc
Public site https://sdrf.quantms.org/
Templates https://github.com/bigbio/sdrf-templates
Validator CLI (parse_sdrf) https://github.com/bigbio/sdrf-pipelines
Agentic toolkit https://github.com/bigbio/sdrf-skills

Dataset layout

Files follow the pattern datasets/{ACCESSION}/{ACCESSION}.sdrf.tsv:

datasets/PXD000070/PXD000070.sdrf.tsv
datasets/MSV000078494/MSV000078494.sdrf.tsv

Additional .sdrf.tsv files may appear in the same folder when a project requires split designs.

Sandbox

Work-in-progress annotations live under sandbox/. Move a folder to datasets/ and open a PR once it passes parse_sdrf validate-sdrf. CI only validates datasets/; sandbox/ is exempt so drafts don't block merges.

Contributing

Open a pull request to add or improve annotated SDRF files. See CONTRIBUTING.md for layout rules and review etiquette.

Agent-assisted annotation

Use sdrf-skills as the primary toolkit. Key rules:

  • Anchor every row in public evidence (PX page, submitted metadata, publication). Don't invent sample names or file names.
  • Keep PRs small — one accession or a closely related batch.
  • Run validation locally (parse_sdrf validate-sdrf) before opening a PR.
  • Declare assistance in the PR description so reviewers can calibrate review depth.

For agent-specific instructions see AGENTS.md.

CI validation

GitHub Actions runs parse_sdrf validate-sdrf on every PR and push touching datasets/**. The validator is installed from bigbio/sdrf-pipelines main branch. Re-run all checks manually via workflow_dispatch in the Actions tab.

How to cite

  • Dai C, Füllgrabe A, Pfeuffer J, Solovyeva EM, Deng J, Moreno P, Kamatchinathan S, Kundu DJ, George N, Fexova S, Grüning B, Föll MC, Griss J, Vaudel M, Audain E, Locard-Paulet M, Turewicz M, Eisenacher M, Uszkoreit J, Van Den Bossche T, Schwämmle V, Webel H, Schulze S, Bouyssié D, Jayaram S, Duggineni VK, Samaras P, Wilhelm M, Choi M, Wang M, Kohlbacher O, Brazma A, Papatheodorou I, Bandeira N, Deutsch EW, Vizcaíno JA, Bai M, Sachsenberg T, Levitsky LI, Perez-Riverol Y. A proteomics sample metadata representation for multiomics integration and big data analysis. Nat Commun. 2021 Oct 6;12(1):5854. doi: 10.1038/s41467-021-26111-3. PMID: 34615866; PMCID: PMC8494749. Manuscript
  • Perez-Riverol, Yasset, European Bioinformatics Community for Mass Spectrometry. "Towards a sample metadata standard in public proteomics repositories." Journal of Proteome Research (2020) Manuscript.

About

A repository of datasets in ProteomeXchange annotated in SDRF, previously part of the specification

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Python 100.0%