This repository was archived by the owner on Dec 19, 2023. It is now read-only.
Open
72 commits
49d60a0
init dev branch
stolarczyk May 28, 2020
fa6074f
changes related to DB switch to postgres
stolarczyk Sep 3, 2020
9c8422d
insert sample-specific metadata as jsonb column
stolarczyk Sep 3, 2020
010006e
piface naming, remove old
stolarczyk Sep 8, 2020
79dd00e
save plots metadata separately
stolarczyk Sep 8, 2020
7f5b729
Update README.md
stolarczyk Sep 10, 2020
35e51c0
Update README.md
stolarczyk Sep 10, 2020
3d29d53
setup pipeline testing
stolarczyk Sep 10, 2020
ba0c6fd
use builtin env var
stolarczyk Sep 10, 2020
a24500d
install r deps
stolarczyk Sep 10, 2020
5109873
set up r
stolarczyk Sep 10, 2020
cadbfa4
install libcurl
stolarczyk Sep 10, 2020
2318146
test plots existence
stolarczyk Sep 10, 2020
303a229
fix format
stolarczyk Sep 10, 2020
3f78eee
test record in postgres
stolarczyk Sep 10, 2020
fe26373
no GDData install, add badge
stolarczyk Sep 10, 2020
5af1967
bring GDData back
stolarczyk Sep 11, 2020
a281f50
update dev reqs
stolarczyk Sep 15, 2020
ba0eb5d
initialize other dict earlier
stolarczyk Oct 2, 2020
7a29ff0
add pre_submit
xuebingjie1990 Nov 2, 2020
f0c23dc
fix piface
nsheff Nov 2, 2020
16211ad
Merge branch 'dev' of github.com:databio/bedstat into dev
nsheff Nov 2, 2020
52c5903
remove duplicated pre_submit section
stolarczyk Nov 3, 2020
c1c7b7d
fix pipeline path specification in the piface
stolarczyk Nov 3, 2020
e5bf50a
fix pipeline path specification in the piface
stolarczyk Nov 3, 2020
dbab12f
cli accept and save relative bed path; #25
stolarczyk Nov 3, 2020
5de0f55
determine and feed the rscript with relative path; #25
stolarczyk Nov 3, 2020
7599c7e
use output getters
stolarczyk Nov 3, 2020
0320ab0
new BedBaseConf updates
stolarczyk Dec 3, 2020
f8edce1
regionstat script fixes
stolarczyk Dec 4, 2020
f20a093
install pipestat for tests
stolarczyk Dec 6, 2020
975df44
test cfg to new format
stolarczyk Dec 6, 2020
9a58245
update to new select syntax
stolarczyk Dec 6, 2020
87654bc
use new bbconf
stolarczyk Dec 6, 2020
3e5f0f9
use new property for test
stolarczyk Dec 7, 2020
7a93fa7
accept all sample.yaml metadata, update to new BedBaseConf class
stolarczyk Dec 12, 2020
38a1797
path is a pdf
stolarczyk Dec 14, 2020
745bc58
bedfile path storage updates
stolarczyk Dec 14, 2020
4c7686a
Update test-pipeline.yml
stolarczyk Dec 15, 2020
f3fbebf
Update test-pipeline.yml
stolarczyk Dec 15, 2020
6fc745c
update cfg
stolarczyk Dec 15, 2020
2884ffc
create bigbed files
xuebingjie1990 Dec 16, 2020
91ed853
report bigbed path, rm bigbed generateion step
xuebingjie1990 Dec 18, 2020
22c59f1
report abs path for bigbed files
xuebingjie1990 Jan 6, 2021
e404e43
bigbed file path
xuebingjie1990 Jan 7, 2021
177cb47
resolve ../ in the file/img path
xuebingjie1990 Jan 18, 2021
bb8ec89
Merge pull request #26 from databio/trackHub
stolarczyk Feb 22, 2021
e808a80
reformat code
stolarczyk Mar 4, 2021
402e86c
add linter
stolarczyk Mar 4, 2021
406e7f0
report refgenie genome digest
xuebingjie1990 Mar 26, 2021
f074e80
report refgenie genome digest
xuebingjie1990 Mar 26, 2021
e84af7b
fix typo, format code
xuebingjie1990 Mar 26, 2021
7f9e5be
update `genome` field
xuebingjie1990 Mar 31, 2021
ce340bf
rm quotes
xuebingjie1990 Mar 31, 2021
06220bd
add file size, refgenie digest
xuebingjie1990 Apr 29, 2021
3576289
rm pandas
xuebingjie1990 Oct 23, 2021
8107b43
just db commit
xuebingjie1990 Oct 24, 2021
ab204e6
add genome without digest if genome not validated with regenie
xuebingjie1990 Dec 6, 2021
f314888
summary statistic for distance to TSS #31
xuebingjie1990 Jan 14, 2022
5f33b8a
rename median_TSS_dist
xuebingjie1990 Jan 14, 2022
8ab7871
fix The unauthenticated git protocol on port 9418 is no longer suppor…
xuebingjie1990 Apr 14, 2022
b40ed1f
update Rdeps, rounding stats https://github.com/databio/bedstat/issue…
xuebingjie1990 May 13, 2022
5a7ff6c
rm R pkg 'conflicted', no need
xuebingjie1990 May 15, 2022
beaa0c4
#32
xuebingjie1990 May 19, 2022
9f455dd
report reord with missing stats
xuebingjie1990 Aug 29, 2022
301798d
report metadata for reord with missing stats
xuebingjie1990 Aug 29, 2022
b0053ce
report metadata for reord with missing stats
xuebingjie1990 Aug 29, 2022
d061fe8
report metadata for reord with missing stats
xuebingjie1990 Aug 29, 2022
5373505
skip GD stats/plot if file missing
xuebingjie1990 Sep 10, 2022
1ae866b
Merge pull request #30 from databio/validate_genome_assembly
xuebingjie1990 Sep 10, 2022
bf33b3f
Update installRdeps.R
nsheff Sep 16, 2022
07f1661
never mind, that was a bad idea.
nsheff Sep 16, 2022
11 changes: 11 additions & 0 deletions .github/workflows/black.yml
@@ -0,0 +1,11 @@
name: Lint

on: [push, pull_request]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
      - uses: psf/black@stable
70 changes: 70 additions & 0 deletions .github/workflows/test-pipeline.yml
@@ -0,0 +1,70 @@
name: Test bedstat pipeline

on:
  push:
    branches: [master, dev]
  pull_request:
    branches: [master, dev]

jobs:
  pytest:
    strategy:
      matrix:
        python-version: [3.6, 3.7, 3.8]
        os: [ubuntu-latest] # can't use macOS when using service containers or container jobs
        r: [release]
    runs-on: ${{ matrix.os }}
    services:
      postgres:
        image: postgres
        env:
          POSTGRES_USER: postgres
          POSTGRES_PASSWORD: bedbasepassword
          POSTGRES_DB: postgres
        ports:
          - 5432:5432
        options: --health-cmd pg_isready --health-interval 10s --health-timeout 5s --health-retries 5
    steps:
      - uses: actions/checkout@v2

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install dependencies
        run: if [ -f requirements/requirements-all.txt ]; then pip install -r requirements/requirements-all.txt; fi

      - name: Install dev dependencies
        run: if [ -f requirements/requirements-dev.txt ]; then pip install -r requirements/requirements-dev.txt; fi

      - name: Install test dependencies
        run: if [ -f requirements/requirements-test.txt ]; then pip install -r requirements/requirements-test.txt; fi

      - name: Install libcurl
        run: |
          sudo apt-get update
          sudo apt-get install libcurl4-openssl-dev

      - uses: r-lib/actions/setup-r@master
        with:
          r-version: ${{ matrix.r }}

      - name: Install R dependencies
        run: Rscript scripts/installRdeps.R

      - name: Run pipeline
        run: looper run -p local tests/data/bedstat_config.yaml

      - name: Test plots exist
        run: |
          if ls $GITHUB_WORKSPACE/outputs/bedstat_output/a6a08126cb6f4b1953ba0ec8675df85a/test_hg38*.png 1> /dev/null 2>&1; then
            echo "SUCCESS";
          else
            echo "ERROR: files do not exist: $GITHUB_WORKSPACE/outputs/bedstat_output/a6a08126cb6f4b1953ba0ec8675df85a/test_hg38*.png";
            exit 1
          fi

      - name: Test record in PostgreSQL
        run: |
          echo "from bbconf import BedBaseConf; from bbconf.const import *; bbc = BedBaseConf('$GITHUB_WORKSPACE/tests/data/config_min.yaml'); assert bbc.bed.record_count == 1, 'Number of records in the bedfiles table not equal 1'" | python3 -
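
The last two workflow steps check that the plot files exist and that exactly one record was stored. The same checks could be written in Python, roughly as below. This is only a sketch and not part of the PR: `find_plots` and `check_outputs` are hypothetical helper names, and the record count is passed in as a plain integer (in the workflow it comes from `bbc.bed.record_count`) so that the example runs without a database.

```python
import glob
import os


def find_plots(output_dir, digest, prefix="test_hg38"):
    """Return the sorted PNG plots for one bed file digest (hypothetical helper)."""
    pattern = os.path.join(output_dir, digest, f"{prefix}*.png")
    return sorted(glob.glob(pattern))


def check_outputs(output_dir, digest, record_count, expected_records=1):
    """Mirror the two CI checks: plots exist and the record count matches."""
    plots = find_plots(output_dir, digest)
    if not plots:
        # Same failure condition as the `ls ... *.png` shell test.
        raise FileNotFoundError(
            f"no plots found under {os.path.join(output_dir, digest)}"
        )
    if record_count != expected_records:
        # Same failure condition as the inline `assert` on the record count.
        raise AssertionError(
            f"expected {expected_records} record(s), found {record_count}"
        )
    return plots
```

Keeping the database lookup outside the helper means the file check can be exercised locally, while the count can be supplied by whatever client the test environment provides.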
46 changes: 12 additions & 34 deletions README.md
@@ -1,3 +1,4 @@
![Test bedstat pipeline](https://github.com/databio/bedstat/workflows/Test%20bedstat%20pipeline/badge.svg)
# bedstat
pipeline for obtaining statistics about bed files

@@ -35,60 +36,37 @@ The input PEP can be validated against the [JSON schema in this repository](pep_
eido validate <path/to/pep> -s https://schema.databio.org/pipelines/bedstat.yaml
```

### 2. Create a persistent volume to house elasticsearch data
### 2. Run PostgreSQL

```
docker volume create es-data
```

### 3. Run the docker container for elasticsearch
For example, to run an instance in a container and make the data persist, execute:

```
docker run -p 9200:9200 -p 9300:9300 -v es-data:/usr/share/elasticsearch/data -e "xpack.ml.enabled=false" \
-e "discovery.type=single-node" elasticsearch:7.5.1
docker volume create postgres-data
docker run -d --name bedbase-postgres -p 5432:5432 -e POSTGRES_PASSWORD=bedbasepassword -e POSTGRES_USER=postgres -e POSTGRES_DB=postgres -v postgres-data:/var/lib/postgresql/data postgres
```
The environment variables provided above must match the settings in the bedbase configuration file.
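
One way to catch a mismatch between the container settings and the configuration file is to compare them before running the pipeline. The sketch below is illustrative only: `find_mismatches` is a hypothetical helper, and the configuration keys (`user`, `password`, `name`) are assumptions, since the layout of the bedbase configuration file is not shown here.

```python
# Environment variables passed to the postgres container in the command above.
CONTAINER_ENV = {
    "POSTGRES_USER": "postgres",
    "POSTGRES_PASSWORD": "bedbasepassword",
    "POSTGRES_DB": "postgres",
}

# Hypothetical mapping from container variables to configuration keys; the
# real bedbase configuration layout may use different names.
ENV_TO_CONFIG_KEY = {
    "POSTGRES_USER": "user",
    "POSTGRES_PASSWORD": "password",
    "POSTGRES_DB": "name",
}


def find_mismatches(container_env, db_config):
    """Return {env_var: (container_value, config_value)} for settings that differ."""
    mismatches = {}
    for env_var, config_key in ENV_TO_CONFIG_KEY.items():
        container_value = container_env.get(env_var)
        config_value = db_config.get(config_key)
        if container_value != config_value:
            mismatches[env_var] = (container_value, config_value)
    return mismatches
```

An empty result means the two sets of settings agree; any entry names the variable to fix and shows both values.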

### 4. Run the bedstat pipeline on the PEP
### 3. Run the bedstat pipeline on the PEP

Then simply run the looper command to run the pipeline for each bed file. It will create a set of plots and statistics per bed file and insert the metadata into Elastic:
Then simply run the `looper run` command to run the pipeline for each bed file. It will create a set of plots and statistics per bed file and insert the metadata into PostgreSQL:

```
looper run project/bedstat_config.yaml
```

The data loaded into elasticsearch should persist between elasticsearch invocations, on the es-data docker volume created above in step 2.

### 5. (optional) Run Kibana

Kibana can be used in order to see ElasticSearch data in a "GUI" kind of a way.

Pull a matching Kibana docker image. Make sure the Elasticsearch and Kibana container tags match:
```
docker pull docker.elastic.co/kibana/kibana:7.5.1
```

Get the ID of the docker container (started above) running ElasticSearch via
```
docker ps | grep elasticsearch
```

Run Kibana to link to that container:
```
docker run --link <ID OF ELASTIC CONTAINER HERE>:elasticsearch -p 5601:5601 docker.elastic.co/kibana/kibana:7.5.1
```

Point your local web browser to http://localhost:5601

---
The data loaded into PostgreSQL should persist between PostgreSQL invocations, on the `postgres-data` docker volume created above in step 2.

## Additional dependencies

The [regionstat.R](tools/regionstat.R) script is used to calculate the bed file statistics, so the pipeline also depends on several R packages:

* `R.utils`
* `BiocManager`
* `optparse`
* `devtools`
* `GenomicRanges`
* `GenomicFeatures`
* `ensembldb`
* `GenomicDistributions`
* `BSgenome.<organism>.UCSC.<genome>` *depending on the genome used*
* `LOLA`
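
The genome-specific BSgenome package name follows a fixed pattern: organism, then `UCSC`, then the genome code. As a rough illustration, the name can be derived from a genome code as below. This is a sketch under stated assumptions: `bsgenome_package` and the prefix table are hypothetical, only human and mouse codes are covered, and how the pipeline itself selects the package is not shown in this diff.

```python
# Organism names used in UCSC-based BSgenome package names, keyed by the
# alphabetic prefix of the genome code. Only human and mouse are covered.
ORGANISM_BY_PREFIX = {
    "hg": "Hsapiens",
    "mm": "Mmusculus",
}


def bsgenome_package(genome):
    """Build a BSgenome package name such as BSgenome.Hsapiens.UCSC.hg38."""
    # Strip the trailing version digits to get the organism prefix.
    prefix = genome.rstrip("0123456789")
    try:
        organism = ORGANISM_BY_PREFIX[prefix]
    except KeyError:
        raise ValueError(f"no organism known for genome code {genome!r}") from None
    return f"BSgenome.{organism}.UCSC.{genome}"
```

For example, `bsgenome_package("mm10")` yields `BSgenome.Mmusculus.UCSC.mm10`, and an unlisted code raises `ValueError` rather than guessing a package name.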
1 change: 0 additions & 1 deletion pep_schema.yaml
@@ -18,7 +18,6 @@ properties:
genome:
  type: string
  description: "organism genome code"
  enum: ["hg18", "hg19", "hg38", "mm9", "mm10"]
narrowpeak:
  type: integer
  minimum: 0