Commit 9dd1c63

Revise README for clarity and additional details
Updated README to provide a clearer overview of PartiNet, including installation instructions, usage examples, and links to documentation and model weights.
1 parent 1adceec commit 9dd1c63

1 file changed: README.md (51 additions & 291 deletions)
@@ -1,326 +1,86 @@
# PartiNet
- A particle-picking tool built on DynamicDet and trained on CryoPPP
+ PartiNet is a particle-picking pipeline for cryo-EM micrographs. It provides denoising, adaptive detection, and STAR file generation for downstream processing.

- ## File structure
+ # Links
+ - Documentation: https://mihinp.github.io/partinet_documentation/
+ - Model weights (Hugging Face): https://huggingface.co/MihinP/PartiNet

- * DynamicDet --> submodule forked from the [DynamicDet repo](https://github.com/VDIGPKU/DynamicDet)
- * scripts
-   * meta --> contains text files including:
-     * `datasets.txt`: used by the preprocess script to choose which datasets to denoise and calculate bounding boxes for
-     * `development_set.txt` & `test_set.txt`: used by the preprocess script to split datasets into development and test sets
-     * `fix_names_datasets.txt`: contains dataset names that required manual intervention after download
-   * `download.sh`: Slurm job that takes a dataset name, then downloads and untars it.
-   You can also retrieve a tarred file `raw.tar.gz` containing all datasets from RCP and untar them using
- ```
- for f in /vast/projects/RCP/PartiNet_data/tarred/*.tar.gz; do tar xvf "$f"; done
- ```
+ # Getting started (quick)
+ 1. Clone the repository

- * `visualise_denoise_reaults.ipynb`: for checking denoising results visually.
- * `preprocess.sh`: Slurm job that runs
-   * topaz denoising
-   * `preprocess.py`, which calculates bounding boxes, then splits a dataset's images into train and val sets if the dataset is in the development set, or moves them to test if it is in the test set.
- * `generate-star-file` --> ??
- * `detect` --> ??
- * `train_step1` and `train_step2` --> ??
-
- ## Install
-
- ```bash
- git clone --recursive git@github.com:WEHI-ResearchComputing/PartiNet.git
+ ```powershell
+ git clone git@github.com:WEHI-ResearchComputing/PartiNet.git
cd PartiNet
- pip install .
```

- ## Usage
+ 2. Create a Python virtual environment (recommended)

- ```bash
- partinet --help
- ```
+ ```powershell
+ python -m venv .venv
+ .\.venv\Scripts\Activate.ps1
+ pip install -U pip
```
- Usage: partinet [OPTIONS] COMMAND [ARGS]...

- Options:
-   --version  Show the version and exit.
-   --help     Show this message and exit.
+ 3. Install requirements

- Commands:
-   detect
-   preprocess
-   test
-   train
+ ```powershell
+ pip install -r requirements.txt
+ # or editable install for development
+ pip install -e .
```

- ### Preprocessing
-
- This step runs on one dataset to create bounding boxes and split images into development and test sets.
-
- The Python script uses the conda env installed at `/stornext/System/data/apps/rc-tools/rc-tools-1.0/bin/tools/envs/py3_11/bin/python3`
+ 4. Download model weights (see the Hugging Face README)

- Example run, found in `preprocess.py`:
-
- ```
- ./preprocess.py --dataset <dataset_name> --datasets_path /vast/scratch/users/iskander.j/PartiNet_data/testing/ --tag _test --bounding_box
+ ```bash
+ # If you have git-lfs and access via HTTPS/SSH
+ git lfs install
+ git clone https://huggingface.co/MihinP/PartiNet
+ # or use the huggingface_hub Python client
+ python -m pip install huggingface_hub
+ python - <<'PY'
+ from huggingface_hub import hf_hub_download
+ hf_hub_download(repo_id="MihinP/PartiNet", filename="best.pt", repo_type="model")
+ PY
```
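Note that `hf_hub_download` returns the local cache path of the fetched file, which is what the later `--weight` flags need; a small follow-up sketch (illustrative, not part of the commit):

```bash
python - <<'PY'
from huggingface_hub import hf_hub_download

# hf_hub_download returns the path of the file inside the local HF cache
path = hf_hub_download(repo_id="MihinP/PartiNet", filename="best.pt", repo_type="model")
print(path)  # pass this path to `partinet detect --weight ...`
PY
```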

- #### Arguments:
-
- * dataset: dataset name, e.g. 10005; it must be a directory found in datasets_path
- * datasets_path: path to all datasets
- * tag: adds a suffix to the text files used by the script.
- * bounding_box: whether to run the bounding-box annotation step; defaults to false, and the script errors if the annotation directory is empty.
-
- #### How it works:
- The first step goes to the dataset path and creates a directory called `annotations`, where bounding-box data is saved as `*.txt` files.
-
- Then, using the two text files `development_set<tag>.txt` & `test_set<tag>.txt`, the dataset images and labels (annotations) are split into test and development (train and val) sets and saved to the `data_split` directory.
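For reference, these set-list files are plain text with one dataset directory name per line. A hypothetical example for tag `_test` (the file location follows the `scripts/meta` layout above; all IDs except 10005 are made up for illustration):

```bash
# Hypothetical contents of scripts/meta/development_set_test.txt
cat scripts/meta/development_set_test.txt
# 10005
# 10017
# 10093
```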
+ # Quick usage examples

- The script writes:
- * noannot_images.txt: for each dataset, a list of micrographs without annotation.
- * development_set_split.txt: a CSV recording, per dataset, the number of annotated micrographs and the number of images in the training, validation, and test sets.
+ - Denoise images

- **Note: There are a few manual steps required before running any preprocessing (https://github.com/WEHI-ResearchComputing/PartiNet/blob/main/scripts/meta/fix_names_datasets.txt)**
-
- ### Training
-
- Training the DynamicDet network is separated into two steps, and therefore two subcommands:
-
- ```bash
- partinet train --help
+ ```powershell
+ partinet denoise --source /data/raw_micrographs --project /data/partinet_project
```
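Before moving on to detection, it may be worth confirming that denoised micrographs were written; the `denoised` output directory name is inferred from the detect example below, so treat it as an assumption:

```bash
# List a few denoised outputs (directory name assumed from the detect example)
ls /data/partinet_project/denoised | head
```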
- ```
- Usage: partinet train [OPTIONS] COMMAND [ARGS]...
-
- Options:
-   --help  Show this message and exit.
-
- Commands:
-   step1
-   step2
- ```
-
- #### Training Step 1
-
- Relevant training args are passed to the `train` subcommand, with step1-specific args passed to the `step1` sub-subcommand.

- ```bash
- partinet train step1 --help
- ```
- ```
- Usage: partinet train step1 [OPTIONS]
+ - Detect particles

- Options:
-   --backbone-detector [yolov7|yolov7-w6|yolov7x]
-                            The choice of backbone to be used.  [default: yolov7]
-   --weight TEXT            initial weights path  [required]
-   --data TEXT              data.yaml path  [default: data/coco.yaml]
-   --hyp [scratch.p5|scratch.p6|finetune.dynamic.adam]
-                            hyperparameters path  [default: scratch.p5]
-   --epochs INTEGER         [default: 300]
-   --batch-size INTEGER     total batch size for all GPUs  [default: 16]
-   --img-size INTEGER...    [train, test] image sizes  [default: 640, 640]
-   --rect                   rectangular training
-   --resume                 resume most recent training
-   --resume-ckpt TEXT       checkpoint to resume from
-   --nosave                 only save final checkpoint
-   --notest                 only test final epoch
-   --noautoanchor           disable autoanchor check
-   --bucket TEXT            gsutil bucket
-   --cache-images           cache images for faster training
-   --image-weights          use weighted image selection for training
-   --device TEXT            cuda device, i.e. 0 or 0,1,2,3 or cpu
-   --multi-scale            vary img-size +/- 50%
-   --single-cls             train multi-class data as single-class
-   --adam                   use torch.optim.Adam() optimizer
-   --sync-bn                use SyncBatchNorm, only available in DDP mode
-   --local_rank INTEGER     DDP parameter, do not modify  [default: -1]
-   --workers INTEGER        maximum number of dataloader workers  [default: 8]
-   --project TEXT           save to project/name  [default: runs/train]
-   --entity TEXT            W&B entity
-   --name TEXT              save to project/name  [default: exp]
-   --exist-ok               existing project/name ok, do not increment
-   --quad                   quad dataloader
-   --label-smoothing FLOAT  Label smoothing epsilon  [default: 0.0]
-   --upload_dataset         Upload dataset as W&B artifact table
-   --bbox_interval INTEGER  Set bounding-box image logging interval for W&B  [default: -1]
-   --save_period INTEGER    Log model after every "save_period" epoch  [default: -1]
-   --artifact_alias TEXT    version of dataset artifact to be used  [default: latest]
-   --freeze INTEGER         Freeze layers: backbone of yolov7=50, first3=0 1 2  [default: 0]
-   --v5-metric              assume maximum recall as 1.0 in AP calculation
-   --single-backbone        train single backbone model
-   --linear-lr              linear LR
-   --help                   Show this message and exit.
+ ```powershell
+ partinet detect --weight /path/to/best.pt --source /data/partinet_project/denoised --project /data/partinet_project
```

- TODO: more details...
- Example:
- ```
- partinet train step1 --cfg partinet/DynamicDet/cfg/dy-yolov7-step1.yaml --weight '' --data path/to/cryo_training.yaml --hyp partinet/DynamicDet/hyp/hyp.scratch.p5.yaml --name train_step1 --save_period 10 --epochs 20 --batch-size 16 --img-size 640 640 --workers 16 --device 0,1,2,3 --sync-bn
+ - Generate STAR files

+ ```powershell
+ partinet star --project /data/partinet_project --output /data/partinet_project/exp/particles.star
```
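To inspect the resulting particles, one option is the third-party `starfile` package; a minimal sketch, assuming the output is a standard RELION-style STAR file (not confirmed by this README):

```bash
python -m pip install starfile
python - <<'PY'
import starfile  # third-party STAR-file parser built on pandas

# Returns a DataFrame, or a dict of DataFrames if the file has multiple data blocks
particles = starfile.read("/data/partinet_project/exp/particles.star")
print(particles)
PY
```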
- #### Training Step 2

- Like step1, training args are passed to `train`, but no special arguments are passed to the `step2` sub-subcommand.
+ # Containerized usage

- ```output
- Usage: partinet train step2 [OPTIONS]
+ - Docker

-
- Options:
--backbone-detector [yolov7|yolov7-w6|yolov7x]
181-
The choice of backbone to be used.
182-
[default: yolov7]
183-
--weight TEXT initial weights path [required]
184-
--data TEXT data.yaml path [default: data/coco.yaml]
185-
--hyp [scratch.p5|scratch.p6|finetune.dynamic.adam]
186-
hyperparameters path [default: scratch.p5]
187-
--epochs INTEGER [default: 300]
188-
--batch-size INTEGER total batch size for all GPUs [default: 16]
189-
--img-size INTEGER... [train, test] image sizes [default: 640,
190-
640]
191-
--rect rectangular training
192-
--resume resume most recent training
193-
--resume-ckpt TEXT checkpoint to resume from
194-
--nosave only save final checkpoint
195-
--notest only test final epoch
196-
--noautoanchor disable autoanchor check
197-
--bucket TEXT gsutil bucket
198-
--cache-images cache images for faster training
199-
--image-weights use weighted image selection for training
200-
--device TEXT cuda device, i.e. 0 or 0,1,2,3 or cpu
201-
--multi-scale vary img-size +/- 50%%
202-
--single-cls train multi-class data as single-class
203-
--adam use torch.optim.Adam() optimizer
204-
--sync-bn use SyncBatchNorm, only available in DDP
205-
mode
206-
--local_rank INTEGER DDP parameter, do not modify [default: -1]
207-
--workers INTEGER maximum number of dataloader workers
208-
[default: 8]
209-
--project TEXT save to project/name [default: runs/train]
210-
--entity TEXT W&B entity
211-
--name TEXT save to project/name [default: exp]
212-
--exist-ok existing project/name ok, do not increment
213-
--quad quad dataloader
214-
--label-smoothing FLOAT Label smoothing epsilon [default: 0.0]
215-
--upload_dataset Upload dataset as W&B artifact table
216-
--bbox_interval INTEGER Set bounding-box image logging interval for
217-
W&B [default: -1]
218-
--save_period INTEGER Log model after every "save_period" epoch
219-
[default: -1]
220-
--artifact_alias TEXT version of dataset artifact to be used
221-
[default: latest]
222-
--freeze INTEGER Freeze layers: backbone of yolov7=50,
223-
first3=0 1 2 [default: 0]
224-
--v5-metric assume maximum recall as 1.0 in AP
225-
calculation
226-
--help Show this message and exit.
+ ```powershell
+ docker run --gpus all -v /data:/data ghcr.io/wehi-researchcomputing/partinet:main partinet detect --weight /path/to/best.pt --source /data/denoised --project /data/partinet_project
```

- TODO: more details...
- Example:
- ```bash
- partinet train step2 --backbone-detector yolov7 --weight /path/to/runs/train/train-step1-300epochs/weights/last.pt --workers 4 --device 0 --batch-size 1 --epochs 10 --img-size 640 640 --adam --data /path/to/cryo_training_all.yaml --hyp finetune.dynamic.adam --name train_step2
- ```
- ## Detection
+ - Apptainer / Singularity

- ```bash
- partinet detect --help
- ```
- ```
- Options:
-   --backbone-detector [yolov7|yolov7-w6|yolov7x]
-                           The choice of backbone to be used.  [default: yolov7]
-   --weight TEXT           model.pt path(s)  [required]
-   --source TEXT           source  [default: inference/images]
-   --num-classes INTEGER   number of classes  [default: 80]
-   --img-size INTEGER      inference size (pixels)  [default: 640]
-   --conf-thres FLOAT      object confidence threshold  [default: 0.25]
-   --iou-thres FLOAT       IOU threshold for NMS  [default: 0.45]
-   --device TEXT           cuda device, i.e. 0 or 0,1,2,3 or cpu
-   --view-img              display results
-   --save-txt              save results to *.txt
-   --save-conf             save confidences in --save-txt labels
-   --nosave                do not save images/videos
-   --classes INTEGER       filter by class: --classes 0, or --classes 0 --classes 2 --classes 3
-   --agnostic-nms          class-agnostic NMS
-   --augment               augmented inference
-   --project TEXT          save results to project/name  [default: runs/detect]
-   --name TEXT             save results to project/name  [default: exp]
-   --exist-ok              existing project/name ok, do not increment
-   --dy-thres FLOAT        dynamic thres  [default: 0.5]
-   --help                  Show this message and exit.
+ ```powershell
+ apptainer exec --nv --no-home -B /data oras://ghcr.io/wehi-researchcomputing/partinet:main-singularity partinet detect --weight /path/to/best.pt --source /data/denoised --project /data/partinet_project
```
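On an HPC system, the Apptainer command can be wrapped in a Slurm batch job, matching the Slurm workflow used elsewhere in this repository; a minimal sketch (partition, resource requests, and the `module load` line are site-specific assumptions):

```bash
#!/bin/bash
#SBATCH --job-name=partinet-detect
#SBATCH --gres=gpu:1        # request one GPU (adjust for your site)
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
#SBATCH --time=02:00:00

module load apptainer       # assumption: module name varies by cluster

apptainer exec --nv --no-home -B /data \
  oras://ghcr.io/wehi-researchcomputing/partinet:main-singularity \
  partinet detect --weight /path/to/best.pt --source /data/denoised --project /data/partinet_project
```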

- TODO: more details...
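For context, a fuller `detect` invocation can be assembled purely from the options listed above (paths are placeholders; `--num-classes 1` assumes a single particle class, consistent with the single-class training flags but not confirmed here):

```bash
# All flags below come from the detect help text; values are illustrative
partinet detect \
  --weight /path/to/best.pt \
  --source /data/partinet_project/denoised \
  --num-classes 1 \
  --img-size 640 \
  --conf-thres 0.25 \
  --save-txt --save-conf \
  --project /data/partinet_project --name detect_exp
```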
+ # Development notes
+ - Tests and CI: see `.github/workflows/` for CI pipelines.
+ - Contributing: open issues and PRs on the main repo. Use the documentation site for user-facing docs and developer notes.

- ## Testing

- ```bash
- partinet test --help
- ```
- ```
- Usage: partinet test [OPTIONS]
-
- Options:
-   --backbone-detector [yolov7|yolov7-w6|yolov7x]
-                           The choice of backbone to be used.  [default: yolov7]
-   --weight TEXT           model.pt path(s)  [required]
-   --data TEXT             data.yaml path  [default: data/coco.yaml]
-   --batch-size INTEGER    total batch size for all GPUs  [default: 1]
-   --img-size INTEGER      validation image size (pixels)  [default: 640]
-   --conf-thres FLOAT      object confidence threshold  [default: 0.001]
-   --iou-thres FLOAT       IOU threshold for NMS  [default: 0.65]
-   --task [train|val|test] train, val, test, speed or study  [default: test]
-   --device TEXT           cuda device, i.e. 0 or 0,1,2,3 or cpu
-   --single-cls            train multi-class data as single-class
-   --augment               augmented inference
-   --verbose               report mAP by class
-   --save-txt              save results to *.txt
-   --save-hybrid           save label+prediction hybrid results to *.txt
-   --save-conf             save confidences in --save-txt labels
-   --save-json             save a cocoapi-compatible JSON results file
-   --project TEXT          save to project/name  [default: runs/test]
-   --name TEXT             save to project/name  [default: exp]
-   --exist-ok              existing project/name ok, do not increment
-   --v5-metric             assume maximum recall as 1.0 in AP calculation
-   --dy-thres FLOAT        dynamic thres  [default: 0.5]
-   --save-results          save results
-   --help                  Show this message and exit.
- ```
-
- TODO: more details...
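Similarly, a plausible evaluation run assembled from the options above (the data YAML mirrors the step1 training example; all paths are placeholders):

```bash
# Flags are taken from the test help text; values are illustrative
partinet test \
  --weight /path/to/best.pt \
  --data path/to/cryo_training.yaml \
  --task test \
  --batch-size 1 \
  --img-size 640 \
  --save-json --save-results \
  --device 0
```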
-
- ## Get the container
-
- Replace `partinet <subcommand> <args>` with one of the below:
-
- ### Docker
-
- ```bash
- docker run ghcr.io/wehi-researchcomputing/partinet:latest <subcommand> <args>
- ```
-
- ### Singularity/Apptainer
-
- ```bash
- singularity run oras://ghcr.io/wehi-researchcomputing/partinet:latest <subcommand> <args>
- ```
+ # Support
+ - For questions or issues, open an issue in the main repo: https://github.com/WEHI-ResearchComputing/PartiNet/issues
