# PartiNet

PartiNet is a particle-picking pipeline for cryo-EM micrographs. It provides denoising, adaptive detection, and STAR file generation for downstream processing.

## Links

- Documentation: https://mihinp.github.io/partinet_documentation/
- Model weights (Hugging Face): https://huggingface.co/MihinP/PartiNet

## Getting started (quick)

1. Clone the repository

```powershell
git clone git@github.com:WEHI-ResearchComputing/PartiNet.git
cd PartiNet
```

2. Create a Python virtual environment (recommended)

```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -U pip
```
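The activation command above is PowerShell-specific. On Linux or macOS (for example, an HPC login node), the bash/zsh equivalent is:

```shell
# create and activate a virtual environment (bash/zsh)
python3 -m venv .venv
source .venv/bin/activate
```

After activation, `python` and `pip` resolve to the environment's own interpreter.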

3. Install the requirements

```powershell
pip install -r requirements.txt
# or do an editable install for development
pip install -e .
```

4. Download the model weights (see the Hugging Face README)

```powershell
# If you have git-lfs and access via HTTPS/SSH
git lfs install
git clone https://huggingface.co/MihinP/PartiNet
# or use the huggingface_hub Python client
python -m pip install huggingface_hub
python -c "from huggingface_hub import hf_hub_download; hf_hub_download(repo_id='MihinP/PartiNet', filename='best.pt', repo_type='model')"
```

## Quick usage examples

- Denoise images

```powershell
partinet denoise --source /data/raw_micrographs --project /data/partinet_project
```

- Detect particles

```powershell
partinet detect --weight /path/to/best.pt --source /data/partinet_project/denoised --project /data/partinet_project
```
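If your detect run also writes YOLO-style normalized label files (one `class x_center y_center width height [confidence]` row per box, as many YOLOv7-derived detectors do), a short script can convert them to pixel coordinates. This layout is an assumption about the text output, not a documented PartiNet format:

```python
def yolo_txt_to_pixels(txt: str, img_w: int, img_h: int):
    """Convert normalized YOLO rows ('cls xc yc w h [conf]') to pixel-space boxes."""
    boxes = []
    for line in txt.strip().splitlines():
        parts = line.split()
        # first field is the integer class id; the next four are normalized floats
        cls, xc, yc, w, h = int(parts[0]), *map(float, parts[1:5])
        conf = float(parts[5]) if len(parts) > 5 else None  # confidence is optional
        boxes.append({
            "class": cls,
            "x_center": xc * img_w, "y_center": yc * img_h,
            "width": w * img_w, "height": h * img_h,
            "conf": conf,
        })
    return boxes

# one hypothetical detection near the centre of a 4096x4096 micrograph
print(yolo_txt_to_pixels("0 0.5 0.5 0.05 0.05 0.91", 4096, 4096))
```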

- Generate STAR files

```powershell
partinet star --project /data/partinet_project --output /data/partinet_project/exp/particles.star
```
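A particle `.star` file stores its table in a `loop_` block of `_rln*` columns. As a rough illustration of what downstream tools consume, here is a minimal stdlib parser for a single-loop STAR block (real files can contain multiple data blocks, so a dedicated library such as `starfile` is more robust in practice). The column names below follow generic RELION conventions and are not a guaranteed PartiNet output:

```python
def parse_star_loop(text: str):
    """Parse the first loop_ block of a STAR file into a list of row dicts."""
    lines = [l.strip() for l in text.splitlines()
             if l.strip() and not l.startswith("#")]
    cols, rows, in_loop = [], [], False
    for line in lines:
        if line == "loop_":
            in_loop = True
            continue
        if in_loop and line.startswith("_"):
            cols.append(line.split()[0].lstrip("_"))  # e.g. '_rlnCoordinateX #1'
        elif in_loop and cols:
            vals = line.split()
            if len(vals) != len(cols):
                break  # end of the loop block
            rows.append(dict(zip(cols, vals)))
    return rows

sample = """\
data_particles

loop_
_rlnCoordinateX #1
_rlnCoordinateY #2
_rlnMicrographName #3
1024.0 2048.0 mic_0001.mrc
512.5  300.0 mic_0001.mrc
"""
particles = parse_star_loop(sample)
print(len(particles), particles[0]["rlnCoordinateX"])  # prints: 2 1024.0
```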

## Containerized usage

- Docker

```powershell
docker run --gpus all -v /data:/data ghcr.io/wehi-researchcomputing/partinet:main partinet detect --weight /path/to/best.pt --source /data/denoised --project /data/partinet_project
```

- Apptainer / Singularity

```powershell
apptainer exec --nv --no-home -B /data oras://ghcr.io/wehi-researchcomputing/partinet:main-singularity partinet detect --weight /path/to/best.pt --source /data/denoised --project /data/partinet_project
```

## Development notes

- Tests and CI: see `.github/workflows/` for the CI pipelines.
- Contributing: open issues and PRs on the main repo; use the documentation site for user-facing docs and developer notes.

## Support

- For questions or issues, open an issue on the main repo: https://github.com/WEHI-ResearchComputing/PartiNet/issues