@nicolasDelhomme

…the script

@nicolasDelhomme
Author

The second commit deals with concurrency vs. parallelism. In Python, the threading library provides concurrency: because of CPython's global interpreter lock (GIL), only one thread executes Python bytecode at a time, so all threads effectively share a single core. This is ideal when the tasks spend most of their time waiting (for example on I/O) and none of them needs the whole CPU. Parallelism is when tasks run independently at the same time, possibly on multiple cores. This is implemented in libraries such as multiprocessing or multiprocess (the latter is a fork of the former that addresses issues with it on macOS). I have updated the blast parser to use multiprocess instead of threading. I have very large transcript datasets to annotate, and the threading version (using 10 threads by default on one core) took weeks to finish, while the multiprocess one on 10 cores takes a mere hour. I have also added an argument to the parse script so that the number of cores to use can be controlled from build.py or prediction.py. Finally, I removed some trailing whitespace.
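To illustrate the difference described above, here is a minimal sketch that runs the same CPU-bound work through a thread pool and a process pool. It is not the code from this pull request: `count_hits`, `split_chunks`, `run_threads` and `run_processes` are hypothetical names, the input is assumed to be 12-column tabular BLAST output, and the standard-library `multiprocessing` module stands in for `multiprocess`.

```python
import multiprocessing as mp
from concurrent.futures import ThreadPoolExecutor


def count_hits(lines, max_evalue=1e-5):
    """CPU-bound stand-in for parsing: count rows whose e-value passes the cutoff."""
    hits = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Assumption: 12-column tabular BLAST output, e-value in column 11 (index 10).
        if len(fields) >= 11 and float(fields[10]) <= max_evalue:
            hits += 1
    return hits


def split_chunks(items, n):
    """Split a list into at most n contiguous chunks of roughly equal size."""
    size = max(1, -(-len(items) // n))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_threads(chunks, workers):
    """Threads: concurrent, but the GIL lets only one thread run Python code at a time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_hits, chunks))


def run_processes(chunks, workers):
    """Processes: each worker has its own interpreter, so chunks can run on separate cores."""
    with mp.Pool(processes=workers) as pool:
        return sum(pool.map(count_hits, chunks))


if __name__ == "__main__":
    # Synthetic rows, for illustration only.
    row = "q1\ts1\t99.0\t100\t0\t0\t1\t100\t1\t100\t1e-30\t200\n"
    chunks = split_chunks([row] * 20_000, 4)
    print(run_threads(chunks, 4), run_processes(chunks, 4))
```

Both functions return the same count; only the process-based version can spread the parsing over several cores. The worker function is defined at module level and the pool is created under the `__main__` guard so that the sketch also works with the "spawn" start method, which is the default on macOS.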

… 2) a force flag to ignore the behaviour just described, 3) passing the number of CPUs to the parser file, 4) added checks to move files only if they had been generated and 5) removed trailing spaces
@nicolasDelhomme
Author

nicolasDelhomme commented Jan 19, 2022

The third commit

  1. adds checkpoints to avoid reprocessing data that has already been completed (as I wrote, I have very large datasets and hence long processing times for the prediction),
  2. adds a force flag to override the behavior just described,
  3. passes the number of CPUs to the blast parser file,
  4. adds checks to move files only if they have been generated, and
  5. removes trailing spaces.
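The checkpoint, force flag, CPU count and guarded move listed above could look roughly like the sketch below. It is only an illustration under stated assumptions, not the code in this pull request: the option names `--cpus` and `--force` and the helpers `build_arg_parser`, `needs_run` and `move_if_generated` are hypothetical.

```python
import argparse
import os
import shutil


def build_arg_parser():
    """Command-line options; the flag names are assumptions for this sketch."""
    parser = argparse.ArgumentParser(description="Hypothetical prediction driver")
    parser.add_argument("--cpus", type=int, default=os.cpu_count() or 1,
                        help="number of worker processes to hand to the parser")
    parser.add_argument("--force", action="store_true",
                        help="rerun every step even if its output already exists")
    return parser


def needs_run(output_path, force=False):
    """Checkpoint test: rerun when forced, or when the output is missing or empty."""
    if force:
        return True
    return not (os.path.isfile(output_path) and os.path.getsize(output_path) > 0)


def move_if_generated(src, dest_dir):
    """Move src into dest_dir only if it was actually produced; return True when moved."""
    if not os.path.isfile(src):
        return False
    os.makedirs(dest_dir, exist_ok=True)
    shutil.move(src, os.path.join(dest_dir, os.path.basename(src)))
    return True
```

A driver script would call `needs_run(path, args.force)` before each expensive step and pass `args.cpus` on to the parser. Treating an empty output file as "not done" is a design choice of this sketch; a step that is interrupted while writing would need a stricter completion marker.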

@urmi-21
Owner

urmi-21 commented Feb 27, 2022

Thank you for your contributions, @nicolasDelhomme. I will review this soon.

@urmi-21 urmi-21 self-requested a review February 27, 2022 00:18