This repository hosts bioinformatics scripts designed to support the National Public Health Laboratory (NPHL) of Cameroon in its Mpox genomic surveillance efforts. The codebase focuses on analyzing whole-genome sequencing (WGS) data for outbreak tracking and characterization. The work was initiated in response to the identification of 5 confirmed cases in the Littoral (4) and South-West (1) regions.
.
├── environment.yml
├── LICENSE
├── phylogeography => Scripts for phylogenetic analysis, molecular dating, and migration plotting.
│ ├── AncestralChanges.py
│ ├── baltic.py
│ ├── compute_float_date.sh
│ ├── final_DataViz.py
│ ├── Plot_migrations.py
│ └── run_phylogenetic_tree.sh
├── README.md
└── viralrecon_MPOX => Wrapper for the nf-core/viralrecon pipeline (assembly & consensus).
  ├── run_nextclade.sh
  └── run_nf_core_viralrecon.sh

Ensure you have Nextflow and Docker installed.
nextflow pull nf-core/viralrecon -r 2.6.0

Ensure you have Conda installed.
conda env create -f environment.yml

Use this step to identify circulating strains and build consensus sequences from raw reads.
#Launch the script
bash viralrecon_MPOX/run_nf_core_viralrecon.sh <DATA_DIR> <OUT_DIR> <MODE> [REF_FASTA] [REF_GFF]

Parameters:
- DATA_DIR: Path to the folder containing the FASTQ files (required).
- OUT_DIR: Path where results will be saved (required).
- MODE:
  - 0 = Use the built-in reference (NC_063383.1).
  - 1 = Provide a custom FASTA/GFF (requires the 4th and 5th arguments).
  - Default = 0.
- REF_FASTA: Path to the reference .fasta file.
- REF_GFF: Path to the reference .gff file.
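
The relationship between MODE and the two optional reference arguments can be sketched in Python. This is only an illustration of the rules listed above, not the wrapper's own logic: `build_viralrecon_args` is a hypothetical helper, and the checks inside the shell script itself may differ.

```python
from typing import List, Optional


def build_viralrecon_args(
    data_dir: str,
    out_dir: str,
    mode: int = 0,
    ref_fasta: Optional[str] = None,
    ref_gff: Optional[str] = None,
) -> List[str]:
    """Return the command line for run_nf_core_viralrecon.sh (hypothetical helper)."""
    if mode not in (0, 1):
        raise ValueError("MODE must be 0 (built-in reference) or 1 (custom reference)")
    args = ["bash", "viralrecon_MPOX/run_nf_core_viralrecon.sh", data_dir, out_dir, str(mode)]
    if mode == 1:
        # A custom reference needs both files: the 4th and 5th arguments.
        if not ref_fasta or not ref_gff:
            raise ValueError("MODE=1 requires both REF_FASTA and REF_GFF")
        args += [ref_fasta, ref_gff]
    return args
```

The returned list can be passed to `subprocess.run(..., check=True)`. Building the command as a list avoids shell quoting problems when paths contain spaces.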
NOTE:
Check lines 39-40 of viralrecon_MPOX/run_nf_core_viralrecon.sh to ensure that R1_EXT and R2_EXT match your FASTQ file extensions. The default values are:
R1_EXT='_R1_001.fastq.gz'
R2_EXT='_R2_001.fastq.gz'

Use this step to identify clades and assess sequence quality using Nextclade.
#Ensure environment is active
conda activate phylodynamic
#Run the analysis pipeline
bash viralrecon_MPOX/run_nextclade.sh <Sequences> <Output Directory>

Parameters:
- Sequences: Path to the FASTA file containing all sample sequences.
- Output Directory: Path to the output directory.
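
Because the Sequences argument expects a single FASTA file with every sample, per-sample consensus files may need to be merged first. The sketch below shows one way to do that in Python. `merge_fasta` is a hypothetical helper and the file names in the usage note are placeholders. Where the consensus files are written depends on how the previous step was run.

```python
from pathlib import Path
from typing import Iterable


def merge_fasta(inputs: Iterable[Path], merged: Path) -> int:
    """Concatenate FASTA files into one multi-FASTA and return the record count."""
    n_records = 0
    with merged.open("w") as out:
        for fasta in inputs:
            for line in fasta.read_text().splitlines():
                line = line.strip()
                if not line:
                    continue  # skip blank lines so records stay contiguous
                if line.startswith(">"):
                    n_records += 1  # each header line starts a new record
                out.write(line + "\n")
    return n_records
```

For example, `merge_fasta(sorted(Path("consensus").glob("*.fa")), Path("all_samples.fasta"))` would write one combined file, which can then be passed as `<Sequences>`.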
This module performs alignment, phylogeny (IQ-TREE), and molecular dating (TreeTime).
#Ensure environment is active
conda activate phylodynamic
#Run the analysis pipeline
bash phylogeography/run_phylogenetic_tree.sh \
<Sequences> \
<Reference Genome> \
<Date File> \
<Locations File> \
<Last Sample Date> \
<Output Directory>

Parameters:
- Sequences: Path to the FASTA file containing all sample sequences.
- Reference Genome: Path to the reference genome in FASTA format.
- Date File: Path to a CSV/TSV file with columns: name, date (YYYY-MM-DD).
- Locations File: Path to a CSV/TSV file with columns: name, country.
- Last Sample Date: The date of the most recent sample in the tree, in the format YYYY-MM-DD.
- Output Directory: Path to the output directory.
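
Before launching the pipeline, it can help to check the date file and derive the Last Sample Date from it rather than typing it by hand. The following is a minimal sketch, assuming a comma-separated file with a header row containing `name` and `date`. The function names are hypothetical and are not part of the repository scripts.

```python
import csv
from datetime import date, datetime
from typing import Dict


def read_sample_dates(path: str, delimiter: str = ",") -> Dict[str, date]:
    """Read a name/date table and parse each date with the YYYY-MM-DD pattern."""
    dates: Dict[str, date] = {}
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        missing = {"name", "date"} - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing column(s): {sorted(missing)}")
        for row in reader:
            # strptime raises ValueError on malformed dates such as 2024/01/05
            dates[row["name"]] = datetime.strptime(row["date"], "%Y-%m-%d").date()
    return dates


def last_sample_date(dates: Dict[str, date]) -> str:
    """Return the most recent sampling date, formatted as YYYY-MM-DD."""
    return max(dates.values()).isoformat()
```

For a TSV file, pass `delimiter="\t"`. The string returned by `last_sample_date` can be used directly as the `<Last Sample Date>` argument.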
Visualize viral introductions and migration events based on the phylogeographic analysis.
#Ensure environment is active
conda activate phylodynamic
#Run the visualization script
python phylogeography/final_DataViz.py \
--migration <OUTDIR>/mugration/mugration_results.csv \
--pointsGeoloc <path/to/gps_coordinates.csv> --savepdf
#The result is a Dash web page accessible at http://<ip_address>:8050
## E.g., on a local PC: http://127.0.0.1:8050/
#Script usage
python phylogeography/final_DataViz.py -h

Arguments for final_DataViz.py:
- --migration: Path to mugration_results.csv generated by the phylogeography step.
- --pointsGeoloc: Path to a CSV with columns: location, long, lat.
- --origins (Optional): Filter by origin location (e.g., --origins South Center).
- --destinations (Optional): Filter by destination (e.g., --destinations North-West Littoral East).
- --savepdf (Optional): Save the plot to a PDF file.
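
The examples above pass several locations to a single flag. The sketch below shows how flags of this shape behave with Python's `argparse`. It is an assumption about how such an interface could be parsed, not the parser actually used in final_DataViz.py. Whether `--migration` and `--pointsGeoloc` are strictly required is also assumed here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(description="Sketch of the documented final_DataViz.py flags")
    parser.add_argument("--migration", required=True, help="Path to mugration_results.csv")
    parser.add_argument("--pointsGeoloc", required=True, help="CSV with columns location,long,lat")
    # nargs="*" collects every space-separated value that follows the flag
    parser.add_argument("--origins", nargs="*", default=[], help="Origin locations to keep")
    parser.add_argument("--destinations", nargs="*", default=[], help="Destination locations to keep")
    parser.add_argument("--savepdf", action="store_true", help="Save the plot as a PDF")
    return parser


args = build_parser().parse_args(
    ["--migration", "mugration_results.csv",
     "--pointsGeoloc", "gps_coordinates.csv",
     "--origins", "South", "Center", "--savepdf"]
)
print(args.origins)  # ['South', 'Center']
```

With this kind of parsing, each space-separated word becomes its own location, so a location name that itself contains a space would need to be quoted on the command line.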