MOISECHRIST/NPHL_Mpox_WGS_Analysis

NPHL Mpox WGS Analysis

This repository hosts bioinformatics scripts that support the National Public Health Laboratory (NPHL) of Cameroon in its Mpox genomic surveillance efforts. The scripts analyze whole-genome sequencing (WGS) data for outbreak tracking and characterization. The work was initiated in response to the identification of 5 confirmed cases in the Littoral (4) and South-West (1) regions.

Project Structure

.
├── environment.yml
├── LICENSE
├── phylogeography => Scripts for phylogenetic analysis, molecular dating, and migration plotting.
│   ├── AncestralChanges.py
│   ├── baltic.py
│   ├── compute_float_date.sh
│   ├── final_DataViz.py
│   ├── Plot_migrations.py
│   └── run_phylogenetic_tree.sh
├── README.md
└── viralrecon_MPOX => Wrapper for the nf-core/viralrecon pipeline (assembly & consensus).
    ├── run_nextclade.sh
    └── run_nf_core_viralrecon.sh

Installation & Requirements

1. nf-core/viralrecon

Ensure you have Nextflow and Docker installed.

nextflow pull nf-core/viralrecon -r 2.6.0

2. Clade assignment, quality checks and phylogeography

Ensure you have Conda installed.

conda env create -f environment.yml

Usage

1. Launch nf-core/viralrecon

Use this step to identify circulating strains and build consensus sequences from raw reads.

#Launch the script 
bash viralrecon_MPOX/run_nf_core_viralrecon.sh <DATA_DIR> <OUT_DIR> <MODE> [REF_FASTA] [REF_GFF]

Parameters:

  • DATA_DIR: Path to folder containing FASTQ files (required).
  • OUT_DIR: Path where results will be saved (required).
  • MODE: Reference selection (default: 0).
    • 0 = Use built-in reference (NC_063383.1).
    • 1 = Provide custom FASTA/GFF (requires 4th and 5th arguments).
  • REF_FASTA: Path to the reference .fasta file (used with MODE 1).
  • REF_GFF: Path to the reference .gff file (used with MODE 1).
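
The MODE rules above can be sketched as a small argument check. This is only an illustration of the documented calling convention, not code from the repository: the helper name build_command is hypothetical, and the paths in the demo are placeholders.

```python
SCRIPT = "viralrecon_MPOX/run_nf_core_viralrecon.sh"


def build_command(data_dir, out_dir, mode=0, ref_fasta=None, ref_gff=None):
    """Return the argv list for the wrapper script, enforcing the MODE rules."""
    if mode not in (0, 1):
        raise ValueError("MODE must be 0 (built-in reference) or 1 (custom reference)")
    cmd = ["bash", SCRIPT, str(data_dir), str(out_dir), str(mode)]
    if mode == 1:
        # MODE 1 needs the 4th and 5th positional arguments.
        if ref_fasta is None or ref_gff is None:
            raise ValueError("MODE 1 requires both REF_FASTA and REF_GFF")
        cmd += [str(ref_fasta), str(ref_gff)]
    return cmd


if __name__ == "__main__":
    # Placeholder paths, for illustration only.
    print(" ".join(build_command("fastq/", "results/", 0)))
```

The returned list can be passed to subprocess.run if you prefer to launch the wrapper from Python rather than from the shell.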

NOTE:
Check lines 39-40 of viralrecon_MPOX/run_nf_core_viralrecon.sh to ensure that R1_EXT and R2_EXT match the extensions of your FASTQ files. The default values are:

R1_EXT='_R1_001.fastq.gz'
R2_EXT='_R2_001.fastq.gz'
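
Before launching the pipeline, it can help to confirm that every sample in DATA_DIR has both mates under these extensions. The sketch below is a hypothetical pre-flight check, not part of the repository; it assumes the sample name is whatever precedes the extension, and the function names pair_fastqs and check_data_dir are illustrative.

```python
from pathlib import Path

# Default extensions as documented for the wrapper script.
R1_EXT = "_R1_001.fastq.gz"
R2_EXT = "_R2_001.fastq.gz"


def pair_fastqs(filenames, r1_ext=R1_EXT, r2_ext=R2_EXT):
    """Group file names into {sample: (r1, r2)} and list samples missing a mate."""
    r1 = {n[: -len(r1_ext)]: n for n in filenames if n.endswith(r1_ext)}
    r2 = {n[: -len(r2_ext)]: n for n in filenames if n.endswith(r2_ext)}
    paired = {s: (r1[s], r2[s]) for s in sorted(r1.keys() & r2.keys())}
    unpaired = sorted(r1.keys() ^ r2.keys())
    return paired, unpaired


def check_data_dir(data_dir):
    """Apply pair_fastqs to the regular files found directly in data_dir."""
    names = [p.name for p in Path(data_dir).iterdir() if p.is_file()]
    return pair_fastqs(names)
```

Files that match neither extension are ignored by this check, so an empty result usually means R1_EXT and R2_EXT need to be adjusted.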

2. Clade Assignment and Quality Checks

Use this step to identify clades and assess sequence quality using Nextclade.

#Ensure environment is active 
conda activate phylodynamic

#Run the analysis pipeline
bash viralrecon_MPOX/run_nextclade.sh <Sequences> <Output Directory>

Parameters:

  • Sequences: Path to the FASTA file containing all sample sequences.
  • Output Directory: Path to the output directory.
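
A quick look at the multi-FASTA file before running Nextclade can catch duplicated names or heavily masked consensus sequences. The following is a minimal sketch under the assumption that the sequence name is the first whitespace-delimited token of each header; it does not reproduce Nextclade's own quality metrics, and the function names are illustrative.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first token of the header as the sequence name.
            tokens = line[1:].split()
            name, chunks = (tokens[0] if tokens else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def summarize(lines):
    """Return [(name, length, N fraction)] and the list of duplicated names."""
    stats, seen, duplicates = [], set(), []
    for name, seq in read_fasta(lines):
        if name in seen:
            duplicates.append(name)
        seen.add(name)
        n_frac = seq.upper().count("N") / len(seq) if seq else 0.0
        stats.append((name, len(seq), n_frac))
    return stats, duplicates
```

Because summarize accepts any iterable of lines, an open file handle can be passed directly.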

3. Phylogeography Analysis

This module performs alignment, phylogeny (IQ-TREE), and molecular dating (TreeTime).

#Ensure environment is active 
conda activate phylodynamic

#Run the analysis pipeline
bash phylogeography/run_phylogenetic_tree.sh \
      <Sequences> \
      <Reference Genome> \
      <Date File> \
      <Locations File> \
      <Last Sample Date> \
      <Output Directory>

Parameters:

  • Sequences: Path to the FASTA file containing all sample sequences.
  • Reference Genome: Path to the reference genome in FASTA format.
  • Date File: Path to a CSV/TSV file with columns: name, date (YYYY-MM-DD).
  • Locations File: Path to a CSV/TSV file with columns: name, country.
  • Last Sample Date: The date of the most recent sample in the tree, in YYYY-MM-DD format.
  • Output Directory: Path to the output directory.
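
Malformed dates or names that do not match the FASTA headers are common causes of failed dating runs, so the date file is worth checking first. The sketch below is a hypothetical validator, not the repository's code. It also shows one common way to express a date as a decimal year; whether phylogeography/compute_float_date.sh uses the same convention is an assumption that should be checked against the script.

```python
import csv
import re
from datetime import date

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def read_dates(lines, delimiter=","):
    """Parse a name/date table; return ({name: date}, [problems])."""
    dates, problems = {}, []
    reader = csv.DictReader(lines, delimiter=delimiter)
    missing = {"name", "date"} - set(reader.fieldnames or [])
    if missing:
        return dates, [f"missing column(s): {sorted(missing)}"]
    for row in reader:
        value = (row.get("date") or "").strip()
        try:
            if not ISO_DATE.fullmatch(value):
                raise ValueError(value)
            dates[row["name"]] = date.fromisoformat(value)
        except ValueError:
            problems.append(f"{row['name']}: bad date {value!r}")
    return dates, problems


def decimal_year(d):
    """Convert a date to a fractional year (day offset / days in that year)."""
    start = date(d.year, 1, 1)
    end = date(d.year + 1, 1, 1)
    return d.year + (d - start).days / (end - start).days


def check_against_sequences(dates, sequence_names):
    """Return (sequences without a date, dated names without a sequence)."""
    names = set(sequence_names)
    return sorted(names - set(dates)), sorted(set(dates) - names)
```

For a TSV file, pass delimiter="\t". The locations file (columns name, country) can be checked in the same way by swapping the required column names.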

4. Visualization (Migration Plots)

Visualize viral introductions and migration events based on the phylogeographic analysis.

#Ensure environment is active
conda activate phylodynamic

#Run the visualization script
python phylogeography/final_DataViz.py \
      --migration <OUTDIR>/mugration/mugration_results.csv \
      --pointsGeoloc <path/to/gps_coordinates.csv> --savepdf

#The result is a Dash web page served at http://<ip_address>:8050
#e.g. on a local machine: http://127.0.0.1:8050/

#Script usage
python phylogeography/final_DataViz.py -h

Arguments for final_DataViz.py:

  • --migration: Path to mugration_results.csv generated by the phylogeography step.
  • --pointsGeoloc: Path to a CSV with columns: location, long, lat.
  • --origins (Optional): Filter by origin location (e.g., --origins South Center).
  • --destinations (Optional): Filter by destination (e.g., --destinations North-West Littoral East).
  • --savepdf (Optional): Save the plot as a PDF file.
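
Since the plot depends on every location having coordinates, the --pointsGeoloc file can be validated before starting the Dash app. This is a minimal sketch that assumes only the documented columns (location, long, lat) and decimal-degree values; the function names are hypothetical and it does not read mugration_results.csv, whose column layout is not described here.

```python
import csv


def read_points(lines, delimiter=","):
    """Parse location/long/lat rows; return ({location: (long, lat)}, [problems])."""
    points, problems = {}, []
    reader = csv.DictReader(lines, delimiter=delimiter)
    missing = {"location", "long", "lat"} - set(reader.fieldnames or [])
    if missing:
        return points, [f"missing column(s): {sorted(missing)}"]
    for row in reader:
        loc = (row["location"] or "").strip()
        try:
            lon, lat = float(row["long"]), float(row["lat"])
        except (TypeError, ValueError):
            problems.append(f"{loc}: non-numeric coordinates")
            continue
        # Decimal degrees: longitude in [-180, 180], latitude in [-90, 90].
        if not (-180 <= lon <= 180 and -90 <= lat <= 90):
            problems.append(f"{loc}: coordinates out of range")
            continue
        points[loc] = (lon, lat)
    return points, problems


def missing_locations(points, wanted):
    """Return the names in `wanted` (e.g. --origins values) with no coordinates."""
    return sorted(set(wanted) - set(points))
```

Names passed to --origins or --destinations can be run through missing_locations to spot spelling mismatches such as "South West" versus "South-West".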
