ddi-ps3n

PS3N: Leveraging Protein Sequence-Structure Similarity for Novel Drug-Drug Interaction Discovery

Project Overview

This repository contains the code and resources for our study on predicting drug-drug interactions (DDIs) using the PS3N model. Our approach leverages protein sequence and structure information to build a similarity network for identifying novel DDIs. Due to space constraints, data files are not included in the repository but will be provided upon request or in the reviewers' comment section.

The project is organized into the following directories:

Repository Structure

1. `Input Parsing`

This folder contains scripts for parsing and extracting relevant information from the DrugBank dataset.

parse-drugbank.py: Reads the DrugBank XML file and extracts information such as:
- Protein pathways
- Drug active ingredient information
- Drug-protein targets
- Drug interactions
- Drug ATC classification and pathways

Ensure the DrugBank XML data is available and adjust the file paths accordingly before running the script.

2. `Similarity Metrics Calculation`

This folder contains scripts for calculating similarity metrics for protein sequences and structures:

Protein Sequence and Structure Similarity:
- Levenshtein distance
- Cosine similarity
- KL divergence
File Information:
- Some scripts use generalized file names as we calculated multiple variations of these metrics.
- Ensure data source files are available and directory paths are configured correctly to run the scripts.

3. `Model`

This folder contains the implementation and evaluation of the PS3N model:

snf_implementation/: Implements Similarity Network Fusion (SNF) to combine multiple similarity matrices into a single fused matrix.
similarity_matrix_vector.py: Converts the fused similarity matrix into vectors suitable for model training.
k_fold_cross_validation.py: Performs k-fold cross-validation to evaluate the model.
compare_models.py: Compares the PS3N model's performance with traditional machine learning models such as KNN and SVM.

Running the Code

Clone the repository:

git clone <repository-url>
cd <repository-folder>

Configure paths for data files:
- Download the required data files from the provided Google Drive link.
- Adjust the directory paths in the scripts to match the location of the data files.

Run individual scripts:

Input Parsing:
```
python InputParsing/parse-drugbank.py
```

Similarity Metrics Calculation:

python SimilarityMetrics/<script-name>.py

Model Implementation:
```
python Model/k_fold_cross_validation.py
```

Data Access

Due to space limitations, the dataset is not included in this repository. A link to the data files will be provided on request. The data includes:

DrugBank dataset (XML format)
Processed similarity matrices for protein sequences and structures
Feature Space
Network for new found DDI

TO run the code you need to make sure to update the paths to the datasets in the respective scripts after downloading the files.

Dependencies

Below is a list of core libraries (compatible with Python 3.7) needed to run the code in this repository:

numpy (e.g., >=1.18)
pandas (e.g., >=1.0)
scipy (e.g., >=1.4)
sklearn (e.g., >=0.22)
tensorflow (e.g., >=2.2)
keras (often bundled with TensorFlow 2.x)
networkx (e.g., >=2.4)
biopython (e.g., >=1.76)

Additionally, import statements in the code indicate dependencies such as pairwise2 from Biopython, model_from_json from Keras, etc. Make sure to install them accordingly:

pip install numpy pandas scipy scikit-learn tensorflow networkx biopython

License

This project is licensed under the GNU General Public License v3.0. See the LICENSE file for more details.

Citation

If you use this code, please cite our paper:

@inproceedings{islam2021detecting,
  title={Detecting drug-drug interactions using protein sequence-structure similarity networks},
  author={Islam, Saminur and Abbasi, Ahmed and Agarwal, Nitin and Zheng, Wanhong and Doretto, Gianfranco and Adjeroh, Donald A},
  booktitle={2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)},
  pages={3472--3477},
  year={2021},
  organization={IEEE}
}

Contact

For questions or clarifications, feel free to contact:

Saminur Islam: [email protected]

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
Input Parsing		Input Parsing
Similarity Metrics Calculation		Similarity Metrics Calculation
Visualization		Visualization
model		model
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

ddi-ps3n

PS3N: Leveraging Protein Sequence-Structure Similarity for Novel Drug-Drug Interaction Discovery

Project Overview

Repository Structure

1. `Input Parsing`

2. `Similarity Metrics Calculation`

3. `Model`

Running the Code

Data Access

Dependencies

License

Citation

Contact

About

Uh oh!

Releases

Packages

Languages

License

saminur/ddi-ps3n

Folders and files

Latest commit

History

Repository files navigation

ddi-ps3n

PS3N: Leveraging Protein Sequence-Structure Similarity for Novel Drug-Drug Interaction Discovery

Project Overview

Repository Structure

1. Input Parsing

2. Similarity Metrics Calculation

3. Model

Running the Code

Data Access

Dependencies

License

Citation

Contact

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

1. `Input Parsing`

2. `Similarity Metrics Calculation`

3. `Model`

Packages