We tested package installation with R version 4.5.1 on Ubuntu 24.04, Arch Linux (release base-20250817.0.405639), macOS Sequoia 15.6, and Windows 11.
On Linux, installing the R dependencies may require some system-level development packages.
On Ubuntu (tested with 24.04 Docker image):
sudo apt install \
libx11-dev git libcurl4-openssl-dev \
libssl-dev make libgit2-dev zlib1g-dev pandoc \
libfreetype6-dev libjpeg-dev libpng-dev \
libtiff-dev libicu-dev libfontconfig1-dev \
libfribidi-dev libharfbuzz-dev libxml2-dev cmake
On Arch Linux (tested with base-20250817.0.405639 Docker image and CachyOS):
pacman -S \
base-devel gcc-fortran git curl openssl make cmake libgit2 zlib-ng-compat \
pandoc freetype2 libxml2 harfbuzz fribidi \
fontconfig libjpeg libpng icu libtiff
Both zlib and zlib-ng-compat work; you might have zlib already installed so you can answer "no" if asked whether you want to replace zlib with zlib-ng-compat.
On macOS (tested on Sequoia 15.6), package installation should automatically download binaries, so no additional development libraries should be necessary.
However, if you use Homebrew to install R, be sure to use --cask to avoid having to install all packages from source:
brew install --cask r
On Windows (tested on Windows 11), package installation automatically downloads binaries and thus no additional development libraries should be necessary.
Using the devtools package, you can install directly from GitHub:
if (!require("devtools", quietly = TRUE))
    install.packages("devtools")
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
if (!require("pcaMethods", quietly = TRUE))
    BiocManager::install("pcaMethods", ask = FALSE)
devtools::install_github("blueprint-genetics/amiss")
Installation with devtools::install_github may fail with the following error:
Using github PAT from envvar GITHUB_PAT
Error: Failed to install 'unknown package' from GitHub:
HTTP error 401.
Bad credentials
This suggests that there is an expired or otherwise invalid authentication token on the computer. You can either remove it as discussed on Stack Overflow or create a new, valid authentication token as discussed on GitHub.
In short, you can use
install.packages("gitcreds")
gitcreds::gitcreds_delete()
to remove the existing credentials.
If you are not sure you want to remove the token, you can also work around this by passing the auth_token=NULL parameter to install_github:
devtools::install_github("blueprint-genetics/amiss", auth_token=NULL)
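To see whether a token is present before deciding which option to take, you can inspect the environment variable named in the error message. This is a minimal sketch: the helper name `has_github_pat` is hypothetical and not part of AMISS or devtools, and it only looks at environment variables, not at tokens stored in a Git credential store.

```r
# Hypothetical helper: TRUE if the given environment variable holds a non-empty value.
has_github_pat <- function(var = "GITHUB_PAT") {
  nzchar(Sys.getenv(var, unset = ""))
}

if (has_github_pat()) {
  message("GITHUB_PAT is set; an expired or invalid value may cause the HTTP 401 error.")
} else {
  message("GITHUB_PAT is not set in this R session.")
}
```

If the variable is set and you only want to bypass it temporarily, `Sys.unsetenv("GITHUB_PAT")` clears it for the current R session without touching stored credentials.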
Alternatively, you can download the package manually and then install it using install_local.
First, clone the repository:
git clone https://github.com/blueprint-genetics/amiss.git
cd amiss
and then in R:
if (!require("devtools", quietly = TRUE))
    install.packages("devtools")
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
if (!require("pcaMethods", quietly = TRUE))
    BiocManager::install("pcaMethods", ask = FALSE)
devtools::install_local(".")
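After either installation route, a quick check can confirm that the packages are available. The sketch below uses only base R; the helper name `is_installed` is hypothetical.

```r
# Hypothetical helper: TRUE if the package's namespace can be loaded.
# requireNamespace() loads the namespace but does not attach the package.
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

for (pkg in c("pcaMethods", "amiss")) {
  status <- if (is_installed(pkg)) {
    as.character(utils::packageVersion(pkg))
  } else {
    "not installed"
  }
  message(pkg, ": ", status)
}
```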
To run AMISS, you need annotated ClinVar variant data.
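Since the pipeline can take a long time, it may be worth checking that the input files are in place before starting. The sketch below assumes the file names used in the run example in this README and that they sit in the working directory; adjust the paths to match your data. The helper name `missing_inputs` is hypothetical.

```r
# Hypothetical helper: return the subset of `paths` that do not exist on disk.
missing_inputs <- function(paths) {
  paths[!file.exists(paths)]
}

# File names as used in the run example; change these to your own paths.
inputs <- c("clinvar_20190624.vep.vcf",
            "CADD_clingen.tsv",
            "CADD_clingen_indel.tsv",
            "combination_minimal.json")

absent <- missing_inputs(inputs)
if (length(absent) > 0) {
  message("Missing input files: ", paste(absent, collapse = ", "))
} else {
  message("All input files found.")
}
```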
The framework can be configured using JSON files that determine which preprocessing and imputation steps are taken.
The file combination_orig.json in the repository contains parameters that largely match the original code of the manuscript. The file combination_minimal.json produces the smallest dataset for processing and uses only zero imputation, so it should be the quickest to run.
For the options available for each parameter, see parameter_grid.json.
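To see what a given configuration file contains, you can load it the same way the run example does, with rjson::fromJSON, and print its structure. This is a sketch that assumes the rjson package is installed and the file is in the working directory; the helper name `show_parameters` is hypothetical.

```r
# Hypothetical helper: load a JSON parameter file and print its structure.
# Returns the parsed list invisibly so it can be reused.
show_parameters <- function(path) {
  if (!file.exists(path)) {
    stop("Parameter file not found: ", path)
  }
  params <- rjson::fromJSON(file = path)
  str(params)
  invisible(params)
}

# Only runs when the file is present in the working directory.
if (file.exists("combination_minimal.json")) {
  show_parameters("combination_minimal.json")
}
```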
To run the framework with a single set of parameters up to computation of the classification statistics (note: this may take a long time):
library(amiss)
library(magrittr)
create_dir("output")
# Parse
S01_parse_vcf("clinvar_20190624.vep.vcf",
              cadd_snv_filename = "CADD_clingen.tsv",
              cadd_indel_filename = "CADD_clingen_indel.tsv",
              output_root_dir = "output/data",
              parameters_path = "combination_minimal.json")
# Preprocess
S02_preprocess_data("output/data/quality-clingen.restriction-missense.transcript-canonical",
                    parameters_path = "combination_minimal.json",
                    output_path = "output/data",
                    seed = 10)
# Imputation and classifier training
impute_and_train(training_path = "output/data/preprocessed_training_data.csv",
                 outcome_path = "output/data/training_outcomes.csv",
                 output_path = "output/trained",
                 mice_hyperparameter_grids = mice_hyperparameter_grids,
                 other_hyperparameter_grids = other_hyperparameter_grids,
                 single_value_imputation_hyperparameter_grids = single_value_imputation_hyperparameter_grids,
                 parameter_list = rjson::fromJSON(file = "combination_minimal.json"),
                 cores = 1, seed = 10, lean = TRUE)
# Prediction
predict_on_test_set(test_path = "output/data/preprocessed_test_data.csv",
                    training_path = "output/data/preprocessed_training_data.csv",
                    outcome_path = "output/data/test_outcomes.csv",
                    tr_output_path = "output/trained",
                    results_dir_path = "output/results",
                    parameter_list = rjson::fromJSON(file = "combination_minimal.json"),
                    seed = 10)
# Result CSVs now in output/results
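Once the run finishes, you may want a quick overview of what was written. The sketch below lists the CSV files in a results directory and reports their dimensions using base R only. It makes no assumption about the names or columns of the result files, and the helper name `summarise_results` is hypothetical.

```r
# Hypothetical helper: one row per CSV file with its row and column counts.
# Returns an empty data frame if the directory is missing or has no CSVs.
summarise_results <- function(results_dir = "output/results") {
  files <- list.files(results_dir, pattern = "\\.csv$", full.names = TRUE)
  tables <- lapply(files, read.csv)
  data.frame(
    file = basename(files),
    rows = vapply(tables, nrow, integer(1)),
    cols = vapply(tables, ncol, integer(1)),
    stringsAsFactors = FALSE
  )
}

print(summarise_results("output/results"))
```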
This project compares methods for handling missing data in variant annotations for the purpose of building variant pathogenicity predictors.
See the paper for a more detailed description.
The result files are available on Zenodo under DOI 10.5281/zenodo.6656616.
This project conforms to the principles of open science:
- Open data:
- We use and reference publicly available datasets and will citably archive any data and code we produce using Zenodo with a DOI
- Open source:
- The source code is freely available under the MIT license at GitHub
- Open notebook:
- You can follow development from the start on GitHub at https://github.com/blueprint-genetics/amiss
- The research plan is available in the GitHub repository (see section above for producing a PDF version)
- Open access:
- We will upload a preprint of the resulting paper(s) on bioRxiv
- We will submit the results for publication in a peer-reviewed open access journal
- Open communication:
- We intend to present the results in public scientific conferences
- Open collaboration:
We license our code with the MIT license (see the LICENSE file), but note that the license of the entire system may depend on the licenses of individual libraries.