bbi-lab/RAD51D_XRCC2_analysis

RAD51D_XRCC2_analysis

This repository contains all code and notebooks used to generate the figures in the RAD51D/XRCC2 SGE paper (Casadei et al. 2026).

The required supplementary data tables are included, and the paths to them are already set in all notebooks, so cloning this repository is enough to regenerate the notebook figures. For figures generated through Python scripts, update the paths to match your local machine.
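When updating script paths, one option is to derive every location from a single repository root instead of editing each path by hand. The following is a minimal sketch, not code from this repository: the helper `data_paths` is hypothetical, and it assumes the data folder is named `data` and holds the three sub-directories described in the Data section.

```python
from pathlib import Path

# Assumed sub-directory names, taken from this README; adjust if the layout differs.
DATA_SUBDIRS = ("extra_data", "final_tables", "supp_table_inputs")


def data_paths(repo_root):
    """Map each data sub-directory name to its path under <repo_root>/data.

    Hypothetical helper: the folder name "data" is an assumption.
    """
    data_dir = Path(repo_root) / "data"
    return {name: data_dir / name for name in DATA_SUBDIRS}


# Hypothetical clone location; replace with the path on your machine.
paths = data_paths("RAD51D_XRCC2_analysis")
print(paths["final_tables"])
```

With this approach, only the argument passed to `data_paths` changes between machines.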

Data

The data folder contains sub-directories holding all supplementary tables and the data needed to generate figures. The data used to recompile the provided supplementary tables are also included.

extra_data

Contains all external data used for analysis and figure generation.

The case_control_data sub-folder contains data accessed from the BRIDGES and CARRIERS breast cancer case-control studies. Data for each study is organized into its respective folder:

  • BRIDGES_data - Contains all data files from the BRIDGES study

    • 20250815_BRIDGES_missense_all.xlsx - All missense variants sequenced in the BRIDGES study
    • 20250815_BRIDGES_missense_population.xlsx - Missense variants only from population-based studies that were part of the BRIDGES study
    • 20250815_BRIDGES_PTVs_all.xlsx - All PTVs sequenced in the BRIDGES study
    • 20250815_BRIDGES_PTVs_pop.xlsx - PTVs only from population-based studies that were part of the BRIDGES study
  • CARRIERS_data - Contains data file from the CARRIERS study

    • 20250303_CARRIERS_data.xlsx - Contains all variants sequenced in the CARRIERS case-control study.

final_tables

Contains all final supplementary tables. Includes final tables for figure generation and key residue analysis.

  • supplementary_file_1_RAD51D_SGE_final_table_20260407.xlsx - Final score file for RAD51D. Additional tabs contain orthogonal data and metadata (i.e. variant counts and editing rates)
  • supplementary_file_1_XRCC2_SGE_final_table_20260122.xlsx - Analogous final score file for XRCC2.
  • supplementary_file_RAD51D_XRCC2_keyresis_20260407.xlsx - Residues from RAD51D_XRCC2_InterestingResidues.xlsx merged with final SGE scores

supp_table_inputs

Files used to create the RAD51D and XRCC2 final supplementary tables are found in the supp_table_inputs sub-directory. Dated file names reflect when the file was originally created or when the data were accessed.

RAD51D files:

  • 20251106_RAD51D_RegeneronMAF.csv - MAF data from Regeneron's Million Exome study for RAD51D (accessed 2025/11/06)
  • 20251106_RAD51D_gnomAD_v4.1.0.csv - MAF data for RAD51D SNVs accessed from gnomAD v4.1.0 (accessed 2025/11/06)
  • 20251107_RAD51D_PhyloP.xlsx - PhyloP scores across RAD51D
  • 20260102_RAD51Dsnvs_VEP.xlsx - Ensembl VEP annotated SNV file for RAD51D
  • 20260102_RAD51Dsnvscores.vcf - .VCF file input for Ensembl VEP annotation of RAD51D SNVs
  • 20260102_RAD51Dvep.txt - Raw VEP output for RAD51D SNVs
  • 20260407_RAD51D.editrates.tsv - Rates of editing that generated usable reads for each RAD51D SGE target and replicate
  • 20260407_RAD51Dallscores.tsv - Raw RAD51D score file
  • 20260407_RAD51Ddelcounts.tsv - Raw counts for RAD51D deletions
  • 20260407_RAD51Dmodelparams.tsv - Output parameters from GMM modeling for RAD51D
  • 20260407_RAD51Dsnvcounts.tsv - Raw counts for RAD51D SNVs
  • RAD51D_unpublished.json - Points thresholds for running ExCALIBR (PMID: 40654914) calibration on RAD51D SGE data

XRCC2 files:

  • 20251107_XRCC2_RegeneronMAF.csv - MAF data from Regeneron's Million Exome study for XRCC2 (accessed 2025/11/07)
  • 20251107_XRCC2_gnomAD_v4.1.0.csv - MAF data for XRCC2 SNVs accessed from gnomAD v4.1.0 (accessed 2025/11/07)
  • 20251107_XRCC2_PhyloP.xlsx - PhyloP scores across XRCC2
  • 20251107_XRCC2.editrates.tsv - Rates of editing that generated usable reads for each XRCC2 SGE target and replicate
  • 20251202_XRCC2allscores.tsv - Raw XRCC2 score file
  • 20251202_XRCC2delcounts.tsv - Raw counts for XRCC2 deletions
  • 20251202_XRCC2modelparams.tsv - Output parameters from GMM modeling for XRCC2
  • 20251202_XRCC2snvcounts.tsv - Raw counts for XRCC2 SNVs
  • 20260102_XRCC2snvs_VEP.xlsx - Ensembl VEP annotated SNV file for XRCC2
  • 20260102_XRCC2snvscores.vcf - .VCF file input for Ensembl VEP annotation of XRCC2 SNVs
  • 20260102_XRCC2vep.txt - Raw VEP output for XRCC2 SNVs
  • XRCC2_unpublished.json - Points thresholds for running ExCALIBR (PMID: 40654914) calibration on XRCC2 SGE data

Shared files:

  • 20251231_DX2_OrthogonalData.xlsx - Curated list of variants previously assayed in orthogonal assays
  • 20260101_SGEsubset.xlsx - Subset of SGE data from the annotated dataframe used in https://doi.org/10.64898/2026.02.14.705848
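Because most file names above carry a YYYYMMDD stamp, the creation or access date can be recovered programmatically, for example to check which inputs predate a given table. This is a minimal sketch using only the standard library; the function `file_date` is a hypothetical helper and is not part of the repository.

```python
import re
from datetime import datetime

# An 8-digit stamp at the start of the name or after an underscore,
# followed by an underscore, a dot, or the end of the name.
_DATE_STAMP = re.compile(r"(?:^|_)(\d{8})(?=[_.]|$)")


def file_date(filename):
    """Return the date embedded in a file name, or None if there is no valid stamp."""
    match = _DATE_STAMP.search(filename)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d").date()
    except ValueError:
        # Eight digits that do not form a real calendar date.
        return None


for name in (
    "20251106_RAD51D_RegeneronMAF.csv",
    "supplementary_file_1_RAD51D_SGE_final_table_20260407.xlsx",
    "RAD51D_unpublished.json",
):
    print(name, file_date(name))
```

Undated files such as the `*_unpublished.json` thresholds return `None` and need to be handled separately.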

Notebooks

The Notebooks folder contains Python notebooks that create the individual panels used in the final figures and the supplementary tables. Each notebook's name indicates the visualization it creates. Code in the notebooks is annotated, and each notebook contains a Markdown header describing the figure or table it produces.

These notebooks are:

  • BCDX2_MakeVCF - Generates the .VCF file used as input to Ensembl's VEP tool to get variant effect predictor annotations for AlphaMissense, REVEL, CADD, and SpliceAI.
  • BCDX2_CalibrationFig - Generates plot highlighting number of variants in each evidence points bin after ExCALIBR calibration (Fig. 5e-f and Extended Data Fig. 11)
  • BCDX2_ClinVar_analysis - Generates strip plots and ROC-AUC plots for benchmarking generated SGE data against cataloged ClinVar variants (Fig. 5a-d)
  • BCDX2_Correlation_analysis - Pearson r correlation heatmap of counts (Extended Data Fig. 1c & f)
  • BCDX2_Darrah_Heatmap - Heatmap of MAVE scores from Darrah et al. 2025 for biochemically interesting variants (Extended Data Fig. 8)
  • BCDX2_EditRate_BarPlot - Bar plots displaying proportion of usable reads (Extended Data Fig. 1a & d)
  • BCDX2_InteractingResidues - Builds heatmaps at biochemically key RAD51D and XRCC2 residues (Fig. 4e, Extended Data Fig. 8a)
  • BCDX2_MakeFinalDataTable - Builds the RAD51D and XRCC2 final supplementary tables, which are the required input for all figure-generating notebooks.
  • BCDX2_OrthogonalAnalysis - Strip plots comparing variant function in orthogonal biochemical assays performed by Darrah et al. 2025 to SGE fitness scores (Fig. 2c-d)
  • BCDX2_RAD51D_XRCC2Heatmap - Stacked amino acid-level heatmap for RAD51D and XRCC2 (Fig. 2a-b)
  • BCDX2_ScoresAcrossGene - Scatter plot of fitness scores across the coding sequence of the gene (Extended Data Fig. 2)
  • BCDX2_StackedHistos - Stacked histogram and strip plots (Fig. 1d-h)
  • BCDX2_VEPs_vs_SGE - Scatter plot of VEP scores vs. fitness score (Extended Data Fig. 4)
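To illustrate the kind of file BCDX2_MakeVCF produces for Ensembl VEP, the sketch below writes a minimal VCF for a list of SNVs. It is not the notebook's code: the function names are hypothetical, the coordinates are placeholders and not real RAD51D or XRCC2 positions, and only the eight mandatory VCF columns are filled in.

```python
# Minimal VCF header: file format line plus the eight mandatory column names.
VCF_HEADER = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
)


def snv_to_vcf_line(chrom, pos, ref, alt):
    """Format one SNV as a VCF data line, using '.' for unknown fields."""
    for allele in (ref, alt):
        if len(allele) != 1 or allele not in "ACGT":
            raise ValueError(f"not a single-nucleotide allele: {allele!r}")
    if ref == alt:
        raise ValueError("REF and ALT must differ for an SNV")
    return f"{chrom}\t{pos}\t.\t{ref}\t{alt}\t.\t.\t."


def build_vcf(snvs):
    """Return VCF text for (chrom, pos, ref, alt) tuples, sorted by position."""
    ordered = sorted(snvs, key=lambda v: (str(v[0]), v[1], v[3]))
    return VCF_HEADER + "".join(snv_to_vcf_line(*v) + "\n" for v in ordered)


# Placeholder variants for illustration only (not real coordinates).
example_snvs = [("17", 101, "C", "T"), ("17", 100, "A", "G")]
print(build_vcf(example_snvs), end="")
```

Sorting by chromosome and position keeps the output in the order most annotation tools expect.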

Scripts

The Scripts folder contains scripts used to generate figures in PyMOL and ChimeraX. Scripts are named for the figure they generate, and the code is annotated.

These scripts are:

  • BCDX2_colorChimeraX_MIS_only - Colors ribbon cartoon protein structures using missense SGE scores in ChimeraX.
  • BCDX2_RAD51C_PyMOL - Generates colored ribbon cartoon or surface for RAD51C using data from Olvera-Leon et al. 2024 in PyMOL.
  • BCDX2_RAD51C_ChimeraX - Generates the same RAD51C figure as BCDX2_RAD51C_PyMOL, but in ChimeraX.
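As a rough illustration of score-based structure coloring, the sketch below turns per-residue scores into ChimeraX `color` commands. It is not the code in BCDX2_colorChimeraX_MIS_only: the function names, the chain ID, the score range, and the red-to-white gradient (low scores red, high scores white) are all assumptions, and how missense scores are summarized per residue is left to the caller.

```python
def score_to_rgb(score, low, high):
    """Map a score onto a red (low) to white (high) gradient, clamping out-of-range values."""
    if high <= low:
        raise ValueError("high must be greater than low")
    frac = min(max((score - low) / (high - low), 0.0), 1.0)
    channel = round(255 * frac)
    return (255, channel, channel)


def chimerax_color_commands(residue_scores, chain="A", low=-1.0, high=0.0):
    """Build one ChimeraX 'color' command per residue number in residue_scores.

    The chain ID and the default score range are illustrative assumptions.
    """
    commands = []
    for resnum in sorted(residue_scores):
        r, g, b = score_to_rgb(residue_scores[resnum], low, high)
        commands.append(f"color /{chain}:{resnum} rgb({r},{g},{b})")
    return commands


# Placeholder scores keyed by residue number (not real SGE data).
for command in chimerax_color_commands({12: -1.0, 3: 0.0}):
    print(command)
```

The resulting lines can be saved to a `.cxc` command file and opened in ChimeraX after loading the structure.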
