Fix bug in checking for duplicate Mutation Records #1099
Merged
averyniceday merged 7 commits on Jan 17, 2024
averyniceday reviewed on Jan 8, 2024

    - recordsToAnnotate.add(cvrUtilities.buildCVRMutationRecord(snp, sampleId, somaticStatus));
    + MutationRecord to_add = cvrUtilities.buildCVRMutationRecord(snp, sampleId, somaticStatus);
    + recordsToAnnotate.add(to_add);
    + addRecordToMap(to_add);

Collaborator
I think this logic will fix the issue, but have you run an entire mskimpact fetch with this code? I'm curious about the memory usage; I'm worried that if we have another map tracking the entire mutation record, we'll exacerbate the memory issue.
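A rough way to reason about the memory concern, sketched in Java under assumptions (the simplified `MutationRecord`, the `duplicateKey()` format, `addIfNew()`, and the sample values are hypothetical, not the pipeline's code): if the map's values are the same instances already held in `recordsToAnnotate`, Java stores references rather than copies, so the added memory is mainly one key plus one hash-table entry per record. Keeping that key short (a few identifying fields rather than the whole record) keeps the overhead small. The sketch uses a `HashSet` of keys, whose `add` method returns `false` when the key is already present, so a single call both checks for and registers a record.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class KeyOnlyDedupSketch {

    // Hypothetical, simplified record; the pipeline's MutationRecord has many more fields.
    public record MutationRecord(String sampleId, String chromosome, long start,
                                 long end, String refAllele, String altAllele) {}

    // Hypothetical compact key: sample plus variant coordinates, not the whole record.
    static String duplicateKey(MutationRecord r) {
        return r.sampleId() + ":" + r.chromosome() + ":" + r.start() + "-"
                + r.end() + ":" + r.refAllele() + ">" + r.altAllele();
    }

    private final Set<String> seenKeys = new HashSet<>();
    private final List<MutationRecord> recordsToAnnotate = new ArrayList<>();

    // Set.add returns false if the key was already present, so this single
    // call both detects a duplicate and registers a new key.
    public boolean addIfNew(MutationRecord r) {
        if (!seenKeys.add(duplicateKey(r))) {
            return false; // duplicate: skip it
        }
        recordsToAnnotate.add(r);
        return true;
    }

    public List<MutationRecord> recordsToAnnotate() {
        return recordsToAnnotate;
    }

    public static void main(String[] args) {
        KeyOnlyDedupSketch reader = new KeyOnlyDedupSketch();
        reader.addIfNew(new MutationRecord("SAMPLE-1", "7", 100L, 100L, "A", "T"));
        reader.addIfNew(new MutationRecord("SAMPLE-1", "7", 100L, 100L, "A", "T"));
        System.out.println(reader.recordsToAnnotate().size()); // prints 1
    }
}
```

Whether this actually helps in practice would depend on how large the real key is and on whether records are otherwise retained; measuring a full mskimpact fetch, as suggested, is the reliable check.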
Branch updated from 404d9b9 to 8eb02eb.

sheridancbio pushed a commit to sheridancbio/cmo-pipelines that referenced this pull request on Feb 5, 2024:

…1099)
* Check if mutationRecord is duplicated before annotating
* Populate mutationMap in loadMutationRecordsFromJson
* add addRecordToMap
* Remove comments, add local vars for debugging
* Remove duplicate MAF variants for AZ
* Fix remove-duplicate-maf-variants call
* revert whitespace change

sheridancbio pushed a commit to sheridancbio/cmo-pipelines that referenced this pull request on Feb 9, 2024:

…1099)
* Check if mutationRecord is duplicated before annotating
* Populate mutationMap in loadMutationRecordsFromJson
* add addRecordToMap
* Remove comments, add local vars for debugging
* Remove duplicate MAF variants for AZ
* Fix remove-duplicate-maf-variants call
* revert whitespace change

sheridancbio pushed a commit to mandawilson/cmo-pipelines that referenced this pull request on Mar 27, 2024:

author Manda Wilson <1458628+mandawilson@users.noreply.github.com> 1703199176 -0500
committer Robert Sheridan <sheridan@cbio.mskcc.org> 1711560265 -0400

upgrade to java 21

switch to genome-nexus-annotation-pipeline that uses new maf repo

updated to spring 6, spring batch 5, spring boot 3 to match cbioportal

fix typos

Updates to AZ-MSKIMPACT to integrate with CDM (knowledgesystems#1098)

Fix bug in checking for duplicate Mutation Records (knowledgesystems#1099)

* Check if mutationRecord is duplicated before annotating
* Populate mutationMap in loadMutationRecordsFromJson
* add addRecordToMap
* Remove comments, add local vars for debugging
* Remove duplicate MAF variants for AZ
* Fix remove-duplicate-maf-variants call
* revert whitespace change

updates for migrating darwin and crdb to java11 (knowledgesystems#1080)

pom changes for pulling moved dependencies
changes to java args to silence warnings

Co-authored-by: cbioportal import user <cbioportal_importer@pipelines.cbioportal.mskcc.org>

Remove Annotated MAF before Import (knowledgesystems#958)

* remove annotated MAF to prevent duplicate
* Update subset_and_merge_crdb_pdx_studies.py

---------

Co-authored-by: Avery Wang <averyjwang@gmail.com>

Script to combine arbitrary files (knowledgesystems#1104)

* Script to combine arbitrary files
* Modify unit tests to work with script changes
* Remove unnecessary column specifier
* Fix syntax bug

Add sophia script (knowledgesystems#1105)

* Add sophia script
* rename transpose_cna file
* Add filter-clinical-arg-functions script
* Add az var to correct automation environment
* Add correct path to transpose_cna script
* Call seq_date function
* Add seq_date before filtering columns
* syntax fix
* Fix call to filter out clinical attribute columns
* Fix nonsigned out file path
* Automate folder name
* directory fixes
* remove quotes?
* change date formatting
* output filepath for duplicate variants script
* use az_msk_impact_data_home var
* move sophia_data_home to automation environment
* Add comments
* Change dir structures in sophia script to match new repo structure
* Add git operations
* Remove test file
* Fix dirs for sophia zip command
* remove quotes
* Zip files before cleanup
* move zip step before git push

Add script for merging Dremio/SMILE into cmo-access (knowledgesystems#1102)

- adds cfdna clinical and timeline data from dremio/SMILE
- converts patient identifiers using "dmp over cmo" identifier logic from dremio
- dremio patient id mapping table export code called to produce mapping table
- main script then calls update_cfdna_clinical_sample_patient_ids_via_dremio.sh
- merge.py used to combine clinical data from dremio with clinical data from cmo-access
- metadata headers added using new script : merge_clinical_metadata_headers_py3.py
- other import process flow (similar to other import scripts) followed
- error detection step added after debugging for sporadic data loss in results

Co-authored-by: Manda Wilson <1458628+mandawilson@users.noreply.github.com>

Modify preconsume script to work on one cohort at a time (knowledgesystems#1107)

Call correct function name

add options for logging in for different accounts

Preconsume archer-solid-cv4 and add fetch loop (knowledgesystems#1129)

* Handle archer-solid-cv4 samples
* Add loop
* move each cohort to its own dir and fix filename

switch to genome-nexus-annotation-pipeline that uses new maf repo

use updated genome-nexus-annotation-pipeline

update version of cmo-pipelines to 1.0.0

Convert BatchConfiguration to new Spring Batch format

drop unneeded dependency from redcap

removed gdd, updated crdb and ddp batch configs to spring batch 5

removed commons-lang

start of converting cvr to spring batch 5

fix cvr fetcher BatchConfiguration

fixed redcap pipeline spring batch 5 configuration

make spring-batch-integration match batch version

Co-authored-by: Manda Wilson <1458628+mandawilson@users.noreply.github.com>

drop darwin fetcher (and docs/scripts)

mandawilson pushed a commit to mandawilson/cmo-pipelines that referenced this pull request on Mar 27, 2024:

…1099)
* Check if mutationRecord is duplicated before annotating
* Populate mutationMap in loadMutationRecordsFromJson
* add addRecordToMap
* Remove comments, add local vars for debugging
* Remove duplicate MAF variants for AZ
* Fix remove-duplicate-maf-variants call
* revert whitespace change

The `isDuplicateRecord()` check in `CVRMutationDataReader`, `CVRNonSignedoutMutationDataReader`, and `GMLMutationDataReader` was not catching duplicate records because the map it consults was never populated. The `addRecordToMap()` function now adds each record to `mutationMap`, so the duplicate check works correctly.

The `remove-duplicate-maf-variants.py` script, provided by the curation team, is now called in the `update-az-mskimpact.sh` script to remove duplicate MAF variants for AZ. The script's functionality is described here.
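The bug described above can be reproduced in a few lines. This is a hypothetical Java sketch, not the reader classes themselves: `MutationRecord`, `buildKey()`, `handle()`, and the `populateMap` flag are illustrative assumptions, while `isDuplicateRecord()`, `addRecordToMap()`, `mutationMap`, and `recordsToAnnotate` borrow names from this PR. With the map left empty, `isDuplicateRecord()` always returns `false` and both copies of a record are kept; once each record is registered with `addRecordToMap()`, the second copy is skipped.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DuplicateCheckSketch {

    // Hypothetical, simplified record; field names are illustrative only.
    public record MutationRecord(String sampleId, String chromosome, long start,
                                 long end, String refAllele, String altAllele) {}

    // Hypothetical key: same sample and same variant coordinates means duplicate.
    static String buildKey(MutationRecord r) {
        return String.join("|", r.sampleId(), r.chromosome(),
                Long.toString(r.start()), Long.toString(r.end()),
                r.refAllele(), r.altAllele());
    }

    private final Map<String, MutationRecord> mutationMap = new HashMap<>();
    private final List<MutationRecord> recordsToAnnotate = new ArrayList<>();
    private final boolean populateMap; // false reproduces the bug, true the fix

    public DuplicateCheckSketch(boolean populateMap) {
        this.populateMap = populateMap;
    }

    public boolean isDuplicateRecord(MutationRecord r) {
        return mutationMap.containsKey(buildKey(r));
    }

    public void addRecordToMap(MutationRecord r) {
        mutationMap.put(buildKey(r), r);
    }

    // Assumed flow: check for a duplicate, add to the list, then register in the map.
    public void handle(MutationRecord toAdd) {
        if (isDuplicateRecord(toAdd)) {
            return; // skip records already seen
        }
        recordsToAnnotate.add(toAdd);
        if (populateMap) {
            addRecordToMap(toAdd);
        }
    }

    public int recordCount() {
        return recordsToAnnotate.size();
    }

    public static void main(String[] args) {
        for (boolean populate : new boolean[] {false, true}) {
            DuplicateCheckSketch reader = new DuplicateCheckSketch(populate);
            reader.handle(new MutationRecord("SAMPLE-1", "7", 100L, 100L, "A", "T"));
            reader.handle(new MutationRecord("SAMPLE-1", "7", 100L, 100L, "A", "T"));
            System.out.println("populateMap=" + populate + " -> " + reader.recordCount() + " record(s)");
        }
    }
}
```

Running `main` prints `populateMap=false -> 2 record(s)` followed by `populateMap=true -> 1 record(s)`. A Java record's generated `equals()`/`hashCode()` could also serve as the key, but then every field participates in the comparison, which may or may not match the intended duplicate rule.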