Refactor internals for speed, fix bugs, bump to 1.8.7 #29

Merged: bobverity merged 1 commit into main from refactor/efficiency-and-bugfixes on Jun 30, 2026

@bobverity

Internal refactor with no change to documented behaviour (verified against a golden-master snapshot of every exported function). It removes duplicated code and the large per-string dplyr overhead. The biggest speedups are in drop_read_counts() and position_from_variant_string(), which are now pure string operations.

Bug fixes:

  • get_component_variants() no longer errors on valid strings that combine a phased het locus with a homozygous locus (e.g. pfcrt:72_73:C|S_V).
  • check_position_string() now validates every distinct input, not just the first n_unique original elements (invalid strings after duplicates were being skipped).
  • check_variant_string()/check_position_string() now carry the failure reason in the error condition (conditionMessage) rather than an empty stop().
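
The second and third fixes can be illustrated with a small base-R sketch. It is not the package's code: `is_valid_toy()`, `check_buggy()` and `check_fixed()` are hypothetical names, the "gene:position" rule is a stand-in for the real validation rules, and the buggy version is only a guess at the pattern the bullet describes (counting distinct values, then checking that many leading elements).

```r
# Hypothetical stand-in for the real rules: accept only "gene:position".
is_valid_toy <- function(x) grepl("^[a-z0-9]+:[0-9]+$", x)

# Assumed shape of the bug: count the distinct values, then check only
# that many elements from the start of the ORIGINAL vector.
check_buggy <- function(x) {
  n_unique <- length(unique(x))
  all(is_valid_toy(x[seq_len(n_unique)]))
}

# Fixed pattern: validate each distinct value once, and put the reason
# in the error so callers can read it with conditionMessage().
check_fixed <- function(x) {
  ux <- unique(x)
  bad <- ux[!is_valid_toy(ux)]
  if (length(bad) > 0) {
    stop("invalid position string(s): ", paste(bad, collapse = ", "),
         call. = FALSE)
  }
  invisible(TRUE)
}

x <- c("pfcrt:72", "pfcrt:72", "pfcrt:72", "not a position")

# Two distinct values, so only x[1:2] is checked and the bad string is missed.
buggy_result <- check_buggy(x)  # TRUE

# The fixed version fails, and the failure reason travels with the condition.
msg <- tryCatch(check_fixed(x), error = function(e) conditionMessage(e))
print(msg)  # "invalid position string(s): not a position"
```

Validating `unique(x)` directly keeps the cost proportional to the number of distinct strings while guaranteeing that none is skipped.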

Behaviour changes:

  • drop_read_counts() and position_from_variant_string() preserve the input format rather than re-normalising it (position ranges and concise amino-acid notation are kept; genes are still sorted alphabetically).

Other:

  • Moved test-case CSVs from inst/extdata to tests/testthat/testdata and read them with readr::read_csv (strips the UTF-8 BOM that had silently broken the validator tests); added guards so fixture-load failures are loud.
  • Removed the unused tidyr dependency.
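
A minimal sketch of a loud fixture loader, under stated assumptions: `load_fixture()` and the column names are hypothetical, and base R's `read.csv()` with `fileEncoding = "UTF-8-BOM"` is swapped in for the `readr::read_csv` call the PR actually uses, so that the example needs no extra packages.

```r
# Hypothetical helper: fail with a clear message if a test fixture is
# missing, empty, or has unexpected column names (the latter is how a
# stray byte-order mark typically shows up).
load_fixture <- function(path, expected_cols) {
  if (!file.exists(path)) {
    stop("fixture file not found: ", path, call. = FALSE)
  }
  # "UTF-8-BOM" tells R to drop a leading byte-order mark when reading.
  df <- read.csv(path, fileEncoding = "UTF-8-BOM", stringsAsFactors = FALSE)
  if (nrow(df) == 0) {
    stop("fixture file has no rows: ", path, call. = FALSE)
  }
  missing_cols <- setdiff(expected_cols, names(df))
  if (length(missing_cols) > 0) {
    stop("fixture file is missing column(s): ",
         paste(missing_cols, collapse = ", "), call. = FALSE)
  }
  df
}

# Write a tiny CSV that starts with a UTF-8 BOM, then load it.
tmp <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("variant,valid\npfcrt:72:C,TRUE\n")), tmp)

fixture <- load_fixture(tmp, expected_cols = c("variant", "valid"))

# A missing file is reported loudly rather than yielding empty test cases.
missing_msg <- tryCatch(
  load_fixture(file.path(tempdir(), "no-such-fixture.csv"), "variant"),
  error = function(e) conditionMessage(e)
)
```

Checking the column names right after loading means a BOM-mangled header stops the test run immediately instead of silently producing tests that exercise nothing.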


Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
bobverity merged commit 53cb0a2 into main on Jun 30, 2026
2 checks passed