SequenceBio · CNuge · Sep 9, 2025
diff --git a/.Rbuildignore b/.Rbuildignore
@@ -16,4 +16,4 @@ KinformR.Rproj
 .pre-commit-config.yaml
 .Rproj.user
 .git
-.github
+.github
diff --git a/.github/workflows/main.yaml b/.github/workflows/main.yaml
@@ -34,4 +34,6 @@ jobs:
         run: |
           conda create -n test_r r-base r-devtools r-testthat
           conda activate test_r
-          Rscript -e "testthat::test_local()" 
+          Rscript -e "testthat::test_local()"
+#      - name: PreCommit
+#        uses: pre-commit/action@v3.0.1
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Changed
 
 ### Added
+- use of pre-commit spell checking; not enforced as a CI check, so as to keep CRAN compatibility.
 
 ### Fixed
 - linting and spelling errors resolved with pre-commit usage.

diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,18 +1,18 @@
 Package: KinformR
 Title: Relationship-Informed Pedigree and Variant Scoring
 Version: 0.1.0
-Authors@R: 
+Authors@R:
     person("Cameron M.", "Nugent", , "cam.nugent@sequencebio.com", role = c("aut", "cre"),
            comment = c(ORCID = "0000-0002-1135-2605"))
 Author: Cameron M. Nugent
 Maintainer: Cameron M. Nugent <cam.nugent@sequencebio.com>
-Description: 
-    The KinformR R package is meant to aid in comparative evaluation of families 
-    and candidate variants in rare-variant association studies. The package can be used for 
-    two methodologically overlapping but distinct purposes. First, the prior to any genetic or genomic 
-    evaluation, evaluation of relative detection power of pedigrees, can direct recruitment 
-    efforts by showing which unsampled individuals would be the most meaningful additions to a study. 
-    Second, after sequencing and analysis,  variants based on association with disease status 
+Description:
+    The KinformR R package is meant to aid in comparative evaluation of families
+    and candidate variants in rare-variant association studies. The package can be used for
+    two methodologically overlapping but distinct purposes. First, prior to any genetic or genomic
+    evaluation, evaluation of relative detection power of pedigrees, can direct recruitment
+    efforts by showing which unsampled individuals would be the most meaningful additions to a study.
+    Second, after sequencing and analysis, scoring variants based on association with disease status
     and familial relationships of individuals, aids in variant prioritization.
 License: MIT + file LICENSE
 Encoding: UTF-8
@@ -22,5 +22,5 @@ VignetteBuilder: knitr
 Suggests:
     devtools,
     testthat,
-    knitr, 
+    knitr,
     rmarkdown
diff --git a/R/io.R b/R/io.R
@@ -43,7 +43,7 @@ read.relation.mat <- function(fname){
 #' status encoded in the indivudal's names
 #'
 #' Note - ensure the status in the names match your desired encoding!
-#' There are individuals with ambigious statues, that you may require to
+#' There are individuals with ambiguous statuses, that you may require to
 #' be encoded in a specific fashion for you current purposes.
 #'
 #'
@@ -80,6 +80,3 @@ read.var.table <- function(fname){
                      "variant" = in.variants)
   return(out.df)
 }
-
-
-
diff --git a/R/pedigree.r b/R/pedigree.r
@@ -146,7 +146,7 @@ score.pedigree <- function(h){
   for (i in seq_len(nrow(h))) {
     family <- h[i,"Family"]
     max.a <- h[i, "max_a"]
-    #Yeezy yeezy whats good its ya boy
+    #Yeezy yeezy what's good, its ya boy
     max.b <- h[i, "max_b"]
     max.c <- h[i, "max_c"]
     max.d <- h[i, "max_d"]

diff --git a/README.md b/README.md
@@ -16,7 +16,7 @@ The development version of `KinformR` can be installed directly from GitHub. You
 ```
 #install.packages("devtools")
 #install.packages("knitr") #required if build_vignettes = TRUE
-#library(devtools) 
+#library(devtools)
 devtools::install_github("SequenceBio/KinformR", build_vignettes = TRUE)
 library(KinformR)
 ```
@@ -25,14 +25,14 @@ library(KinformR)
 
 The package's vignette contains detailed explanations of the functions and parameters.
 
-For a walk through of the `KinformR` functions for scoring the value of *families* based on penetrance and IBD, see the corresponging vignette file: 
+For a walk through of the `KinformR` functions for scoring the value of *families* based on penetrance and IBD, see the corresponding vignette file:
 `vignettes/KinformR-penetrance_and_ibd.Rmd`
 or within R, run:
 ```
 vignette('KinformR-penetrance_and_ibd')
 ```
 
-For a walk through of the `KinformR` functions for scoring the value of *variants* within families, see the corresponging vignette file: 
+For a walk through of the `KinformR` functions for scoring the value of *variants* within families, see the corresponding vignette file:
 `vignettes/KinformR-variant_scoring.Rmd`
 
 or within R, run:
@@ -59,7 +59,7 @@ and scoring then performed:
 
 ## Scoring Variants
 
-When looking at shared rare variants across families, not all sets of affected and unaffected individuals are equal. This R package is designed to score rare variants, assigning values based on the disease status of individuals, the presence or absence of a rare variant in those individuals, and their pairwise coefficients of relatedness. The package uses a custom formula to assign value to a variant that gives more weight to shared variants common to distantly related affected individuals. The variant status for unaffected individuals can optionally be considered as well, with the highest scoring values being given to closely related individuals that *do not* share a variant of interst. Since variants can be incompletely penetrant, the scoring can be based solely on the affected individuals, or the weight of unaffected evidence can be customized.
+When looking at shared rare variants across families, not all sets of affected and unaffected individuals are equal. This R package is designed to score rare variants, assigning values based on the disease status of individuals, the presence or absence of a rare variant in those individuals, and their pairwise coefficients of relatedness. The package uses a custom formula to assign value to a variant that gives more weight to shared variants common to distantly related affected individuals. The variant status for unaffected individuals can optionally be considered as well, with the highest scoring values being given to closely related individuals that *do not* share a variant of interest. Since variants can be incompletely penetrant, the scoring can be based solely on the affected individuals, or the weight of unaffected evidence can be customized.
 
 
 ### The relationship matrix
@@ -89,4 +89,4 @@ The two streams of information can then be combined to score a variant based off
 
 ```
 score.example <- score.fam(rel.mat, ind.df.status)
-```
+```
diff --git a/man/add.fam.scores.Rd b/man/add.fam.scores.Rd
diff --git a/man/calc.rv.score.Rd b/man/calc.rv.score.Rd
diff --git a/man/ibd.Rd b/man/ibd.Rd
diff --git a/man/penetrance.Rd b/man/penetrance.Rd
diff --git a/man/read.pedigree.Rd b/man/read.pedigree.Rd
diff --git a/man/read.var.table.Rd b/man/read.var.table.Rd
diff --git a/man/score.Rd b/man/score.Rd
diff --git a/man/score.fam.Rd b/man/score.fam.Rd
diff --git a/man/score.pedigree.Rd b/man/score.pedigree.Rd
diff --git a/man/subset.mat.Rd b/man/subset.mat.Rd
diff --git a/tests/testthat/test_encoding.R b/tests/testthat/test_encoding.R
@@ -37,10 +37,10 @@ test_that("Families are correctly encoded.", {
   expect_equal(scores$statvar.cat, expected.scores)
 
   print("theoretical.max high score values for a family")
-  ther.scores <- score.variant.status(indiv.df, theoretical.max=TRUE)
+  theory.scores <- score.variant.status(indiv.df, theoretical.max=TRUE)
 
   expected.thermax.scores <- c("A.c","U.c","A.c","A.c","A.c" ,"U.c", "A.c", "U.c")
-  expect_equal(ther.scores$statvar.cat, expected.thermax.scores)
+  expect_equal(theory.scores$statvar.cat, expected.thermax.scores)
 
 
 })
diff --git a/vignettes/KinformR-penetrance_and_ibd.Rmd b/vignettes/KinformR-penetrance_and_ibd.Rmd
@@ -3,7 +3,7 @@ title: "KinformR - penetrance and idb informed scoring of families"
 author: "Cameron M. Nugent"
 date: "`r format(Sys.time(), '%d %B, %Y')`"
 data: "`r Sys.Date()`"
-output: rmarkdown::pdf_document # rmarkdown::html_vignette # 
+output: rmarkdown::pdf_document # rmarkdown::html_vignette #
 pdf_document:
   df_print: kable
 vignette: >
@@ -37,12 +37,12 @@ show <- function(df){
 The family power calculations depend on a single tab-delimited input file, where each row represents a family. The input file is read in using the `read.pedigree` function.
 
 ```{r}
-example.pedigree.file <- system.file('extdata/example_pedigree_encoding.tsv', 
+example.pedigree.file <- system.file('extdata/example_pedigree_encoding.tsv',
                                      package = 'KinformR')
 
 example.pedigree.df <- read.pedigree(example.pedigree.file)
 ```
-The input file is expected to have the following 11 columns (with a header). 
+The input file is expected to have the following 11 columns (with a header).
 ```{r}
 colnames(example.pedigree.df)
 
@@ -51,7 +51,7 @@ colnames(example.pedigree.df)
 ### Simplified summary of pedigrees
 
 For now this file should be be constructed through careful manual inspection of the predigrees. To encode the rows for each family, you should first prune down pedigrees to informative allele transfers. For
-the purposes of this tool, we exclude young generations (non-adults, younger than age of onset) and large (more than two sequential generations) trees of exclusively unaffected family members. Additionally all individuals require a binary A/U status, there should be no ambigious individuals.  There will be some judgment calls required here.  
+the purposes of this tool, we exclude young generations (non-adults, younger than age of onset) and large (more than two sequential generations) trees of exclusively unaffected family members. Additionally all individuals require a binary A/U status, there should be no ambiguous individuals.  There will be some judgment calls required here.
 
 
 ### Encoding categories of relationships
@@ -73,11 +73,11 @@ show(example.pedigree.df)
 ```
 
 All columns with the prefix `max_` are meant to count the total number of each category in the pedigree, while
-the columns without this prefix are the number of each category for whom samples have been collected.  
+the columns without this prefix are the number of each category for whom samples have been collected.
 
 The categories correspond to A, B, and C as defined above.
 
-Category D is represented by two numbers, d and n.  n is the number of offspring in a tree of unaffecteds; d is the number of those types of trees across the pedigree.  Multiple types of trees are encoded with commas separating the values.  For example, the following represents a family with three total trees of unaffecteds. One tree (d=1) has three offspring (n=3); two trees (d=2) each have one offspring (n=1). 
+Category D is represented by two numbers, d and n.  n is the number of offspring in a tree of unaffecteds; d is the number of those types of trees across the pedigree.  Multiple types of trees are encoded with commas separating the values.  For example, the following represents a family with three total trees of unaffecteds. One tree (d=1) has three offspring (n=3); two trees (d=2) each have one offspring (n=1).
 
 ```
 d   n
@@ -138,6 +138,3 @@ we only count the parent. (d=1, n=0; equivalently, c=1)
 2.  You have collected one or more children, but not the parent.  In this case,
 each of the children contribute a portion of what the parent would have contributed
 to our understanding.  (d=1, n>0)
-
-
-
diff --git a/vignettes/KinformR-variant_scoring.Rmd b/vignettes/KinformR-variant_scoring.Rmd
@@ -3,7 +3,7 @@ title: "KinformR - pedigree-informed rare variant association scoring"
 author: "Cameron M. Nugent"
 date: "`r format(Sys.time(), '%d %B, %Y')`"
 data: "`r Sys.Date()`"
-output: rmarkdown::pdf_document #rmarkdown::html_vignette # 
+output: rmarkdown::pdf_document #rmarkdown::html_vignette #
 pdf_document:
   df_print: kable
 vignette: >
@@ -43,7 +43,7 @@ To read in the data, one uses the function `read.relation.mat`.
 mat.name1<-system.file('extdata/1234_ex2.mat', package = 'KinformR')
 rel.mat <- read.relation.mat(mat.name1)
 show(rel.mat)
-``` 
+```
 
 
 ### The status file
@@ -60,15 +60,15 @@ tsv.name1<-system.file('extdata/1234_ex2.tsv', package = 'KinformR')
 status.df <- read.indiv(tsv.name1)
 
 show(status.df)
-``` 
+```
 
 The disease-genotype scoring can then be encoded using the `score.variant.status` function to produce the status-variant category for all individuals. This creates a df with the new column: `statvar.cat`.
 
 ```{r}
 
 full.df.status <-  score.variant.status(status.df)
 show(full.df.status)
-``` 
+```
 
 
 
@@ -80,7 +80,7 @@ For most real-world applications, you will likely want to score family members i
 
 ex.score.default <- score.fam(rel.mat, full.df.status)
 show(ex.score.default)
-``` 
+```
 
 
 By default `score.fam` returns:
@@ -92,26 +92,26 @@ As previously noted, if an individual is present in the relationship matrix and
 The scoring can be changed to summing across all combinations as opposed to the mean by passing the following options. Note using the program in this way will return higher scores for more dense pedigrees.
 ```{r}
 
-ex.score.sum <- score.fam(rel.mat, full.df.status, 
+ex.score.sum <- score.fam(rel.mat, full.df.status,
                           return.sums = TRUE, return.means = FALSE)
 show(ex.score.sum)
-``` 
+```
 
 
 To obtain a long form table with the scores for variants expressed relative to each individual, set both `return.sums` and `return.means` to `FALSE`. This output can aid in identifying which individuals are carrying the most weight in a family's score.
 ```{r}
 
-ex.score.table <- score.fam(rel.mat, full.df.status, 
+ex.score.table <- score.fam(rel.mat, full.df.status,
                             return.sums = FALSE, return.means = FALSE)
 show(ex.score.table)
-``` 
+```
 
-## How scoring works 
+## How scoring works
 ### A Minimal example, scoring a variant from perspective of a single individual.
 
-This section is meant to demonstrate how the variant scoring is accomplished on a finer scale. A user does not need to interact with the package on this level of granularity. This section is for explanatory purposes only, demonstrating how the `score.fam` function operated "under the hood". 
+This section is meant to demonstrate how the variant scoring is accomplished on a finer scale. A user does not need to interact with the package on this level of granularity. This section is for explanatory purposes only, demonstrating how the `score.fam` function operated "under the hood".
 
-The `score.fam` function runs the scoring method once for each affected individual in the status dataframe (or for each individual regardless of status if `affected.only = FALSE`). To do this, for each individual, the program takes corresponding row of the relationship matrix to determine the relations to all other individuals in the pedigree.  
+The `score.fam` function runs the scoring method once for each affected individual in the status dataframe (or for each individual regardless of status if `affected.only = FALSE`). To do this, for each individual, the program takes corresponding row of the relationship matrix to determine the relations to all other individuals in the pedigree.
 
 For example, the degrees of relationships of all other members of the example family relative to the reference individual `"MS-1234-1001"` are show in the following subset of the matrix:
 
@@ -135,25 +135,25 @@ name.stat.dict
 ```{r}
 rel.dict<-build.relation.dict(rel.mat.proband, name.stat.dict)
 rel.dict
-``` 
+```
 In this example, the proband, two first degree relations, and a third degree relations are all affected and share the candidate variant. For the affected correct (`A.c`) category we therefore see the following encoded:
 
 ```{r}
 rel.dict$A.c
-``` 
+```
 
 Since one first degree unaffected relative has the variant, they are categorized as "unaffected incorrect"(`U.i`) and we see:
 ```{r}
 rel.dict$U.i
-``` 
+```
 
 Deriving a relatedness-weighted score for the variant from the perspective of the given individual is then performed by `calc.rv.score`
 
-For each degree-encoded relationship, the coefficient of relatedness is used to weight the evidence for or against a variant. The coefficients for different degress of relationship are:
+For each degree-encoded relationship, the coefficient of relatedness is used to weight the evidence for or against a variant. The coefficients for different degrees of relationship are:
 ```{r}
 for(i in 0:7){
-    print(paste0("Degree of relatedness: ", i, 
-                 " coefficient of relatedness: ",  1 / (2 ** (i))))  
+    print(paste0("Degree of relatedness: ", i,
+                 " coefficient of relatedness: ",  1 / (2 ** (i))))
 }
 ```
 
@@ -200,19 +200,17 @@ The final score for the variant would then be:
 ```
 Giving a final score of 10 for the variant.
 
-This is all accomplished by the function `calc.rv.score`. 
+This is all accomplished by the function `calc.rv.score`.
 ```{r}
 calc.rv.score(rel.dict)
-``` 
+```
 
 The weights of the scoring can be adjusted, for example if we wanted to consider only `affected`-based evidence, we could turn off the unaffected part of the calculation by setting the unaffected weighting to 0. This can be useful for incompletely penetrant variants, where disease status and genotype of unaffected individuals are more likely to have imperfect concordance.
 
 Additionally, families with low numbers of affected individuals sequenced and high number of unaffected individuals may haved inflated variant scores and potentially be misleading, focusing the scoring algorithm on the affected individuals only can overcome this bias.
 
 ```{r}
 calc.rv.score(rel.dict, unaffected.weight=0)
-``` 
+```
 
 The `score.fam` function automatically walks through this process from all specified perspectives in the pedigree and by default returns the average score. The use of the averages and different perspectives is meant to eliminate pedigree-associated bias, such as for instances when a proband is distantly related to all other members in a family (considering the relationships from only the perspective of the proband in this case would give an inflated score for the variant's value).
-
-