Random test data #48

Description

@molpopgen

From our last meeting, we can do something like the following:

1. pick a number of alleles
2. pick frequencies for the alleles, for example by randomly sampling probabilities for a multinomial distribution
3. sample alleles accordingly
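
The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a proposed implementation: the `SplitMix64` generator is a dependency-free stand-in for whatever RNG crate we end up using, the name `random_alleles` and its signature are hypothetical, and the frequencies come from normalizing uniform weights, which is just one possible way to pick multinomial probabilities.

```rust
/// Small SplitMix64 generator, used here only to keep the sketch
/// dependency-free. A real RNG crate would replace it.
pub struct SplitMix64(pub u64);

impl SplitMix64 {
    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E3779B97F4A7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D049BB133111EB);
        z ^ (z >> 31)
    }

    /// Uniform draw in [0, 1) built from the top 53 bits.
    pub fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Hypothetical helper: returns (allele frequencies, sampled allele indexes).
pub fn random_alleles(
    rng: &mut SplitMix64,
    max_alleles: usize,
    n: usize,
) -> (Vec<f64>, Vec<usize>) {
    assert!(max_alleles >= 1, "need at least one allele");

    // 1. Pick a number of alleles in 1..=max_alleles
    //    (the tiny modulo bias does not matter for test data).
    let k = 1 + (rng.next_u64() % max_alleles as u64) as usize;

    // 2. Pick frequencies: weights in (0, 1], normalized to sum to 1.
    let weights: Vec<f64> = (0..k).map(|_| 1.0 - rng.next_f64()).collect();
    let total: f64 = weights.iter().sum();
    let freqs: Vec<f64> = weights.iter().map(|w| w / total).collect();

    // 3. Sample alleles by walking the cumulative distribution.
    let alleles = (0..n)
        .map(|_| {
            let u = rng.next_f64();
            let mut cumulative = 0.0;
            for (i, f) in freqs.iter().enumerate() {
                cumulative += f;
                if u < cumulative {
                    return i;
                }
            }
            // Guard against floating-point rounding in the running sum.
            k - 1
        })
        .collect();

    (freqs, alleles)
}
```

Seeding the generator explicitly, e.g. `random_alleles(&mut SplitMix64(42), 4, 100)`, keeps any failing test case reproducible.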

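We can augment this approach by applying a missing data rate. After sampling alleles, we simply "mask" each [genotype|allele] as missing by sampling a uniform value in [0, 1) and asking if it is less than the missing rate. With a strict comparison, a rate of 0 never masks anything and a rate of 1 masks everything.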
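
A minimal sketch of the masking step, assuming missing values are represented as `None`. The name `mask_missing` is hypothetical, and the source of uniform draws is passed in as a closure so the sketch does not depend on any particular RNG. Note that it uses a strict `<` comparison, which is an assumption made here so that a rate of 0 never masks and a rate of 1 always masks when the draws are in [0, 1).

```rust
/// Hypothetical helper: replaces each element with `None` when a uniform
/// draw in [0, 1) falls below `missing_rate`.
pub fn mask_missing<T>(
    genotypes: Vec<T>,
    missing_rate: f64,
    mut uniform: impl FnMut() -> f64,
) -> Vec<Option<T>> {
    assert!(
        (0.0..=1.0).contains(&missing_rate),
        "missing rate must be in [0, 1]"
    );
    genotypes
        .into_iter()
        .map(|g| {
            // One draw per element; strict comparison handles rate == 0.
            if uniform() < missing_rate {
                None
            } else {
                Some(g)
            }
        })
        .collect()
}
```

Because the draws are injected, a test can feed fixed values such as 0.0 or a value just below 1.0 to exercise the boundary cases directly.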

We should put this code in a module that is gated by #[cfg(test)] and whose members are mostly pub.

To keep this general, I think it would help if what we return first is simply a Vec/iterable over genotypes. We can then process that to make the values for a VCF record, etc.
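
As one example of that later processing, here is a sketch that turns a single sample's alleles into a VCF GT string. It assumes a genotype is a slice of `Option<usize>` allele indexes with `None` meaning missing, and that the calls are unphased. The name `gt_field` is hypothetical.

```rust
/// Hypothetical helper: formats one sample's alleles as an unphased
/// VCF GT value, writing "." for each missing allele.
pub fn gt_field(alleles: &[Option<usize>]) -> String {
    alleles
        .iter()
        .map(|a| match a {
            Some(index) => index.to_string(),
            None => ".".to_string(),
        })
        .collect::<Vec<_>>()
        .join("/")
}
```

Keeping the generator's output as plain genotypes and doing the formatting in a separate step like this means the same random data can feed other record types too.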

Getting boundary conditions right is essential, but applying crates like proptest should help us identify corner cases.
