Model Selection and Model Fit

This repository contains tutorials that explore ways to select phylogenetic models for your data and then analyze how well those models fit the data.

First, we'll use RevBayes to select among substitution models using Bayes factors. The tutorial is available here.

Bayes factors can be used to select models in many contexts beyond substitution models. For example, we might use them to choose between concatenated and multispecies coalescent models, or between alternative clock models in a divergence time analysis. Here we'll use them to select a partition model. A tutorial that demonstrates how to do this is posted here.
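
As a rough illustration of the comparison itself, the log Bayes factor is the difference between the log marginal likelihoods of the two models. The Python sketch below assumes you have already estimated those values in a separate analysis. The numbers and the function name are hypothetical and are not output from the tutorial files.

```python
def log_bayes_factor(ln_ml_model_1, ln_ml_model_0):
    """Return ln BF(M1, M0) = ln P(D | M1) - ln P(D | M0)."""
    return ln_ml_model_1 - ln_ml_model_0


# Hypothetical log marginal likelihoods, for illustration only.
ln_ml_partitioned = -6721.8
ln_ml_unpartitioned = -6735.2

ln_bf = log_bayes_factor(ln_ml_partitioned, ln_ml_unpartitioned)
preferred = "partitioned" if ln_bf > 0 else "unpartitioned"
print(f"ln BF = {ln_bf:.1f}; the data favor the {preferred} model")
```

A positive value favors the first model and a negative value favors the second. How large a value counts as strong support is a matter of convention.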

As we have seen in the lecture slides, the number of possible partition models grows rapidly with the number of loci in a dataset. For large datasets, it can be impractical to use Bayes factors to compare many possible partition models. Faster, alternative strategies can accomplish something similar. IQ-TREE offers one such strategy, selecting both a reasonable partition model and a substitution model for each partition. You can run this analysis in IQ-TREE on the example files with the command:

 iqtree -s concat.nex -spp partition.nex -m MFP+MERGE

The concat.nex file contains the same four simulated loci for which we just estimated Bayes factors, and the partition.nex file specifies where the locus boundaries fall in the concatenated alignment. After running this command, open the partition.nex.best_scheme.nex file, which contains the best partitioning scheme that IQ-TREE found. You can read more about this functionality here.
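
To get a feel for why exhaustive comparison becomes impractical, note that the number of ways to group n loci into non-empty subsets is the Bell number B(n). The Python sketch below counts only those groupings, before any choice of substitution model for each subset. The function name is our own.

```python
def bell_number(n):
    """Count the ways to partition n items into non-empty subsets (Bell triangle)."""
    row = [1]
    for _ in range(n):
        # Each new row starts with the last entry of the previous row.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]


for n_loci in (4, 10, 20):
    print(f"{n_loci} loci: {bell_number(n_loci)} possible partitioning schemes")
```

For the four loci used here there are only 15 possible schemes, but the count climbs past one hundred thousand by ten loci.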

The exercises above demonstrate how to measure relative model fit, which asks 'Which model, among a set of candidates, fits my data best?' It is of course possible that every model in our candidate set fits the data poorly. In that case, we would simply be choosing the best of a bad bunch without necessarily realizing it. To avoid this, we often want to measure how well a model describes the data in an absolute sense. In a Bayesian framework, we can do this with posterior prediction. A tutorial introducing posterior prediction in RevBayes for a non-phylogenetic example is available here. Once you understand the fundamentals of posterior prediction in a general sense, these tutorials will show you how to apply it to phylogenetic analyses using the P³ pipeline in RevBayes, for both data-based and inference-based test statistics.
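
The core comparison in posterior prediction can be sketched in a few lines of Python. The example assumes you already have a test statistic computed on the observed data and the same statistic computed on each dataset simulated from the posterior. The values and the function name are hypothetical and are not part of the P³ pipeline.

```python
def posterior_predictive_p_values(observed, simulated):
    """Return (lower, upper): the fractions of simulated statistics that are
    at or below, and at or above, the observed statistic."""
    n = len(simulated)
    if n == 0:
        raise ValueError("need at least one simulated statistic")
    lower = sum(1 for s in simulated if s <= observed) / n
    upper = sum(1 for s in simulated if s >= observed) / n
    return lower, upper


# Hypothetical test statistics from ten posterior predictive datasets.
simulated_stats = [0.41, 0.44, 0.39, 0.47, 0.43, 0.40, 0.45, 0.42, 0.46, 0.38]
observed_stat = 0.46

lower_p, upper_p = posterior_predictive_p_values(observed_stat, simulated_stats)
print(f"lower-tail p = {lower_p:.2f}, upper-tail p = {upper_p:.2f}")
```

A p-value near 0 or 1 means the observed statistic falls in a tail of the predictive distribution, which suggests the model fails to capture that aspect of the data.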

As a final exercise, we can bring together many of the things we have been working on to assess the fit of our partition model. We'll use the model identified by IQ-TREE (1:GTR, 2:HKY+G, 3,4:JC). You'll need to set up a Rev script that:

reads in the four loci
sets up an appropriate partitioning scheme
assigns the correct substitution model to each gene
runs a P³ analysis for this model

Does the model seem to be an adequate description of the data?
