A phylogeny is a hypothesis of the evolutionary relationships between a group of organisms. Phylogenetics is the field of building, analysing, and studying phylogenies.
Gustave Doré — Divina Commedia. Inferno canto 13°
A revolutionary idea
Phylogenetics is actually a pretty young field. Doing it on any kind of large scale requires computers, and it's only really since the 1970s that the study of evolutionary relationships has kicked off – and, to an extent, taken over from other forms of taxonomy. This shift is sometimes called the cladistic revolution. You can find a link to a super-duper interesting paper about how that happened in the bonus materials below.
Either way, in the 21st century this is a vital part of your toolkit if you want to think about palaeontology, evolution, or the life sciences. So what are we going to cover?
Introduction
Our sections for this week are as follows.
The tree of life – Section 1.
Describing phylogenies – Section 2.
Homology and homoplasy – Section 3.
Phylogenetic characters – Section 4.
Phylogenetic inference – Section 5.
By the end of this page, I hope you will agree with me that phylogenetics is a vital part of our understanding of evolution. Trees, and evolutionary relationships, feature in everything from epidemiology (it helps us to understand how infectious agents are related to each other, say) to conservation efforts (is extinction risk influenced by phylogeny?). But in order to be able to use them, we have to be able to build and interpret them – and understand uncertainty in a phylogenetic framework. We'll build on those themes throughout this page.
1 – The tree of life
Cool cool. So, let's start with the basics. What is phylogenetics, how does it differ from taxonomy, and how do we read a phylogeny?
Summary
Taxonomy is how we classify organisms. Systematics is a field where we do so through evolutionary relationships.
Cladistics is a field in which we try to derive evolutionary relationships from characters.
A clade is a group of organisms including a common ancestor and all of its descendants.
A phylogeny is a hypothesis about the evolutionary history of a group of organisms, often presented as a cladogram.
Cladograms work on many scales, from just three species, all the way up to all life!
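To make the clade definition concrete, here is a minimal base-R sketch (not code from the slides). It assumes a rooted tree stored as a two-column edge matrix of parent and child nodes, loosely in the spirit of how the ape package stores trees; the helper names clade_tips() and is_clade() are made up for illustration. A set of taxa only counts as a clade if it is an ancestor plus every one of its descendants.

```r
# Toy rooted tree stored as an edge matrix: column 1 = parent, column 2 = child.
# Tips are numbered 1-4 and internal nodes 5-7 (node 5 is the root).
tip_labels <- c("Homo", "Pan", "Gorilla", "Pongo")
edge <- rbind(
  c(5, 4),  # root -> Pongo
  c(5, 6),  # root -> ancestor of Homo, Pan and Gorilla
  c(6, 3),  # -> Gorilla
  c(6, 7),  # -> ancestor of Homo and Pan
  c(7, 1),  # -> Homo
  c(7, 2)   # -> Pan
)

# Hypothetical helper: collect every tip descended from `node`
clade_tips <- function(edge, node, n_tips) {
  to_visit <- node
  tips <- integer(0)
  while (length(to_visit) > 0) {
    current <- to_visit[1]
    to_visit <- to_visit[-1]
    if (current <= n_tips) {
      tips <- c(tips, current)                                # a tip: record it
    } else {
      to_visit <- c(to_visit, edge[edge[, 1] == current, 2])  # keep descending
    }
  }
  sort(tips)
}

# Hypothetical helper: a set of tips is a clade only if it is exactly the
# descendant set of some internal node (an ancestor plus all its descendants)
is_clade <- function(edge, tips, n_tips) {
  internal <- unique(edge[, 1])
  any(vapply(internal,
             function(nd) setequal(clade_tips(edge, nd, n_tips), tips),
             logical(1)))
}

tip_labels[clade_tips(edge, 6, 4)]  # "Homo" "Pan" "Gorilla"
is_clade(edge, c(1, 2), 4)          # TRUE:  Homo + Pan
is_clade(edge, c(1, 3), 4)          # FALSE: Homo + Gorilla leaves out Pan
```

Homo plus Gorilla fails the test because their most recent common ancestor (node 6) also gave rise to Pan, so leaving Pan out breaks the "all descendants" rule.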
2 – Describing phylogenies
In order to be able to communicate with other scientists about phylogenies, though, we do need to cover a little more vocabulary. Sorry! But rather than avoid it, let's just jump right in.
Summary
There are lots of words associated with describing a phylogeny – both bits of the tree, and relationships of groups.
Everything that is alive today has been evolving for the same length of time. There is no such thing as "more advanced" in phylogenetics.
It's all rather egalitarian, isn't it? Vive la révolution.
Some traditional taxonomic groups do not form clades (for example, "reptiles" in the traditional sense, which excludes birds). We're gradually moving away from these.
There are some terms (stem-group, crown-group) palaeontologists in particular spend a lot of time using, as they relate to fossil species.
3 – Homology and homoplasy
So that's all lovely. Words, and concepts. Good. Here are two more.
Summary
Homology is similarity between different taxa due to shared ancestry.
Homoplasy is similarity between taxa that is not due to shared ancestry, for example similarity arising through convergent evolution.
We can tell the two apart using structure and position, or using the fossil record or embryology.
We use outgroups to establish the polarity of characters (i.e. what was the original, or plesiomorphic, condition).
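Outgroup comparison can be sketched in a few lines of base R. This is a simplified illustration under stated assumptions: a single outgroup, binary characters coded 0/1, and an invented matrix (taxa A to D and all codings are hypothetical, as is the polarise() helper). Whatever state the outgroup shows is treated as plesiomorphic, and any other state in the ingroup as apomorphic.

```r
# Invented 0/1 codings for one outgroup and four ingroup taxa (A-D)
chars <- rbind(
  Outgroup = c(0, 0, 1),
  A        = c(1, 0, 1),
  B        = c(1, 1, 0),
  C        = c(0, 1, 0),
  D        = c(0, 0, 0)
)
colnames(chars) <- paste0("char", 1:3)

# Simplified outgroup comparison: the outgroup's state is taken as
# plesiomorphic; any other state in the ingroup is apomorphic
polarise <- function(chars, outgroup) {
  ancestral <- chars[outgroup, ]
  ingroup <- chars[rownames(chars) != outgroup, , drop = FALSE]
  derived <- sweep(ingroup, 2, ancestral, FUN = "!=")  # compare each column to the outgroup
  ifelse(derived, "apomorphic", "plesiomorphic")
}

polarise(chars, "Outgroup")
```

Character 3 is a reminder that "derived" does not mean "coded 1": the outgroup has state 1, so the ingroup taxa scored 0 carry the apomorphic condition. Real analyses often use several outgroups, which this sketch does not attempt.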
4 – Phylogenetic characters
Phylogenies require data. Phylogenetic datasets are made of characters. Let's meet some now.
Summary
Cladistics assumes that characters change as lineages evolve, and that character similarities and differences reflect evolutionary history.
Characters can be molecular (e.g. DNA or RNA bases) or morphological.
If the latter, they have to be formulated based on observations of anatomy.
Characters can be contingent on each other (e.g. a character describing tail shape only applies to taxa that have a tail), and can be informative or uninformative.
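Here is a minimal base-R sketch of one standard way to check whether a character is informative under parsimony: at least two different states must each be shared by at least two taxa. The matrix is invented for illustration, is_informative() is a hypothetical helper, and the sketch simply ignores "?" (missing data).

```r
# Invented character matrix: rows are taxa, columns are characters; "?" = missing
m <- rbind(
  A = c("0", "0", "1", "0"),
  B = c("0", "1", "1", "0"),
  C = c("1", "0", "1", "1"),
  D = c("1", "0", "1", "1"),
  E = c("?", "0", "1", "1")
)

# Parsimony-informative: two or more states that each occur in two or more taxa
is_informative <- function(column) {
  observed <- column[column != "?"]  # drop missing entries
  counts <- table(observed)
  sum(counts >= 2) >= 2
}

apply(m, 2, is_informative)  # TRUE FALSE FALSE TRUE
```

Character 2 is an autapomorphy (only taxon B has state 1) and character 3 is invariant, so under parsimony neither can favour one tree over another; they cost the same number of steps on every tree.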
5 – Phylogenetic inference
We've learned about our data; let's finish by looking at how we get from it to a phylogeny. This is the process of phylogenetic inference.
Summary
We can use parsimony to build trees – that is, choose the tree that implies the fewest character transitions.
We use tree searches to find the most parsimonious tree (MPT). If there is more than one equally parsimonious tree, we summarise them in a consensus tree.
We can use support values to quantify uncertainty.
Alternatively, we can use probabilistic approaches – for example, Bayesian inference – to build trees.
Beware systematic missing data!
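To see what "fewest character transitions" means in practice, here is a minimal base-R sketch of Fitch's algorithm, a classic way to count the minimum number of changes one character needs on a given tree. It is a teaching sketch, not a replacement for the packages used in the bonus code: it assumes a rooted, fully bifurcating tree written as nested lists, one state per taxon and no missing data, and the taxa, codings and helper names (fitch(), tree_length()) are invented. A tree's length is the sum of these counts over all characters; the most parsimonious tree is the one with the smallest length.

```r
# Fitch pass for ONE character on a rooted, bifurcating tree.
# Trees are nested lists whose leaves are taxon names; `states` is a named
# vector giving each taxon's state.
fitch <- function(tree, states) {
  if (is.character(tree)) {
    return(list(set = states[[tree]], cost = 0))  # a tip: its own state, no changes
  }
  left  <- fitch(tree[[1]], states)
  right <- fitch(tree[[2]], states)
  shared <- intersect(left$set, right$set)
  if (length(shared) > 0) {
    list(set = shared, cost = left$cost + right$cost)  # children can agree: no extra change
  } else {
    # No overlap: either state is possible here, at the cost of one change
    list(set = union(left$set, right$set), cost = left$cost + right$cost + 1)
  }
}

# Tree length = minimum number of changes, summed over every character (column)
tree_length <- function(tree, chars) {
  sum(apply(chars, 2, function(col) fitch(tree, col)$cost))
}

# Invented matrix: rows are taxa, columns are characters
m <- rbind(
  A = c("0", "0", "1"),
  B = c("0", "0", "0"),
  C = c("1", "1", "0"),
  D = c("1", "1", "1")
)
tree1 <- list(list("A", "B"), list("C", "D"))  # ((A,B),(C,D))
tree2 <- list(list("A", "C"), list("B", "D"))  # ((A,C),(B,D))

tree_length(tree1, m)  # 4
tree_length(tree2, m)  # 6
```

Tree 1 needs 4 changes and tree 2 needs 6, so parsimony prefers tree 1. Character 3 costs 2 steps on both trees: it is homoplastic whichever tree we pick. Real tree searches repeat this kind of scoring over huge numbers of candidate trees.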
Bonus stuff!
Well done on reaching the end of the phylogeny page. Excellent work! Here are some bonus materials if you're interested.
The cladistic revolution
If you are a fan of the history of science, and the messy human bits about how change happens, look no further. This paper, referenced below, focusses on just this question, albeit in the world of human evolution.
We only really covered parsimony trees today. I've linked this in the bonus materials before as an example of molecular clocks, but the same methods we use for molecular clocks (Bayesian phylogenetics) are also used to infer trees. So, if you want an explanation of how Bayesian approaches to phylogeny work, in their basic form, look no further:
If you enjoyed the phylogeny of trilobites and want to learn a little more about these animals, look no further. This website covers the basics of their palaeobiology and fossil record:
Want to try out that R code to make the trees I put on the slides?
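#R code used to create the 3-tip trees (only ape is needed for these base-graphics plots):
library(ape)
tree <- read.tree(text = "((Pan:5,Homo:5):2,Gorilla:7);") # Newick string with branch lengths
layout(matrix(1:2, 1, 2, byrow = TRUE)) # Two plots side by side
plot(tree, label.offset = 0.2) # Phylogram: branch lengths drawn to scale
plot(tree, type = "cladogram", use.edge.length = FALSE, label.offset = 0.2) # Cladogram: topology only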
#R code used to create the 5-tip trees:
library(ggtree); library(ggplot2); library(ape); library(patchwork)
#Generate a random 5-tip tree with taxon labels (the seed makes it reproducible)
set.seed(22102022)
taxa <- c("A", "B", "C", "D", "E")
tree <- rtree(5, tip.label = taxa)
#Plot the same topology in four layouts; patchwork's + places the plots side by side
p1 <- ggtree(tree, branch.length = 'none') + geom_tiplab(hjust = .5, offset = .1) + coord_flip()
p2 <- ggtree(tree, branch.length = 'none', layout = "circular") + geom_tiplab(offset = .1)
p3 <- ggtree(tree, branch.length = 'none', layout = "daylight") + geom_tiplab()
p4 <- ggtree(tree, branch.length = 'none', layout = "equal_angle") + geom_tiplab()
p1 + p2 + p3 + p4 + plot_layout(nrow = 1)
Want to try out a parsimony search?
Here is code that I have used to create a bunch of trees for the practical exercise:
#Load packages
library(ape); library(phangorn) # consensus(), write.nexus.data(); acctran() and phyDat methods
library(TreeTools); library(TreeSearch) # ReadAsPhyDat(); MaximizeParsimony()
#Function to concatenate two data matrices, each a named list with one element (a vector of states) per taxon
cat_data <- function(m1, m2){
  mats <- list(m1, m2)
  taxa <- sort(unique(c(names(m1), names(m2))))
  new_data <- list()
  for(i in taxa){
    for(m in mats){
      #Check that each taxon occurs in this matrix; add a block of missing data ("?") if not
      if(!(i %in% names(m))){
        x <- rep("?", length(m[[1]]))
      } else {
        x <- unname(m[[i]])
      }
      new_data[[i]] <- c(new_data[[i]], x)
    }
  }
  return(new_data)
}
#Analyse each NEXUS file in the working directory separately ("\\.nex$" matches only names ending in .nex)
matrixFiles <- list.files(getwd(), pattern = "\\.nex$", full.names = TRUE)
allTrees <- NULL # Will hold the MPTs from every file, for a consensus at the end
for (matrixFile in matrixFiles) {
  dataset <- ReadAsPhyDat(matrixFile) # Load data using TreeTools
  bestTrees <- MaximizeParsimony(dataset) # Parsimony search using TreeSearch; returns the MPTs
  allTrees <- if (is.null(allTrees)) bestTrees else c(allTrees, bestTrees) # Keep the MPTs from every file
  svg(sub("\\.nex$", "_consensus.svg", matrixFile)) # Save plot as image, one per file
  par(mar = rep(0.25, 4), cex = 0.75) # Make plot easier to read; set after svg() so it applies to that device
  plot(ape::consensus(bestTrees)) # Plot strict consensus of this file's MPTs
  dev.off()
}
#Make majority-rule consensus of all trees (assumes every file scores the same taxa)
majRuleTree <- ape::consensus(allTrees, p = 0.5)
svg('majRuleConsensus.svg') # Save plot as image
par(mar = rep(0.25, 4), cex = 0.75)
plot(majRuleTree) # Already a consensus tree, so plot it directly
dev.off()
#Concatenate data by looping through all files, loading data and adding it to a list (one element per taxon)
concatenatedData <- as.list(as.data.frame(ReadAsPhyDat(matrixFiles[1])))
for (matrixFile in matrixFiles[-1]) {
  concatenatedData <- cat_data(concatenatedData, as.list(as.data.frame(ReadAsPhyDat(matrixFile))))
}
# A lazy way to convert the list back to PhyDat format: write it out as NEXUS, then read it back in
write.nexus.data(concatenatedData, 'concatenatedData.nex', format = 'standard') # 'standard' = morphological data; use 'dna' for DNA
concatenatedData <- ReadAsPhyDat('concatenatedData.nex')
concatenatedBestTrees <- MaximizeParsimony(concatenatedData) # Tree search
svg('concatenatedTree.svg') # Plot
par(mar = rep(0.25, 4), cex = 0.75) # Make plot easier to read
plot(ape::consensus(concatenatedBestTrees))
dev.off()
#Map branch lengths (ACCTRAN) onto the first most parsimonious tree
acctranTree <- acctran(concatenatedBestTrees[[1]], concatenatedData)
svg('concatenatedTreewBranchlengths.svg') # Plot
plot(acctranTree)
dev.off()