Phylogeny

A revolutionary idea

Phylogenetics is actually a pretty young field. Doing it on any kind of large scale requires computers, and it's only really since the 1970s that the study of evolutionary relationships has taken off and, to an extent, taken over from other forms of taxonomy. Sometimes this shift is called the cladistic revolution. You can find a link to a super-duper interesting paper about how that happened in the bonus materials below.

Either way, now, in the 21st century, this is a vital part of your toolkit if you want to think about palaeontology, evolution, or life sciences. So what are we going to cover?

Introduction

Our sections for this week are as follows.

The tree of life – Section 1.
Describing phylogenies – Section 2.
Homology and homoplasy – Section 3.
Phylogenetic characters – Section 4.
Phylogenetic inference – Section 5.

By the end of this page, I hope you will agree with me that phylogenetics is a vital part of our understanding of evolution. Trees, and evolutionary relationships, feature in everything from epidemiology (it helps us to understand how infectious agents are related to each other, say) to conservation efforts (is extinction risk influenced by phylogeny?). But in order to be able to use them, we have to be able to build and interpret them – and understand uncertainty in a phylogenetic framework. We'll build on those themes through this page.

1 – The tree of life

Cool cool. So, let's start with the basics. What is phylogenetics, how does it differ from taxonomy, and how do we read a phylogeny?

Summary

Taxonomy is how we classify organisms. Systematics is the field in which we classify them according to their evolutionary relationships.
Cladistics is a field in which we try to derive evolutionary relationships through characters.
A clade is a group of organisms including a common ancestor and all of its descendants.
A phylogeny is a hypothesis about the evolutionary history of a group of organisms, often presented as a cladogram.
Cladograms work on many scales, from just three species, all the way up to all life!
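
If you'd like to see what "a common ancestor and all of its descendants" means in practice, here is a minimal sketch in R. It assumes the ape package is installed, and it uses the same three-tip ape tree that appears in the bonus R code at the end of this page. The variable names are just for illustration.

```r
library(ape)

# A three-tip tree written in Newick format (numbers are branch lengths)
tree <- read.tree(text = "((Pan:5,Homo:5):2,Gorilla:7);")

# Pan + Homo share an ancestor that has no other descendants, so they form a clade
pan_homo <- is.monophyletic(tree, c("Pan", "Homo"))

# Homo + Gorilla only share the root, which is also an ancestor of Pan, so they do not
homo_gorilla <- is.monophyletic(tree, c("Homo", "Gorilla"))

print(c(pan_homo = pan_homo, homo_gorilla = homo_gorilla))
```

The first check should come back TRUE and the second FALSE: leaving Pan out means the group no longer contains all the descendants of its common ancestor.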

2 – Describing phylogenies

Sometimes, science can be a bit like:

In order to be able to communicate with other scientists about phylogenies, though, we do need to cover a little more vocabulary. Sorry! But rather than avoid it, let's just jump right in.

Summary

There are lots of words associated with describing a phylogeny – both bits of the tree, and relationships of groups.
Everything that is alive today has been evolving for the same length of time. There is no such thing as "more advanced" in phylogenetics.
It's all rather egalitarian, isn't it? Vive la révolution.
Some taxonomic ranks do not form clades. We're gradually moving away from these.
There are some terms (stem-group, crown-group) that palaeontologists in particular spend a lot of time using, as they relate to fossil species.

3 – Homology and homoplasy

So that's all lovely. Words, and concepts. Good. Here are two more.

Summary

Homology is similarity between different taxa due to shared ancestry.
Homoplasy is similarity between taxa that is not due to shared ancestry, for example because of convergent evolution.
We can tell the two apart using the structure and position of a feature, or using the fossil record or embryology.
We use outgroups to establish the polarity of characters (i.e. what was the original, or plesiomorphic, condition).
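
Here is a rough sketch of how outgroup comparison polarises characters, in base R. The matrix and taxon names are made up, and it leans on the simplifying assumption that the outgroup still shows the original condition for every character, which real outgroups don't always do.

```r
# Hypothetical binary character matrix: rows are taxa, columns are characters
chars <- rbind(
  Outgroup = c(0, 0, 1),
  TaxonA   = c(0, 1, 1),
  TaxonB   = c(1, 1, 0),
  TaxonC   = c(1, 1, 0)
)

# Outgroup comparison: treat the outgroup's state as plesiomorphic (original)
plesiomorphic <- chars["Outgroup", ]

# Compare every ingroup taxon with the outgroup, character by character.
# TRUE marks a taxon that shows the derived (apomorphic) state.
ingroup <- chars[rownames(chars) != "Outgroup", ]
derived <- sweep(ingroup, 2, plesiomorphic, FUN = "!=")

print(derived)
```

Note that the derived state is not always "1": for the third character the outgroup has state 1, so it is state 0 in TaxonB and TaxonC that counts as derived.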

4 – Phylogenetic characters

Phylogenies require data. Phylogenetic datasets are made of characters. Let's meet some now.

Summary

Cladistics assumes that characters change as lineages evolve, and that character similarities and differences reflect evolutionary history.
Characters can be molecular (i.e. RNA or DNA bases), or morphological.
If the latter, they have to be formulated based on observations of anatomy.
These can be contingent on each other, and can be informative or uninformative.
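
To make "informative or uninformative" a bit more concrete: under parsimony, a character can only help us choose between trees if at least two of its states each occur in at least two taxa. Below is a small base R sketch of that rule. The matrix is invented for illustration, "?" stands for missing data, and `is_informative` is a hypothetical helper name, not a function from any package.

```r
# Hypothetical toy matrix: rows are taxa, columns are characters ("?" = missing)
chars <- rbind(
  A = c("0", "0", "0", "1"),
  B = c("0", "1", "0", "1"),
  C = c("1", "0", "0", "0"),
  D = c("1", "0", "0", "?")
)

# Parsimony-informative: at least two states, each seen in at least two taxa
is_informative <- function(states) {
  counts <- table(states[states != "?"])  # ignore missing entries
  sum(counts >= 2) >= 2
}

informative <- apply(chars, 2, is_informative)
print(informative)
```

Only the first character passes. The second has a state found in a single taxon, the third never varies, and the fourth loses its second "0" to missing data.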

5 – Phylogenetic inference

We've learned about our data, so let's finish by looking at how we get from this to a phylogeny. This is the process of phylogenetic inference.

Summary

We can use parsimony to build trees – that is, we choose the tree that implies the fewest character transitions.
We have to use tree searches to find this, the most parsimonious tree. If there is more than one, we summarise these in a consensus.
We can use support values to quantify uncertainty.
Alternatively, we can use probabilistic approaches – for example, Bayesian inference – to build trees.
Beware systematic missing data!
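
If you want to see what "fewest character transitions" looks like as a calculation, here is a toy sketch in base R that counts steps with Fitch's algorithm. It is only an illustration and not how TreeSearch works internally: the matrix is invented, trees are written as nested lists, there is no handling of missing data, and with four taxa we can simply score all three possible arrangements rather than run a tree search. The function names `fitch` and `tree_length` are my own.

```r
# Hypothetical matrix: rows are taxa, columns are characters
chars <- rbind(
  A = c("0", "0", "1"),
  B = c("0", "1", "1"),
  C = c("1", "0", "0"),
  D = c("1", "1", "0")
)

# Fitch's algorithm for one character. A node is either a tip name
# or a list holding its two descendant nodes.
fitch <- function(node, states) {
  if (is.character(node)) {
    return(list(set = states[[node]], steps = 0))
  }
  left  <- fitch(node[[1]], states)
  right <- fitch(node[[2]], states)
  shared <- intersect(left$set, right$set)
  if (length(shared) > 0) {
    # Descendants agree on at least one state: no extra transition needed
    list(set = shared, steps = left$steps + right$steps)
  } else {
    # Descendants disagree: one transition must have happened
    list(set = union(left$set, right$set), steps = left$steps + right$steps + 1)
  }
}

# Tree length = number of steps summed over all characters
tree_length <- function(tree, chars) {
  steps <- vapply(seq_len(ncol(chars)),
                  function(j) fitch(tree, chars[, j])$steps,
                  numeric(1))
  sum(steps)
}

# The three possible arrangements of four taxa
candidates <- list(
  "((A,B),(C,D))" = list(list("A", "B"), list("C", "D")),
  "((A,C),(B,D))" = list(list("A", "C"), list("B", "D")),
  "((A,D),(B,C))" = list(list("A", "D"), list("B", "C"))
)

tree_lengths <- sapply(candidates, tree_length, chars = chars)
best <- names(tree_lengths)[tree_lengths == min(tree_lengths)]
print(tree_lengths)
print(best)
```

The three trees need 4, 5 and 6 steps respectively, so ((A,B),(C,D)) is the most parsimonious tree for this made-up matrix. With more taxa the number of possible trees explodes, which is why real analyses need a tree search.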

Bonus stuff!

Well done on reaching the end of phylogeny. Excellent work! Here are some bonus materials if you're interested.

The cladistic revolution

If you are a fan of the history of science, and the messy human bits about how change happens, look no further. This paper, referenced below, focusses on just this question, albeit in the world of human evolution.

Cartmill, M., 2018. A sort of revolution: Systematics and physical anthropology in the 20th century. American Journal of Physical Anthropology, 165(4), 677-687

How else can we build trees?

We only really covered parsimony trees today. I've linked this in the bonus materials before as an example of molecular clocks, but the mechanisms we use for a molecular clock (Bayesian phylogenetics) are also used to derive trees. So, if you want an explanation of how Bayesian approaches to phylogeny work, in their basic form, look no further:

Garwood, R.J. 2020. Patterns in Palaeontology — Deducing the tree of life. Palaeontology [online] 8(12):1-10.

Want to learn more about trilobites?

If you enjoyed the phylogeny of trilobites and want to learn a little more about these animals, look no further. This website covers the basics of their palaeobiology and fossil record:

Trilobites website.

Want to try out that R code to make the trees I put on the slides?


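                    #R code used to create the 3 tip trees:
                    library(ape) # only ape is needed for these two base-graphics plots
                    tree <- read.tree(text = "((Pan:5,Homo:5):2,Gorilla:7);")
                    layout(matrix(1:2, 1, 2, byrow = TRUE)) # two plots side by side
                    plot(tree, label.offset = 0.2) # phylogram with branch lengths
                    plot(tree, type = "c", use.edge.length = FALSE, label.offset = 0.2) # cladogram, ignoring branch lengths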
                    
                    #R code used to create the 5 tip trees:
                    library(ggtree); library(ggplot2); library(ape); library(patchwork)
                    #Generate random tree with taxon labels
                    set.seed(22102022)
                    taxa <- c("A","B","C","D","E")
                    tree <- rtree(5, tip.label = taxa)
                    # Plot the same tree in four layouts (patchwork places the plots side by side)
                    ggtree(tree, branch.length='none') + geom_tiplab(hjust = .5, offset =.1) + coord_flip() + 
                    ggtree(tree, branch.length='none', layout="circular") + geom_tiplab(offset =.1) +
                    ggtree(tree, branch.length='none', layout="daylight") + geom_tiplab() +
                    ggtree(tree, branch.length='none', layout="equal_angle") + geom_tiplab() + plot_layout(nrow = 1)

Want to try out a parsimony search?

Here is code that I have used to create a bunch of trees for the practical exercise:


                    #Load packages
                    library(ggtree); library(ggplot2);  library(ape); library(TreeSearch); 
                    library(TreeTools); library(phytools); library(phangorn);
                    
                    #Function to concatenate data matrices
                    #Each matrix is a named list: one vector of character states per taxon
                    cat_data <- function(m1, m2){
                      matrices <- list(m1, m2)
                      taxa <- sort(unique(c(names(m1), names(m2))))
                      new_data <- list()
                      for(i in taxa){
                        for(m in matrices){
                          #check that each unique taxon occurs in each matrix. Add missing block if not.
                          if(!(i %in% names(m))){
                            x <- rep("?", length(m[[1]]))
                          } else {
                            x <- as.character(m[[i]])
                          }
                          new_data[[i]] <- c(new_data[[i]], x)
                        }
                      }
                      return(new_data)
                    }
                    
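                    #Do this for each nexus file
                    charMatrix <- ReadAsPhyDat('fileName.nex') # Load data using TreeTools (named so it does not mask base R's matrix())
                    bestTrees <- MaximizeParsimony(charMatrix) # Do parsimony search using TreeSearch - record MPTs
                    allTrees <- bestTrees # First file: start the record of all trees to allow a consensus at end
                    # For every later file, add to the record instead: allTrees <- c(allTrees, bestTrees)
                    svg('output.svg') # Save plot as image (change the name for each file, or it is overwritten)
                    par(mar = rep(0.25, 4), cex = 0.75) # Make plot easier to read; set after svg() so it applies to the image
                    plot(ape::consensus(bestTrees)) # Plot consensus of MPTs
                    dev.off()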
                    
                    #Make majority rules consensus of all trees
                    majRuleTree <- ape::consensus(allTrees, p = 0.5)
                    svg('majRuleConsensus.svg') #Save plot as image
                    plot(majRuleTree) # majRuleTree is already a single consensus tree
                    dev.off()
                    
                    #Concatenate data by looping through all files, loading data and adding to a list
                    #Run this in a folder that only contains the input matrices: the pattern matches every file ending in .nex
                    matrixFiles <- list.files(getwd(), pattern = "\\.nex$", full.names = TRUE)
                    #Each matrix becomes a named list with one vector of states per taxon
                    concatenatedData <- as.list(as.data.frame(ReadAsPhyDat(matrixFiles[1])))
                    for(matrixFile in matrixFiles[-1])
                    {
                      concatenatedData <- cat_data(concatenatedData, as.list(as.data.frame(ReadAsPhyDat(matrixFile))))
                    }
                    # A lazy way to convert the list back to a PhyDat format
                    # (write.nexus.data defaults to format = "dna"; format = "standard" may suit morphological data better)
                    write.nexus.data(concatenatedData, 'concatenatedData.nex')
                    concatenatedData <- ReadAsPhyDat('concatenatedData.nex')
                    concatenatedBestTrees <- MaximizeParsimony(concatenatedData) #Tree Search
                    svg('concatenatedTree.svg') # Open the image file first
                    par(mar = rep(0.25, 4), cex = 0.75) # then make plot easier to read
                    plot(ape::consensus(concatenatedBestTrees)) # Plot
                    dev.off()
                    
                    #Map branch lengths
                    acctranTrees <- phangorn::acctran(concatenatedBestTrees, concatenatedData) # ACCTRAN branch lengths for each MPT
                    svg('concatenatedTreewBranchlengths.svg') # Plot
                    plot(acctranTrees[[1]]) # One tree per image: this plots the first MPT (assumes the search returned a list of trees)
                    dev.off()