---
title: 'Lab 07: Intro to phylogenetics in R'
author: "Carrie Tribble, modified by Ixchel González-Ramírez"
date: "March/03/2020"
output:
  html_document:
    df_print: paged
  word_document: default
  pdf_document: default
header-includes: \usepackage{color}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, eval = FALSE)
library(knitr)
library(tidyverse)
```

# Integrative Biology 200
# Principles of Phylogenetics
# University of California, Berkeley

In this lab we will introduce some of the basic packages and functions for running comparative analyses in R.

This is an R Markdown document. You can run the code by copying from the PDF into your R Console, or you can open the .Rmd file and run the lines of code in each 'code chunk'. If you open the .Rmd file, you can make edits and save your very own PDF with your own additions and answers to the questions. When you click the **Knit** button (at the top of your script window), a document will be generated that includes both the content and the output of any embedded R code chunks.

If you choose to modify the markdown file, please remove the \texttt{include=FALSE} option from the r setup code chunk. This way I'll see the output of your code in your final knitted document. Also change the author name to your name. If you choose to copy and paste the commands into a new script document, remember the good practice of commenting your script using the \texttt{\#} symbol.

Before beginning this lab, please set your working directory to an appropriate folder on your computer, for example:

```{r}
setwd("/Users/ixchel/Documents_local/IB_200/labs/07")
```

## Part 1: Basic Tree Functions and Data Structure

There are many packages in R that are commonly used for phylogenetic comparative methods. To avoid having to install each package individually, the following chunk installs all the relevant packages at once. This will take a while, but you'll be ready to run all the phylogenetic analyses you could ever want!
```{r, warning=FALSE, message=FALSE, eval = FALSE}
install.packages("ctv")
library(ctv)
install.views("Phylogenetics")
```

First, we will go over some basic tree functions in R. We will need to load some of the packages you just installed.

**R-trick:** In R, \texttt{install.packages()} downloads packages to your computer. To use the functions in a package, you also need to load it into your current session with \texttt{library()}. For example:

```{r}
library(ape)
```

Now let's make a tree! You can specify a tree (with branch lengths) using the Newick parenthetical format:

```{r}
tree <- read.tree(text = "(((A:0.2,B:0.1):0.3,(C:0.5,D:0.1):0.2):0.1,E:0.5);")
plot(tree, edge.width=2)
```

Because we specified branch lengths in the Newick string, this is a phylogram rather than a cladogram. We can extract those branch lengths as follows:

```{r}
tree$edge.length
```

We can also examine the structure of the data object. When working in R, the \texttt{str} command can be helpful, especially when you are dealing with new data formats such as phylogenies.

```{r}
str(tree)
```

Let's look at how nodes and tips are labeled in this data structure. This can be helpful if you are trying to identify a particular tip or node of your tree to remove or to label.

```{r}
tree$edge
```

Each row of the edge matrix gives the starting and ending node numbers of one branch of the tree. By convention, the tips of the tree are numbered 1 through n for n tips, and the internal nodes are numbered n + 1 through n + m for m internal nodes. For a rooted, fully bifurcating tree, m = n - 1. This numbering is just a way to keep track of which nodes are internal and which are leaves. It will make more sense if we plot the labels on the tree itself.

```{r}
plot(tree, edge.width = 2, label.offset = 0.1, type = "cladogram")
nodelabels()
tiplabels()
```

Next, we will load some sample data.
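Before doing so, the numbering convention described above can be checked directly in R. The chunk below is a minimal sketch rather than part of the original lab: it rebuilds the same five-tip example tree and uses \texttt{Ntip()}, \texttt{Nnode()}, \texttt{is.rooted()} and \texttt{is.binary()} from ape. The object names \texttt{n}, \texttt{m}, \texttt{tip\_ids}, \texttt{node\_ids} and \texttt{edge\_table} are chosen here only for illustration.

```{r}
library(ape)

# Same five-tip example tree used earlier in this lab
tree <- read.tree(text = "(((A:0.2,B:0.1):0.3,(C:0.5,D:0.1):0.2):0.1,E:0.5);")

n <- Ntip(tree)   # number of tips (5 for this tree)
m <- Nnode(tree)  # number of internal nodes (4 for this tree)

# Tips are numbered 1..n and internal nodes n + 1..n + m
tip_ids  <- seq_len(n)
node_ids <- n + seq_len(m)

# m = n - 1 holds when the tree is rooted and fully bifurcating
is.rooted(tree)
is.binary(tree)
m == n - 1

# Pair each branch (parent node, child node) with its branch length
edge_table <- data.frame(parent = tree$edge[, 1],
                         child  = tree$edge[, 2],
                         length = tree$edge.length)
edge_table
```

Comparing the rows of \texttt{edge\_table} with the node and tip labels on the plotted tree should make the numbering easier to follow.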
Load the following packages, and the sample dataset on Teleost fishes:

```{r, warning = FALSE, message = FALSE}
library(adephylo)
library(phylobase)
data(mjrochet)
```

We have now loaded a dataset containing a tree and associated trait data. Take a look at the structure of the mjrochet object. Then save the tree as a separate object and plot it to see what we've loaded.

```{r, fig.height = 8, fig.width = 5}
str(mjrochet)
teleost_tree <- read.tree(text=mjrochet$tre)
plot(teleost_tree)
```

Examine the list of tips and choose one to reroot the tree with. Feel free to choose a different taxon from the one used here.

```{r, eval = FALSE}
teleost_tree$tip.label
teleost_reroot <- root(teleost_tree, "Lutjanus_purpureus")
```

Now, let's plot both the original and rerooted trees to compare them:

```{r, fig.height = 8, fig.width = 5}
plot(teleost_tree, main = "Original")
plot(teleost_reroot, main = "Rerooted")
```

Let's plot some character data on the tree. We can link the trait data with the tree by making a \texttt{phylo4d} object. This is a fancy way of saying that we are creating a new data structure in R where the trait data are associated with the tips of the tree object. We will get the character matrix from the same mjrochet object where we got the tree. Finally, we can plot all these characters next to the tips of the tree.

```{r}
teleost_4d <- phylo4d(x=teleost_tree, tip.data=mjrochet$tab)
table.phylo4d(teleost_4d, cex.lab=.5,show.node=FALSE)
```

This tutorial by Liam J. Revell has some great pointers for plotting characters on trees (for discrete AND continuous traits): http://www.phytools.org/Cordoba2017/ex/15/Plotting-methods.html. We'll see some other methods later today and in next week's lab.

## Part 2: Phylogenetics of the taco

When it comes to comparative analysis, R is a very powerful tool. Let's take the phylogenetics of tacos as an example to get an idea of what R can do!
We will start by reading an incomplete phylogeny of Mexican taco-like food:

```{r}
#read the phylogeny of tacos from a string
taco_tree <- read.tree(text="(((((Taco_de_carnitas, Taco_de_guisado), Taco_de_barbacoa), (Taco_al_pastor, Taco_de_suadero)),(enchiladas, chilaquiles)), ((gordita, tlacoyo),(sope, huarache)));")
```

Now let's plot the tree:

```{r}
#plot the tree
plot(taco_tree, edge.width=2)
```

Let's take a minute here to explore some of the display options you can use in R. A general advantage of R graphics over GUI software is that (with enough patience) you will be able to customize your figures.

```{r, fig.height=5}
#roundPhylogram() comes from the phytools package
library(phytools)
#we can make rounded trees
roundPhylogram(taco_tree)
```

```{r}
#we can make an unrooted tree:
plot(unroot(taco_tree),type="unrooted",lab4ut="axial", edge.width=2)
tiplabels()
nodelabels()
```

```{r}
#we can also make a fan-shaped (circular) tree:
plot(taco_tree,type="fan", cex=0.5)
```

## \textcolor{Blue}{Exercise A}

***

> Burritos and quesadillas are Mexican food, aren't they? They are not included in this phylogeny. Modify the string of the tree to include "quesadilla" and "burrito" where you think they fit best.

```{r}
#insert your code here
```

***

## \textcolor{Blue}{Exercise B}

***

> Now let's assign some character states to the terminals. First we extract the list of terminals, then we assign the character states, and finally we make a graph.
```{r, fig.height = 10, fig.width = 7, message=FALSE}
terminals <- taco_tree$tip.label #terminal names

#number of tortillas
num_tortillas <- as.matrix(c(2,2,2,2,2,3,3,1,1,1,1))
row.names(num_tortillas) <- terminals

#making a graph
dotTree(taco_tree, num_tortillas, legend = FALSE, data.type= "discrete", colors=setNames(c("blue","red", "yellow"), c("1", "2", "3")))
#dot.legend(x=0, y=-4,prompt=FALSE)
#add.simmap.legend(x=0, y= -4, leg = c("1", "2", "3"), colors = c("blue","red", "yellow"), prompt = F)
```

```{r, fig.height = 8, fig.width = 5, message=FALSE}
#Tortilla thickness: 0 is a thin tortilla and 1 is a thick one
tortilla_thickness <- as.matrix(c(0,0,0,0,0,0,0,1,1,1,1))
row.names(tortilla_thickness) <- terminals

#making a graph
dotTree(taco_tree, tortilla_thickness, legend = FALSE, data.type= "discrete", colors=setNames(c("blue","red"), c("0", "1")))
```

> Now it's your turn: using the two previous examples, code a trait (deliciousness? spiciness?) and make a plot of that trait.

```{r, fig.height = 8, fig.width = 5, message=FALSE}
#Insert your code here
```

***

That's all for now on traits; in upcoming labs we will cover trait evolution using more specialized packages.

The last very powerful function I want to show you today creates trees according to a model. For example, let's create a random tree with 50 terminals and plot it:

```{r}
## Creating a random tree with 50 terminals
random_tree <- rtree(n=50, rooted = TRUE)
## plot it
plot(random_tree, edge.width = 1, cex=0.6)
```

You can create trees under different assumptions; this is called simulating data, and it is a very powerful tool for model development.

Send me your modified .Rmd file.

Some content in this lab is drawn from the IB200 2016 lab by Will Freyman and the [Phytools blog](http://blog.phytools.org/).