Documentation for MMAML

Under construction

Introduction

Mutational Mapping Analysis with Maximum Likelihood (MMAML) is a re-implementation of the method of Nielsen (2002). It provides a fast approach for analyzing the distribution of mutations on a phylogeny. The method involves four steps:
1) Obtain maximum likelihood estimates of the parameters (phylogenetic tree, mutational parameters) at the nucleotide level using PAML (baseml).
2) Sample the states of the internal nodes recursively according to their joint probabilities.
3) Conditional on the ancestral states, simulate the mutational history using the maximum likelihood parameters.
4) For each sampled mutational history, construct statistical tests, then average the p-values over many replicates to obtain posterior average p-values.

Please refer to Nielsen (2002) for details.
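
Steps 2 and 3 above can be sketched as follows for a single nucleotide site. This is a minimal illustration under stated assumptions, not MMAML's own code: it uses the JC69 model (rates scaled to one expected substitution per unit branch length) in place of the baseml estimates, a hypothetical tree stored as a dict of parent: [(child, branch_length), ...], and plain rejection sampling for the endpoint-conditioned mutation paths. All function and variable names are hypothetical.

```python
import math
import random

# Nucleotides in PAML's conventional T, C, A, G order.
STATES = "TCAG"


def jc_prob(i, j, t):
    """JC69 transition probability P_ij(t); t in expected substitutions per site."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def partial_likelihoods(tree, node, leaf_states, L):
    """Felsenstein pruning: L[node][s] = P(data below node | node in state s)."""
    if node in leaf_states:
        L[node] = [1.0 if s == leaf_states[node] else 0.0 for s in STATES]
        return L
    vec = [1.0] * 4
    for child, t in tree[node]:
        partial_likelihoods(tree, child, leaf_states, L)
        for a in range(4):
            vec[a] *= sum(jc_prob(a, b, t) * L[child][b] for b in range(4))
    L[node] = vec
    return L


def draw(weights, rng):
    """Draw an index with probability proportional to weights."""
    return rng.choices(range(len(weights)), weights=weights)[0]


def sample_ancestral_states(tree, root, leaf_states, rng):
    """Step 2: jointly sample node states, root first, then recursively down the tree."""
    L = partial_likelihoods(tree, root, leaf_states, {})
    # JC69 assumption: equal base frequencies at the root.
    states = {root: draw([0.25 * x for x in L[root]], rng)}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child, t in tree.get(parent, []):
            a = states[parent]
            states[child] = draw([jc_prob(a, b, t) * L[child][b] for b in range(4)], rng)
            stack.append(child)
    return states


def simulate_branch(a, b, t, rng, max_tries=100000):
    """Step 3: sample a mutational path from state a to state b over time t (rejection sampling)."""
    for _ in range(max_tries):
        state, clock, path = a, 0.0, []
        while True:
            clock += rng.expovariate(1.0)  # JC69: total leaving rate is 1
            if clock > t:
                break
            state = rng.choice([s for s in range(4) if s != state])
            path.append((clock, STATES[state]))
        if state == b:  # keep only paths that end in the sampled child state
            return path
    raise RuntimeError("rejection sampling failed; try a smarter sampler")


def map_mutations(tree, root, leaf_states, rng):
    """One replicate: sample ancestral states, then a mutational path on every branch."""
    states = sample_ancestral_states(tree, root, leaf_states, rng)
    history = {}
    for parent, children in tree.items():
        for child, t in children:
            history[(parent, child)] = simulate_branch(states[parent], states[child], t, rng)
    return states, history


# Hypothetical 4-taxon tree ((A,B),(C,D)); branch lengths are made up for illustration.
tree = {
    "root": [("n1", 0.1), ("n2", 0.1)],
    "n1": [("A", 0.05), ("B", 0.2)],
    "n2": [("C", 0.1), ("D", 0.3)],
}
leaves = {"A": "T", "B": "C", "C": "T", "D": "A"}
rng = random.Random(1)
states, history = map_mutations(tree, "root", leaves, rng)
for (parent, child), path in history.items():
    print(parent, "->", child, STATES[states[parent]], "->", STATES[states[child]], path)
```

Repeating map_mutations many times per site gives the replicate histories used in step 4. Plain rejection sampling is easy to verify but becomes slow when the endpoint states differ on very short branches or when branches are long; a more efficient sampler would be needed for real data.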

Merits:

This method is an approximation. It comes in handy when the dataset is large and codon-based methods are too computationally intensive. Like all approximate methods, it can run into trouble and give misleading results when conditions are not "regular". For a wonderful discussion of this subject, please see Yang and Nielsen (2000).

Potential Features

•  Visualizing dN/dS ratios on very large phylogenies (together with ATV); see the supplementary materials of my paper for an example.

•  Analysis of the distribution of mutations among sites and lineages.
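
As a hypothetical illustration of step 4 applied to lineages, the sketch below tests, for each sampled mutational history, whether the mapped mutations in two lineage classes (for example, a foreground and a background set of branches) are proportional to their total branch lengths, and then averages the per-replicate p-values. The choice of test (a 1-degree-of-freedom chi-square, whose tail probability equals erfc(sqrt(x/2))), the function names, and the counts are all assumptions for illustration, not MMAML's actual statistics.

```python
import math
from statistics import mean


def lineage_p_value(counts, lengths):
    """Chi-square (1 df) test that mutations in two lineage classes are
    proportional to their total branch lengths (hypothetical null)."""
    if len(counts) != 2 or len(lengths) != 2:
        raise ValueError("this sketch handles exactly two lineage classes")
    n = sum(counts)
    if n == 0:
        return 1.0  # no mapped mutations: no evidence against the null
    total = sum(lengths)
    x2 = 0.0
    for observed, length in zip(counts, lengths):
        expected = n * length / total
        x2 += (observed - expected) ** 2 / expected
    return math.erfc(math.sqrt(x2 / 2.0))  # upper tail of chi-square with 1 df


def posterior_average_p(replicates, lengths):
    """Average the per-replicate p-values over sampled mutational histories."""
    return mean(lineage_p_value(counts, lengths) for counts in replicates)


# Hypothetical (foreground, background) mutation counts from three replicates.
reps = [(10, 0), (8, 2), (5, 5)]
print(posterior_average_p(reps, lengths=(1.0, 1.0)))
```

Testing each replicate separately and then averaging keeps the uncertainty in the ancestral states and mutational paths in the final summary, rather than conditioning on a single reconstruction.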

Program under modification

This is THE FIRST sophisticated program I wrote, and I checked that everything looked right along the way. For general use, I need to modify the program to accept information about partitions over codons and lineages, and to read the maximum likelihood estimates from PAML output. I also need to handle unusual characters, gaps, reading frames, and so on.

Relatives of this program

PAML is the package you should use for most purposes unless your dataset is HUUUUGE. It is more reliable, better supported, and based on more rigorous standard theory. (I stepped into the field of phylogenetics by running PAML again and again. Once you go from hating the program to loving it, and start ordering nucleotides as T, C, A, G, you have started to understand something about the field. Of course, you can still choose to hate PAML, as I used to.)

Several other programs implement similar methods, for example SIMMAP.