Abstract: Phylogenetic (evolutionary tree) inference is a key tool for understanding evolutionary systems, including viral adaptation, genomic epidemiology, and the antibody response to infection and vaccination. Bayesian phylogenetic analysis allows us to assess tree uncertainty and integrate it out, yielding more reliable estimates of other model variables of interest (e.g. transmission rates). However, despite decades of effort, Bayesian posterior distributions on phylogenetic trees remain difficult to sample. Because trees combine discrete structure (the topology) with continuous parameters (the branch lengths), recent inferential methods developed for Euclidean space do not transfer easily to the phylogenetic case. We are therefore left with random-walk Markov chain Monte Carlo (MCMC) driven by uninformed tree-modification proposals. These proposals traverse tree space slowly because phylogenetic posteriors are concentrated on a small fraction of the enormous number of possible trees.
In this talk, I will describe our work on scalable approaches to inferring the Bayesian posterior on phylogenetic trees. This includes a new discrete inferential target, which we call the "subsplit directed acyclic graph," and a new algorithm that infers this structure using techniques analogous to those of the much faster maximum-likelihood (point-estimate) methods for phylogenetics. I will also describe how, once this structure is in hand, we can perform variational inference via stochastic gradient descent.
Dr. Frederick “Erick” Matsen is an expert in computational biology, the science of developing computer algorithms, or programs, that use biological data to understand biological systems and relationships. His research team has developed new methods to analyze data generated by sequencing the DNA of viruses, immune cells, and complex environmental samples containing many microorganisms. The team also pursues more abstract questions about the methods used to construct evolutionary trees. Another focus of Dr. Matsen’s work is improving software used in computational biology, both by developing open-source tools and by contributing to larger collaborative projects.