Deep learning for sequence design with a few data points

Seminar Details
Wednesday, September 14, 2022 - 4:00pm to 5:00pm


Andrew White, PhD
Assoc. Prof. of Chemical Engineering, University of Rochester


Department of Computational Medicine and Bioinformatics (CCMB) Seminar

Deep learning has begun a renaissance in chemistry and materials. We can devise and fit models to predict molecular properties in a few hours and deploy them in a web browser. In an afternoon, we can create novel generative models that previously would have been PhD theses. In my group, we’re exploring deep learning in peptides. We are focused on two major problems: interpretability and data scarcity. Now that we can make deep learning models to predict any molecular property ad nauseam, what can we learn from them? I will discuss our recent efforts on interpreting deep learning models through symbolic regression and counterfactuals. Data scarcity is a common problem in biochemistry: how can we learn new properties without significant experimental expense? One method is the judicious choice of experiments, which can be done with active learning. Another approach is self-supervised learning and constraining symmetries, which both try to exploit structure in data. I will cover recent progress in these areas.