Friday, September 13, 2013

Learning something new: Integrating computer vision into my research.

I recently started my sabbatical year, which gives me a chance to re-tool myself and my research knowledge. The game plan is to learn how to integrate computer vision (machine vision) and machine learning approaches into my research, in particular with respect to the study of animal behaviour and the analysis of images (and videos). We study the evolutionary genetics of complex phenotypes in my lab. While this used to (mostly) mean the complex structure of the shape and size of fruit-fly wings, we are moving more into the study of animal behaviour, in particular how flies evade and escape being eaten by predators (more on that at a later date).

The analysis of such data (both huge sets of wing images and video, which is effectively a series of images) can be time consuming, and what can be done manually (such as with JWatcher) is somewhat limited, in particular if you want to do "high throughput" work with many samples. Over the past few years I have interacted with, and begun to collaborate with, scientists who are using all sorts of computer vision techniques that have amazed me, both in the speed of the analysis and in the detailed information gleaned from such approaches. So I am trying to get up to speed and see how to utilize these approaches for my own work.

To that end I will now be posting about this experience (as well as all of the more usual genetics). This will include useful new tidbits, programming scripts, software I have tried (and tutorials), books, and anything else I can think of. Basically, this is my research progress journal for this new endeavour. I hope that this will help me stay nice and organized, and perhaps will be useful more generally. If you start to follow this thread and have suggestions for anything, please let me know in the comments or on Twitter.

More to follow soon!

Our new pre-print: An integrative genomic approach illuminates the causes and consequences of genetic background effects

This is a guest post by Dr. Chris Chandler. Cross posted from Haldane's Sieve.

Biologists have long recognized that a mutation can have variable effects on an organism's phenotype; even introductory genetics classes often make this observation by introducing the concepts of penetrance and expressivity. More mysterious, however, are the factors that influence the phenotypic expression of a mutation or allele. We know, for instance, that introducing the same mutation into two different but otherwise wild-type genetic backgrounds can result in vastly different phenotypes. But what specific differences between these two genetic backgrounds interact with the mutation, and how? And how does gene expression fit into this puzzle? Answering these questions has not been an easy task, which is not too surprising when you realize that penetrance and expressivity are, in reality, complex quantitative traits. We therefore adopted a multi-pronged genetic and genomic approach to tease apart the mechanisms mediating background dependence in a mutation affecting wing development in the fly Drosophila melanogaster.

The phenotypic patterns seen in our model trait have already been characterized: the scalloped[E3] (sd[E3]) mutation has strong effects in the Oregon-R (ORE) background, resulting in a tiny, underdeveloped wing, while its effects in the Samarkand (SAM) background are still obvious but much less extreme, resulting in a blade-like wing.

To try to find out what causes these differences, we generated and combined a variety of datasets: whole-genome re-sequencing of the parental strains and a panel of introgression lines to map the background modifiers of the sd[E3] phenotype; transcription profiling (using two microarray datasets and one RNA-seq-like dataset), including analyses of allele-specific expression in flies carrying a "hybrid" genetic background; predictions of binding sites for the SD protein, which is a transcription factor; and a screen for deletion alleles that enhance or suppress the sd[E3] phenotype in a background-dependent fashion.

Our results point to a complex genetic basis for this background dependence. We found evidence for a number of loci that are likely to modulate the effects of the sd[E3] allele. However, some unexpected inconsistencies provide a cautionary tale for those intending to take a similar mapping-by-introgression approach for their trait of interest: do multiple replicates, and introgress in both directions, or you may inadvertently end up mapping some other trait! Although the number of candidate genes we identified was generally large, by combining those results with data from our other datasets, we were able to narrow our focus to those showing a consistent signal, yielding a robust set of candidate genes for further study. Without getting into too much detail, we also used a novel approach to show that background-dependent modifier deletions of the sd[E3] phenotype (of which there are many) involve higher-order epistatic interactions between the sd[E3] mutation, the deletion, and the genetic background, rather than quantitative non-complementation (so more than two genes were involved).
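For readers who want a concrete picture of the narrowing step (keeping only candidates that show up in more than one dataset), here is a minimal sketch in Python. It is not the pipeline used in the pre-print: the function name, the dataset labels, the support threshold, and the gene identifiers are all hypothetical placeholders chosen for illustration.

```python
from collections import Counter


def consistent_candidates(evidence, min_support=2):
    """Return genes flagged by at least `min_support` of the datasets.

    `evidence` maps a dataset label to the set of candidate genes it
    flagged. The labels and the threshold are illustrative assumptions.
    """
    # Count, for each gene, how many datasets flagged it.
    counts = Counter(gene for genes in evidence.values() for gene in set(genes))
    return sorted(gene for gene, n in counts.items() if n >= min_support)


# Hypothetical placeholder identifiers, NOT genes from the study.
evidence = {
    "introgression_mapping": {"geneA", "geneB", "geneC", "geneD"},
    "expression_profiling": {"geneB", "geneC", "geneE"},
    "binding_site_prediction": {"geneC", "geneD", "geneF"},
    "deletion_screen": {"geneB", "geneG"},
}

shortlist = consistent_candidates(evidence, min_support=2)
print(shortlist)
```

Raising `min_support` trades a longer candidate list for a more stringent one. A simple vote count like this treats every dataset as equally informative, which is an assumption a real analysis would probably want to revisit.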

Overall, we think that an integrative approach like this could be useful for others trying to understand complex traits, including genetic background-dependence of mutations. In addition, if you're a Drosophila researcher working with the commonly used Samarkand or Oregon-R strains, our genome re-sequencing data (raw and assembled), including SNPs, will soon be available in public repositories for genetic data.