Software for finding orthologs by reciprocal BLAST

I started my undergraduate research at an exciting period in Cell and Developmental Biology. The late 1990s marked a time when biologists truly got a handle on the genome sequencing business, and, at the turn of the millennium, a new genome was being sequenced each year. Much to the delight of urchinologists, the purple sea urchin, Strongylocentrotus purpuratus, was one of many model organisms to make its way through the sequencing pipeline. That’s where I came in…

My work in the lab of Dr. Robert Morris began in my second semester at Wheaton College, MA. The Morris Lab studied the development and motility of cilia, and was part of the Sea Urchin Genome Sequencing Consortium. Our task was to annotate the cytoskeletal and motility genes via reciprocal BLAST analysis. We spent countless long nights in the lab manually annotating hundreds upon hundreds of sequences. At one point, I could only recite four letters of the alphabet (A, T, C, and G). My labor-intensive work was ultimately published in the journals Science and Developmental Biology; however, I couldn’t help thinking that there had to be an easier way to annotate genes for newly sequenced genomes. This idea laid the foundation for my honors thesis project.

I soon realized that coding was going to be an integral part of any method to simplify gene annotation. I enrolled in a Perl programming course with a focus on bioinformatics. Before the course finished, I was already applying the coding techniques to my research. I developed a reciprocal BLAST pipeline called rBLAST. I was able to transform a three-month manual annotation effort into a three-hour click-and-walk-away experience. I used rBLAST to generate a putative ciliary gene catalog of over 200 sequences. These newly annotated genes offered the larger scientific community a series of high-profile targets for ciliary disorders such as microcephaly, polycystic kidney disease, retinal disease, deafness, and growth syndromes.
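
For readers who have not met the idea before, here is a minimal sketch of the reciprocal best hit test at the heart of any reciprocal BLAST analysis. It is not the rBLAST code (that was written in Perl and is not reproduced here); it is an illustration in Python that assumes both searches were saved in BLAST's 12-column tabular format, where column 1 is the query ID, column 2 is the subject ID, and column 12 is the bit score. The function names and the sequence IDs in the example are made up for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def best_hits(rows: Iterable[str]) -> Dict[str, Tuple[str, float]]:
    """Map each query ID to its highest-scoring (subject ID, bit score)."""
    best: Dict[str, Tuple[str, float]] = {}
    for row in rows:
        row = row.strip()
        if not row or row.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = row.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        # Keep only the first top-scoring hit seen for each query.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def reciprocal_best_hits(
    forward_rows: Iterable[str], reverse_rows: Iterable[str]
) -> List[Tuple[str, str]]:
    """Return (query, subject) pairs that are each other's best hit."""
    forward = best_hits(forward_rows)
    reverse = best_hits(reverse_rows)
    pairs = []
    for query, (subject, _score) in forward.items():
        # A pair counts only if the reverse search points back to the query.
        if subject in reverse and reverse[subject][0] == query:
            pairs.append((query, subject))
    return sorted(pairs)


def make_row(query: str, subject: str, bitscore: float) -> str:
    """Build a tabular row with placeholder values in the unused columns."""
    filler = ["0"] * 9  # columns 3-11 are not used by this sketch
    return "\t".join([query, subject, *filler, str(bitscore)])


# Hypothetical IDs: "sp_" for sea urchin sequences, "hs_" for human ones.
forward = [
    make_row("sp_kinesin", "hs_KIF3A", 900),
    make_row("sp_kinesin", "hs_KIF3B", 700),
    make_row("sp_dynein", "hs_DNAH5", 1200),
]
reverse = [
    make_row("hs_KIF3A", "sp_kinesin", 880),
    make_row("hs_DNAH5", "sp_other", 1000),
    make_row("hs_DNAH5", "sp_dynein", 950),
]
pairs = reciprocal_best_hits(forward, reverse)
```

In this toy example only the kinesin pair survives: the urchin dynein sequence finds a human best hit, but that human sequence's own best hit is a different urchin sequence, so the pair is not reciprocal and would not be called a putative ortholog. A real pipeline would also need to run the searches themselves and would likely apply thresholds (for example on E-value), which this sketch leaves out.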