Sam Way graduated from the Bioinfo lab in August of 2012 with a Master of Science in Electrical Engineering. His thesis is entitled "Classification of Genomic Sequences by Latent Semantic Analysis" and describes methods for differentiating and organizing large collections of sequence data. Sam's research also includes studies of computational genomic signatures, fragment assembly routines, and information visualization techniques.
Latent Semantic Analysis + Genomics
Sam's master's thesis approaches problems from computational genomics using latent semantic analysis (LSA), a theory and collection of techniques born out of the fields of natural language processing and information retrieval. In the same way that LSA is traditionally used to identify groups of semantically similar text documents, Sam's research identifies groups of genomic sequences with similar biological features.
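The sketch below illustrates the analogy in a rough way. It treats overlapping DNA k-mers as the "words" and each sequence as a "document", builds a term-document matrix, and keeps the top r latent dimensions of its singular value decomposition. The thesis does not supply any of these details. The choices of k = 3, a log(1 + count) weighting, r = 2, and the helper names (`kmer_counts`, `lsa_document_vectors`, and so on) are assumptions made for illustration. To stay within the Python standard library, the sketch obtains the singular vectors by power iteration on A^T A. A practical implementation would more likely call an SVD routine such as `numpy.linalg.svd`.

```python
import math
import random
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Count overlapping k-mers, treating each k-mer as a 'word' (assumption)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def term_document_matrix(seqs, k=3):
    """Rows: all 4**k DNA k-mers (terms). Columns: sequences (documents).
    Entries use a simple log(1 + count) local weight (assumed, not from the thesis)."""
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = [kmer_counts(s, k) for s in seqs]
    return [[math.log1p(c[w]) for c in counts] for w in vocab]


def top_eigenpairs(gram, r, iters=500, seed=0):
    """Top-r eigenpairs of a symmetric positive semidefinite matrix,
    found by power iteration with deflation."""
    rng = random.Random(seed)
    n = len(gram)
    g = [row[:] for row in gram]
    pairs = []
    for _ in range(r):
        v = [rng.random() + 0.1 for _ in range(n)]
        for _ in range(iters):
            w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        lam = sum(v[i] * sum(g[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((max(lam, 0.0), v))
        # Deflate so the next pass converges to the next-largest eigenpair.
        for i in range(n):
            for j in range(n):
                g[i][j] -= lam * v[i] * v[j]
    return pairs


def lsa_document_vectors(seqs, k=3, r=2):
    """Project each sequence into an r-dimensional latent space.
    With A = U S V^T, document coordinates are the rows of V_r S_r; here they
    come from the eigendecomposition of A^T A (eigenvalues = singular values squared)."""
    a = term_document_matrix(seqs, k)
    n = len(seqs)
    gram = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    pairs = top_eigenpairs(gram, r)
    return [[math.sqrt(lam) * v[d] for lam, v in pairs] for d in range(n)]


def cosine(x, y):
    """Cosine similarity, the usual LSA measure of document relatedness."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0
```

For example, with two AT-rich and two GC-rich toy sequences, `lsa_document_vectors(seqs)` places sequences with shared k-mer composition close together in the latent space, with cosine similarity near 1. This is the sense in which LSA groups "semantically" similar documents. Real genomic data would call for larger k, a global term weighting such as log-entropy, and more latent dimensions.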
Exploratory Information Visualization
nSpect is an information visualization tool for the exploration (or inspection) of high-dimensional biological data. It is considered an "exploratory" tool in the sense that it can be used to discover patterns and relationships that are hidden in raw data. nSpect was motivated by another project in which our group needed a fast, efficient method of locating misclassified sequences in a particular taxonomy database. Within the visualization, dissimilar items repel one another, so that similar objects gather into clusters. As a result, nSpect quickly reveals incorrectly labeled sequences, like those shown in the figure below.
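The repulsion idea can be sketched as a simple spring (stress-based) layout. The description above does not say which force model or similarity measure nSpect actually uses, so every choice here is an assumption. Each pair of items is joined by a spring whose rest length equals their dissimilarity, and positions are updated by gradient descent. The helper names `force_layout` and `flag_suspect_labels` are hypothetical. The second helper only mimics, with a nearest-neighbour vote, what a user spots by eye: an item sitting inside a cluster whose members mostly carry a different label.

```python
import math
import random


def force_layout(dissim, steps=500, lr=0.05, seed=0):
    """Hypothetical sketch: place n items in 2-D so on-screen distance tracks
    dissimilarity. Each pair is joined by a spring with rest length equal to
    their dissimilarity. Pairs closer than that are pushed apart, and pairs
    farther than that are pulled together (gradient descent on layout stress)."""
    rng = random.Random(seed)
    n = len(dissim)
    pos = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(n)]
    for _ in range(steps):
        force = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                d = math.hypot(dx, dy) or 1e-9
                f = (d - dissim[i][j]) / d  # f > 0 attracts, f < 0 repels
                force[i][0] += f * dx
                force[i][1] += f * dy
                force[j][0] -= f * dx
                force[j][1] -= f * dy
        for i in range(n):
            pos[i][0] += lr * force[i][0]
            pos[i][1] += lr * force[i][1]
    return pos


def flag_suspect_labels(pos, labels, k=2):
    """Hypothetical helper: flag items whose k nearest neighbours in the layout
    mostly (strict majority) carry a different label."""
    flagged = []
    for i, p in enumerate(pos):
        others = sorted((math.dist(p, q), j) for j, q in enumerate(pos) if j != i)
        neighbours = [j for _, j in others[:k]]
        disagree = sum(labels[j] != labels[i] for j in neighbours)
        if disagree > k / 2:
            flagged.append(i)
    return flagged
```

For instance, take six items in two tight groups (dissimilarity 0.1 within a group, 1.0 across groups) with item 2 given the other group's label. The layout should place item 2 among its true group, and `flag_suspect_labels` should single it out. A spring model with rest lengths was chosen over pure repulsion because it keeps clusters compact while still separating dissimilar groups.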