Publication

Faster isomer network generation

Thiagarajan, Dheivya
Abstract
Isomer networks provide a mechanism to understand and interpret relationships between organic molecules, with applications in medicinal chemistry and drug design. Extracting an isomer network is a time- and data-intensive computation. This dissertation contributes a variety of techniques for computing isomer networks more efficiently with respect to time and memory. Specifically, we describe our efforts to improve the network extraction process by: 1) Using the symmetry present in most molecules to reduce run time and memory, and streamlining the algorithm used to detect duplicate canonical names, a key step in determining the bond count distances between pairs of isomers. Together, these techniques reduce memory use by up to 60% and improve run time by up to a factor of 100. 2) Developing an optimal grouping algorithm to subdivide an all-to-all computation with large memory requirements. The algorithm subdivides the "big data" problem that arises in the construction of isomer networks into several independent "small data" problems. Our results show that the grouping algorithm can divide large data sets into smaller, independent ones that can be processed in parallel. 3) Generating the isomer network for 1,050,125 isomers of nicotine, with a preliminary analysis of that network, using the cloud computing capabilities of Amazon Web Services and Microsoft Azure. These techniques can also be employed to compute isomer networks for other chemical compounds.
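The idea of subdividing an all-to-all computation into independent pieces can be illustrated with a minimal sketch. This is not the dissertation's optimal grouping algorithm, which the abstract does not specify; it is plain fixed-size blocking, shown only to make the "big data into small data" idea concrete. The function names, the group size, and the placeholder isomer names are all assumptions made for illustration.

```python
from itertools import combinations


def split_into_groups(items, group_size):
    """Split items into consecutive groups of at most group_size elements."""
    return [items[k:k + group_size] for k in range(0, len(items), group_size)]


def pairwise_tasks(groups):
    """Yield independent tasks; each task needs at most two groups in memory."""
    for i in range(len(groups)):
        # Pairs whose members both lie inside group i.
        yield list(combinations(groups[i], 2))
        for j in range(i + 1, len(groups)):
            # Pairs with one member in group i and one in group j.
            yield [(a, b) for a in groups[i] for b in groups[j]]


# Hypothetical placeholder data: ten made-up isomer names.
isomers = [f"isomer_{k}" for k in range(10)]
groups = split_into_groups(isomers, 4)
tasks = list(pairwise_tasks(groups))

# Every unordered pair appears in exactly one task, so the tasks can run
# in parallel without coordinating with each other.
covered = [pair for task in tasks for pair in task]
```

Each task touches at most two groups, so its memory footprint is bounded by the group size rather than by the size of the full data set. A real pipeline would replace the pair lists with the actual distance computation for each pair.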
Rights
Copyright of the original work is retained by the author.