Computational Methods to Explore Big Bioassay Data for Better Compound Prioritization
Bioassay data are an extremely valuable source of experimental Big Data. They are rich in content and have been produced in large volumes in the early stages of drug discovery to test the bioactivities of chemical compounds and identify promising drug candidates. However, the power of such big bioassay data has not been fully unleashed, particularly for discovering novel knowledge and improving drug development. This is largely because exploration of the much larger space of bioassays has been hindered by a limited ability to identify and utilize the relations across bioassays. In this project, the PI and her team will develop novel computational methods and tools that effectively explore a wide range of heterogeneous bioassays, identify experimentally unrevealed relations among them, and use the knowledge derived from those relations to improve compound prioritization. The research will show how to utilize the existing wealth of Big Data more fully by distilling knowledge in innovative ways, establishing new conceptual hypotheses, and developing the corresponding analytical techniques. It aims to solve critical problems in drug discovery through Big Data means and has great potential to improve drug candidate identification through accurate compound prioritization, with far-reaching economic and societal impacts.
The PI and her team will develop a computational framework that produces a better compound ranking for each bioassay. The framework will consist of a local structure learning component, which discovers and leverages the compound ranking within a bioassay, and a global structure learning component, which discovers and leverages ranking relations across bioassays. They will also develop new methods to rank compounds under a combination of criteria. In particular, they will rank compounds on activity and selectivity simultaneously by leveraging ranking differences across bioassays. The research will be innovative both in applying original computational models and methods to important problems in drug discovery and in developing unique methodologies and computational techniques for core Computer Science research. For drug discovery, the research will provide novel perspectives and methodologies on how researchers can use large-scale experimental data to solve important problems. For core Computer Science, the research will contribute a new solution framework and methods spanning data mining and machine learning. Specifically, the research will lead to novel methods for boosting ranking performance by actively including additional data, incorporating relevant information within a regularized optimization framework, and deploying iterative procedures and greedy strategies for large-scale problems with multiple simultaneous tasks. All these methods are generalizable to a variety of other Computer Science applications.
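As a rough illustration of ranking on activity and selectivity together, the sketch below scores each compound by its activity in a target bioassay combined with its margin over the strongest activity in any other bioassay. It is not the project's method: the function name, the `weight` parameter, the margin-based definition of selectivity, and the toy data are all assumptions made for this example.

```python
def rank_by_activity_and_selectivity(activities, target, weight=0.5):
    """Rank compounds for one target bioassay (hypothetical scoring scheme).

    activities: dict mapping compound -> {bioassay: activity}, where a higher
                value means a more active compound.
    weight:     assumed trade-off; 0.0 ranks on activity only, 1.0 on
                selectivity only.
    """
    scores = {}
    for compound, by_assay in activities.items():
        if target not in by_assay:
            continue  # compound was never tested in the target bioassay
        activity = by_assay[target]
        off_target = [v for assay, v in by_assay.items() if assay != target]
        # Assumed selectivity: margin between the target activity and the
        # strongest off-target activity (0.0 if no other bioassay is known).
        selectivity = activity - max(off_target) if off_target else 0.0
        scores[compound] = (1 - weight) * activity + weight * selectivity
    # Highest combined score first.
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: c1 is the most active in "A" but barely selective against "B".
toy = {
    "c1": {"A": 9.0, "B": 8.5},
    "c2": {"A": 8.0, "B": 3.0},
    "c3": {"A": 6.0, "B": 5.0},
}
print(rank_by_activity_and_selectivity(toy, "A", weight=0.0))
print(rank_by_activity_and_selectivity(toy, "A", weight=0.5))
```

With `weight=0.0` the order follows activity alone. With `weight=0.5` the selective compound `c2` moves ahead of the more active but unselective `c1`, which is the kind of reordering a combined criterion is meant to produce.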
Multi-assay-based compound prioritization
Junfeng Liu and Xia Ning*. Multi-assay-based compound prioritization via assistance utilization: a machine learning framework. Journal of Chemical Information and Modeling, 2017.
Abstract: Effective prioritization of chemical compounds that show promising bioactivities from compound screenings represents a first critical step toward identifying successful drug candidates. Current development on computational approaches for compound prioritization is largely focused on devising advanced ranking algorithms that better learn the ordering among compounds. However, such methodologies are fundamentally limited by the scarcity of available data, particularly when the screenings are conducted at a relatively small scale over known promising compounds. Instead, in this work, we explore the structures of bioassay space and leverage such structures to improve ranking performance of an existing strong ranking algorithm. This is done by identifying assistance bioassays and assistance compounds intelligently and leveraging such assistance within the existing ranking algorithm. By leveraging the assistance bioassays and assistance compounds, the data scarcity can be properly compromised. Along this line, we develop a suite of assistance bioassay selection methods and assistance compound selection methods. Our experiments demonstrate an overall 8.34% improvement on the ranking performance over the state of the art.
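The abstract does not spell out how assistance bioassays and assistance compounds are chosen, so the following is only a minimal sketch of one plausible heuristic, not the paper's selection methods. It assumes that a bioassay is useful as assistance when it shares enough compounds with the target bioassay and ranks those shared compounds similarly. The function names, the `min_shared` and `top_k` thresholds, and the rank-correlation criterion are all hypothetical.

```python
from itertools import combinations


def rank_agreement(xs, ys):
    """Kendall-style rank correlation in [-1, 1] that ignores tied pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = concordant + discordant
    return (concordant - discordant) / pairs if pairs else 0.0


def select_assistance_bioassays(assays, target, min_shared=3, top_k=2):
    """Pick up to top_k bioassays whose rankings agree with the target's.

    assays: dict mapping bioassay -> {compound: activity}.
    """
    target_data = assays[target]
    scored = []
    for name, data in assays.items():
        if name == target:
            continue
        shared = sorted(set(target_data) & set(data))
        if len(shared) < min_shared:
            continue  # too little overlap to compare rankings
        agreement = rank_agreement(
            [target_data[c] for c in shared], [data[c] for c in shared]
        )
        scored.append((agreement, name))
    scored.sort(reverse=True)
    # Keep only bioassays that agree positively with the target.
    return [name for agreement, name in scored[:top_k] if agreement > 0]


def select_assistance_compounds(assays, target, assistance):
    """Compounds tested in the assistance bioassays but not in the target."""
    pooled = {c for name in assistance for c in assays[name]}
    return sorted(pooled - set(assays[target]))


# Toy data: "A" ranks the shared compounds like "T", "B" ranks them in
# reverse, and "C" shares too few compounds with "T" to be considered.
toy_assays = {
    "T": {"c1": 9, "c2": 7, "c3": 5},
    "A": {"c1": 8, "c2": 6, "c3": 4, "c4": 7},
    "B": {"c1": 1, "c2": 2, "c3": 3, "c5": 2},
    "C": {"c1": 5, "c6": 4},
}
helpers = select_assistance_bioassays(toy_assays, "T")
extra = select_assistance_compounds(toy_assays, "T", helpers)
print(helpers, extra)
```

In this sketch the selected compounds and their activities would then be added to the training data of an existing ranking model for the target bioassay, which is how additional data could offset a small screening set.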