Research Projects at Dundar Lab

Dr. Murat Dundar
Associate Professor
Computer & Information Science Dept.

ACTIVE PROJECTS:

Mapping Stomach Autonomic Circuitry and Function for Neuromodulation of Gastric Disorders (funded by NIH)
A Quantitative Approach to Understanding the Distribution and Diversity of Key Water-Formed Minerals on Mars (funded by NASA)
Quantitative SRS Imaging of Cancer Metabolism at Single Cell Level (funded by NIH/NCI. PI: Ji-Xin Cheng)

COMPLETED PROJECTS:

Self-adjusting Models as a New Direction in Machine Learning (funded by NSF CAREER)
Machine-learning Approach to Label-free Detection of new Bacterial Pathogens (funded by NIH/NIAID)
Automated Spectral Data Transformations and Analysis Pipeline for High Throughput Flow Cytometry (funded by NIH/NIBIB)

Mapping Stomach Autonomic Circuitry and Function for Neuromodulation of Gastric Disorders (funded by NIH. PI: Terry Powley)

Project period: 4/1/2020 - 7/31/2022

Our lab oversees the axon segmentation research of the SPARC project at Purdue. We implement deep convolutional neural networks for instance-based segmentation of fibers in TEM images. We explore different architectures, training configurations, and loss functions; develop a strategy to select representative training image tiles more effectively; and, in collaboration with domain experts, conduct evaluation studies to demonstrate how this segmentation reduces the expert's workload. Use the link below to download our code, data, and pretrained models.

TEM Segmentation Project

A Quantitative Approach to Understanding the Distribution and Diversity of Key Water-Formed Minerals on Mars (funded by NASA. PI: Bethany L. Ehlmann)

Project period: 7/1/2019 - 6/30/2022

The Compact Reconnaissance Imaging Spectrometer for Mars (CRISM) has been one of the key imaging instruments on board the Mars Reconnaissance Orbiter and has helped in the discovery of a broad array of aqueous minerals on the surface of Mars. Hyperspectral data collected by CRISM have revolutionized our understanding of the planet and have been instrumental in the selection of landing sites for Mars rover exploration missions. Although the aging instrument is near the completion of its mission, the tens of thousands of images collected to date will continue to play a major role in humankind's search for signs of life and habitability on the red planet. In this project, my lab is responsible for implementing a new machine learning toolkit for advanced CRISM processing, to improve the community's ability to map discrete compositional units in remote-sensing data and to identify mineral phases on Mars more accurately. The toolkit contains Python scripts, pixel-scale training data collected from dozens of well-characterized images, and documentation illustrating use cases of the algorithms on several test images. The toolkit tackles two specific machine learning tasks: bland region identification and mineral classification at pixel scale. The first task requires robust estimation of the distribution of bland pixels, and the second requires a generalizable supervised classifier.
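As a rough illustration of the first task, the sketch below flags bland pixels using per-band robust statistics (median and scaled median absolute deviation). This is not the toolkit's actual method: the function names, the use of the MAD, and the threshold of 3.0 are all assumptions chosen for the example.

```python
import statistics

def robust_band_stats(pixels, floor=1e-9):
    """Per-band median and scaled MAD over a list of spectra (hypothetical helper)."""
    bands = list(zip(*pixels))
    medians = [statistics.median(band) for band in bands]
    # 1.4826 rescales the MAD so it matches the standard deviation for Gaussian data.
    mads = [
        max(1.4826 * statistics.median([abs(v - m) for v in band]), floor)
        for band, m in zip(bands, medians)
    ]
    return medians, mads

def bland_mask(pixels, threshold=3.0):
    """True for pixels whose largest robust z-score across bands stays within the threshold."""
    medians, mads = robust_band_stats(pixels)
    mask = []
    for spectrum in pixels:
        score = max(abs(v - m) / s for v, m, s in zip(spectrum, medians, mads))
        mask.append(score <= threshold)
    return mask
```

Because the median and MAD are insensitive to a minority of outlying spectra, a few pixels with strong absorption features do not distort the estimate of the bland distribution.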

CRISM Machine Learning Toolkit

Quantitative SRS Imaging of Cancer Metabolism at Single Cell Level (funded by NIH/NCI. PI: Ji-Xin Cheng)

Project period: 8/1/2018 - 7/31/2021

In this work we propose a weakly supervised, deep, non-exhaustive semantic segmentation approach for hyperspectral SRS images. Our approach starts the training process with coarse-grained labels obtained by simple PCA-based processing. Because these coarse labels provide only "weak" supervision and identify only obvious spectral patterns, learning has to preserve class separability not only between the coarsely labeled classes but also between their potential subclasses. To achieve this goal we train the network using the sum of reconstruction and discrimination losses. We replace the standard cross-entropy loss with a newly proposed contrastive loss function that is less sensitive to class labels, so that class separability information among unknown classes is relatively well preserved in the learned feature space. Our contrastive loss function is based on the idea of maximizing the number of samples that fall on the right side of the margin from each labeled class. Once the feature space is learned, hyperspectral pixel data are projected into this space and clustered by a doubly non-parametric Bayesian clustering technique. This clustering technique can accommodate clusters with arbitrary shapes and automatically infers the number of spectral patterns from the data. Segmentation maps obtained using cluster labels for each pixel, along with average spectra for each cluster, show that our approach can discover a diverse set of biologically significant spectral patterns in SRS images with only the little supervision provided by PCA processing.
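The margin idea behind the contrastive loss can be illustrated with a small sketch. The exact loss used in the project is not given here, so the following is only one plausible reading: each sample should be closer to its own class center than to every other class center by at least a margin, and a hinge term serves as a differentiable surrogate for counting violations. The class-center formulation and the names `class_centers` and `margin_scores` are assumptions.

```python
import math

def class_centers(features, labels):
    """Mean feature vector per labeled class (hypothetical helper)."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def margin_scores(features, labels, margin=1.0):
    """Return (fraction of margin constraints satisfied, summed hinge penalty)."""
    centers = class_centers(features, labels)
    satisfied, total, hinge = 0, 0, 0.0
    for x, y in zip(features, labels):
        d_own = math.dist(x, centers[y])
        for other, center in centers.items():
            if other == y:
                continue
            # Positive gap: the sample is on the wrong side of the margin.
            gap = margin + d_own - math.dist(x, center)
            hinge += max(0.0, gap)
            satisfied += gap <= 0.0
            total += 1
    fraction = satisfied / total if total else 1.0
    return fraction, hinge
```

Minimizing the hinge term pushes more samples to the right side of the margin without forcing each labeled class to collapse to a single point, which leaves room for subclasses inside a coarse class.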

CAREER: Self-adjusting Models as a New Direction in Machine Learning

Project period: 3/1/2013 - 2/28/2018

Abstract:

Traditional supervised learning algorithms assume that the list of classes defined by a training data set is exhaustive and that new data samples originate from one of the existing classes represented in the training data set. This assumption is not very realistic in many real-world domains as the data-generating mechanisms constantly evolve and new classes of interest emerge on a continual basis. Under such circumstances it is impractical if not impossible to define a training data set with a complete set of classes. When the training data set is not exhaustively defined, a future sample of a class not represented in the training data set will be misclassified with certainty, leading to an ill-defined classification problem.

This study offers a new direction for supervised learning that relaxes the fixed-model assumption defined by the existing data in order to have a self-adjusting model that can evolve by dynamically adding new classes to better accommodate prospective data in offline as well as online settings. Specifically, the aims of the project include (1) studying non-parametric prior models to dynamically model the number of classes, (2) developing new online and offline inference techniques in partially-observed settings, (3) modeling the rapidly accumulating nature of samples evident with emerging classes, (4) automatically associating a newly discovered class with higher-level groups of classes in an attempt to identify potentially interesting class formations, and (5) developing partially-observed tree models containing observed and unobserved nodes, where observed nodes represent existing classes and unobserved nodes are introduced online to fill the gaps in the existing data hierarchy that become evident only with the arrival of new data.
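To make aim (1) concrete, the sketch below uses the Chinese restaurant process, a standard non-parametric prior under which the number of classes can grow with the data. The abstract does not name a specific prior, so this choice, the concentration parameter `alpha`, and the function names are assumptions made for illustration.

```python
import random

def crp_assignment_probs(class_counts, alpha=1.0):
    """Prior probability of each existing class, followed by that of a new class."""
    total = sum(class_counts) + alpha
    return [count / total for count in class_counts] + [alpha / total]

def simulate_crp(n_samples, alpha=1.0, seed=0):
    """Draw class sizes for n_samples sequential arrivals under the CRP prior."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_samples):
        probs = crp_assignment_probs(counts, alpha)
        k = rng.choices(range(len(probs)), weights=probs)[0]
        if k == len(counts):
            counts.append(1)   # the sample opens a previously unseen class
        else:
            counts[k] += 1     # the sample joins an existing class
    return counts
```

Under this prior a new sample joins an existing class with probability proportional to the class size and starts a new class with probability proportional to `alpha`, so the model never treats the current list of classes as exhaustive.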

The broader consequences of this work will extend to the following areas: 1. Bio-security and bio-surveillance: The developed algorithms could become highly instrumental in implementing a real-time intelligent bio-warning platform for identifying national outbreaks as early as possible. 2. Information retrieval: This study introduces a whole new approach to indexing documents by an evolving vocabulary, which may lead to more efficient indexing with significantly improved relevancy, consistency, and timeliness. 3. Remote sensing: This new framework may become essential for fully exploiting the wealth of spectral information available in hyperspectral images, allowing for in-depth and high-level image analysis of scenes with dynamic and distinct characteristics. The project also opens exciting possibilities to enhance the research and education environment for K-12, undergraduate, and graduate students, giving them an unparalleled opportunity to work on stimulating cross-disciplinary applications that combine computational and life sciences. A workshop and a scientific competition will be organized to raise the awareness of scientific communities concerning the studied problems.

Additional information about the project can be accessed through the project website at https://cs.iupui.edu/~mdundar/career.html

Automated Spectral Data Transformations and Analysis Pipeline for High Throughput Flow Cytometry

Grant number: 5R21EB015707-02
Project period: 7/1/2012 - 6/30/2014
Investigators: Bartek Rajwa (PI), Murat Dundar (co-I), Alex Pothen (co-I)

Abstract:

High-throughput flow cytometry (HT-FC) is an emerging cell-analysis and screening technique employed in various fields of the life sciences, including drug discovery and clinical research. One of the major limitations of HT-FC is the lack of robust, rapid, and reproducible tools for data analysis and data mining. The current paradigm of FC analysis does not suit the HT format well. Traditionally, FC data are analyzed employing interactive exploratory visualization, which requires preparing a number of 2-D scatter plots that are used by an FC operator or researcher for visual evaluation of sample characteristics. Although the recent interest of computer science and bioinformatics communities in FC has spurred development of automated compensation and gating techniques, the proposed algorithms still follow the traditional analysis pathway (compensation plus gating), and typically attempt to mimic trained human operators in delineating various cell populations defined by the presence of fluorescent markers of varying intensities. Unfortunately, this model is not sustainable when hundreds or thousands of data sets must be processed in real time. This proposed research attempts to radically re-invent the FC data analysis pipeline for high-throughput FC by employing spectral classification approaches to FC data. In the proposed framework the FC data will be modeled as a mixture of signals that can be quantitatively recovered if certain physical and biological constraints describing the experimental system are rigorously followed. We propose a set of algorithms that will allow us first to define and encode the domain knowledge describing the analyzed specimens, subsequently to approximate the concentrations of labels, and from there recover information about the presence or absence of specific phenotypes of interest. The techniques employed will functionally replace two steps in FC data analysis that have traditionally been viewed as separate: compensation and gating. Instead, a new iterative spectral classification process will recover the quantitative characteristics of samples. This will allow for fast and automated extraction of sample features, as well as for mining the collected specimens for similar datasets. The proposed algorithm will be prototyped using the R language for statistical computing, and relevant procedures will be made available to other researchers in the field of FC via the Bioconductor project. Upon successful testing and validation using various datasets contributed by collaborators, the classification algorithms will be implemented in PlateAnalyzer, an HT-FC data analysis package developed at Purdue University.

PUBLIC HEALTH RELEVANCE: Flow cytometry (FC) is an important single-cell analysis tool employed in various clinical and research applications. The currently used FC data-analysis paradigm utilizes an exploratory, interactive model requiring operators to evaluate samples manually using expertise and experience. This project attempts to build an automated, robust, reproducible, and operator-independent data-analysis system that can be employed for FC data processing and data mining, limiting subjectivity and enhancing the value of FC techniques.
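The mixture view of FC data described in the abstract can be illustrated with a minimal unmixing sketch. It assumes a linear model in which the measured signal in each detector channel is a non-negative combination of known label signatures, and it recovers the label concentrations by projected gradient descent. The linear model, the non-negativity constraint as the only physical constraint, and the name `unmix_nonnegative` are assumptions for this example; the project's own algorithm is prototyped in R and is not reproduced here.

```python
def unmix_nonnegative(signal, signatures, n_iter=500):
    """Estimate non-negative label concentrations c such that signatures @ c ~ signal.

    signatures[i][j] is the response of detector channel i to a unit amount of label j.
    """
    n_channels = len(signatures)
    n_labels = len(signatures[0])
    # The sum of squared entries bounds the largest eigenvalue of S^T S,
    # so this step size keeps the gradient iteration stable.
    step = 1.0 / sum(v * v for row in signatures for v in row)
    conc = [0.0] * n_labels
    for _ in range(n_iter):
        residual = [
            sum(signatures[i][j] * conc[j] for j in range(n_labels)) - signal[i]
            for i in range(n_channels)
        ]
        grad = [
            sum(signatures[i][j] * residual[i] for i in range(n_channels))
            for j in range(n_labels)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        conc = [max(0.0, c - step * g) for c, g in zip(conc, grad)]
    return conc
```

Estimating concentrations directly in this way folds spillover correction into the model itself, and a simple threshold on each recovered concentration then indicates whether the corresponding label is present in the event.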

Machine-learning Approach to Label-free Detection of New Bacterial Pathogens

Grant number: 5R21AI085531-02
Project period: 5/1/2010 - 4/30/2012 (one year no-cost extension granted through 4/30/2013)
Investigators: Murat Dundar (PI) and Bartek Rajwa (PI)

Abstract:

Technologies for rapid detection and classification of bacterial pathogens are crucial for securing the food supply. A light-scattering sensor recently developed for real-time detection and identification of colonies of multiple pathogens has shown great promise for distinguishing bacterial cultures at the genus and species level for Listeria, Staphylococcus, Salmonella, Vibrio, and Escherichia coli. Unlike traditional testing methods, this new technology does not require a labeling reagent or biochemical processing. The classification approach currently used with this technology relies on supervised learning. For accurate detection and classification of bacterial pathogens, the training library used to train the classifier should consist of samples of all possible forms of the pathogens. Construction of such a training library is impractical if not impossible due to the high mutation rate that characterizes some of the infectious agents. In this project we propose to advance this sensor technology to allow for the detection of new classes or subclasses of bacteria that do not exist in the training library. Learning with a non-exhaustive training library is an ill-defined problem. We design a two-stage classification scheme to alleviate this problem. The first stage, detection, identifies whether the bacterial sample belongs to one of the subclasses in the training library or to a yet unseen and thus unrepresented subclass. If the former is true, the sample is fed to the second stage, classification, where it is assigned to one of the existing subclasses. If the latter is true, an alert is raised and the sample is saved for follow-up analysis.

Benefit for Public Health: Successful implementation of this project will allow for label-free detection and identification of food pathogens and their previously unseen mutated subclasses. This will reduce the number of food-related outbreaks and will help secure our food supply.
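The two-stage scheme described above can be sketched as follows. The example models each known subclass with a diagonal Gaussian and treats a sample as novel when its best log-likelihood falls below a fixed threshold. Both modeling choices, the threshold value, and all function names are assumptions for illustration and do not represent the project's actual detector.

```python
import math

def fit_gaussians(samples_by_class, min_var=1e-6):
    """Fit a diagonal Gaussian (means, variances) to each class in the training library."""
    models = {}
    for label, samples in samples_by_class.items():
        dims = list(zip(*samples))
        means = [sum(d) / len(d) for d in dims]
        variances = [
            max(sum((v - m) ** 2 for v in d) / len(d), min_var)
            for d, m in zip(dims, means)
        ]
        models[label] = (means, variances)
    return models

def log_likelihood(x, model):
    """Log-density of feature vector x under one class model."""
    means, variances = model
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        for v, m, var in zip(x, means, variances)
    )

def two_stage_classify(x, models, threshold):
    """Stage 1 (detection): return None if no known class explains x well enough.
    Stage 2 (classification): otherwise return the most likely known class."""
    scores = {label: log_likelihood(x, model) for label, model in models.items()}
    best = max(scores, key=scores.get)
    if scores[best] < threshold:
        return None  # unseen subclass: raise an alert and save the sample
    return best
```

A return value of `None` corresponds to the alert branch, where the sample is set aside for follow-up analysis instead of being forced into one of the existing subclasses.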

Publications:

Ferit Akova, Yuan Qi, Bartek Rajwa, Murat Dundar, “Self-adjusting Models for Semi-supervised Learning in Partially-observed Settings,” In Proceedings of the IEEE International Conference on Data Mining (ICDM’12), Brussels, Belgium, December 10-13, 2012. (To appear as a full paper, acceptance rate: 11%)

Murat Dundar, Ferit Akova, Yuan Qi, Bartek Rajwa, “Bayesian Nonexhaustive Learning for Online Discovery and Modeling of Emerging Classes,” In John Langford and Joelle Pineau (Eds.), Proceedings of the 29th International Conference on Machine Learning (ICML’12), Edinburgh, Scotland, June 26-July 1, 2012 (pp. 113-120). Omnipress, 2012.

Bartek Rajwa, Murat Dundar, Ferit Akova, Valery Patsekin, Euiwon Bae, Yanjie Tang, J. Eric Dietz, E. Daniel Hirleman, J. Paul Robinson, Arun K. Bhunia, “Digital microbiology: detection and classification of unknown bacterial pathogens using a label-free laser light scatter-sensing system,” Proceedings of SPIE, 8029, May 2011.

Bartek Rajwa, Murat Dundar, Ferit Akova, Amanda Betasso, Valery Patsekin, E. Dan Hirleman, Arun K. Bhunia, J. Paul Robinson, “Discovering unknown: detection of emerging pathogens using label-free light scattering system,” Cytometry Part A, 77A(12):1103–1112, 2010 (PMCID: PMC3224816).

Ferit Akova, Murat Dundar, V. Jo Davisson, E. Daniel Hirleman, Arun K. Bhunia, J. Paul Robinson, Bartek Rajwa, “A Machine-learning Approach for Label-free Detection of Unmatched Bacterial Serovars,” Statistical Analysis and Data Mining Journal, 3(5):289-301, 2010 (PMCID: PMC3230886).