Dr. Murat Dundar
Associate Professor
Computer & Information Science Dept.

 

CAREER: Self-adjusting Models as a New Direction in Machine Learning

Award number: 1252648
Project period: 3/1/2013 - 2/28/2018
 


Abstract:

Machine learning algorithms are now routinely used to build predictive models from data in a wide range of applications. However, current approaches to machine learning have an important limitation: they assume that the set of classes observed in a training data set is exhaustive and that new data samples originate from one of the existing classes represented in the training data set. This assumption is unrealistic in many real-world applications in which previously unobserved classes of interest emerge.

This study explores a new class of machine learning algorithms that produce self-adjusting models that can accommodate new classes observed in data in offline as well as online learning scenarios. The project aims to (i) use non-parametric models to dynamically incorporate the changing number of classes; (ii) develop new online and offline inference techniques to accommodate new classes as they emerge; (iii) automatically associate newly discovered classes with higher-level groups of classes in an attempt to identify potentially interesting class formations; and (iv) develop partially-observed tree models containing observed and unobserved nodes, where observed nodes represent existing classes and unobserved nodes are introduced online to fill the gaps in the existing data hierarchy that become evident only with the arrival of new data.

The broader impacts of this work could extend to several real-world applications, including bio-security and bio-surveillance, information retrieval, and remote sensing, in settings where not all of the classes are known a priori. The educational plan includes outreach to K-12 students and enhanced research opportunities for undergraduate and graduate students in computer science as well as at the intersection of computational and life sciences. All software, publications, and data sets resulting from the project will be freely disseminated to the larger research and educational community.

Publications:

  • Halid Z. Yerebakan and Murat Dundar, “Partially Collapsed Parallel Gibbs Sampler for Dirichlet Process Mixture Models,” Pattern Recognition Letters. To appear subject to minor revisions.

  • Bethany Ehlmann and Murat Dundar, "Acidic Conditions During Open System Weathering on Late Noachian/Early Hesperian Mars? Newly Identified Outcrops of Alunite and Jarosite from Orbital CRISM Data," AAS/Division for Planetary Sciences Meeting Abstracts, 2016.

  • Baichuan Zhang, Murat Dundar, Muhammed Hasan, "Bayesian Non-Exhaustive Classification A Case Study: Online Name Disambiguation using Temporal Record Streams," in Proceedings of ACM CIKM, Indianapolis, US, Oct 2016. PDF

  • Murat Dundar and Bethany Ehlmann, "Rare Jarosite Detection in CRISM Imagery by Non-Parametric Bayesian Clustering," in Proceedings of IEEE WHISPERS'16, Los Angeles, US, Aug 2016. PDF

  • Bartek Rajwa, Paul Wallace, Elizabeth Griffiths, Murat Dundar (2015). "Automated Assessment of Disease Progression in Acute Myeloid Leukemia by Probabilistic Analysis of Flow Cytometry Data," IEEE Transactions on Biomedical Engineering, 2016. Online

  • Murat Dundar, Qiang Kou, Baichuan Zhang, Yicheng He, and Bartek Rajwa, “Simplicity of Kmeans versus Deepness of Deep Learning: A Case of Unsupervised Feature Learning with Limited Data,” In Proceedings of IEEE International Conference on Machine Learning Applications, Miami, FL, USA, December 11-13, 2015. PDF

  • Bethany Ehlmann and Murat Dundar, "Are Noachian/Hesperian Acidic Waters Key to Generating Mars' Regional Scale Aluminum Phyllosilicates? The Importance of Jarosite Co-occurrences with Al-Phyllosilicate Units," 46th Lunar and Planetary Science Conference, The Woodlands, TX, March 16-20, 2015 (oral presentation). PDF

  • Halid Z. Yerebakan, Bartek Rajwa, Murat Dundar, "The Infinite Mixture of Infinite Gaussian Mixtures," Advances in Neural Information Processing Systems (NIPS'14), Montreal, Canada, December 8-13, 2014. (acceptance rate: 24.6%) PDF

  • Murat Dundar, Ferit Akova, Halid Z. Yerebakan, Bartek Rajwa, "A Non-parametric Bayesian Model for Joint Cell Clustering and Cluster Matching: Identification of Anomalous Sample Phenotypes with Random Effects," BMC Bioinformatics 15 (1), 314, 2014. Online

  • Murat Dundar, Halid Z. Yerebakan, Bartek Rajwa, "Batch Discovery of Recurring Rare Classes toward Identifying Anomalous Samples," In Proceedings of the 20th Annual SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD'14), New York, USA, Aug 24-27 2014. (acceptance rate: 15%) PDF Video Lecture

  • Murat Dundar, Bartek Rajwa, Lin Li, “Partially-observed Models for Classifying Minerals on Mars,” In Proceedings of WHISPERS'13, Gainesville, FL, June 25-28, 2013. PDF

Presentations:

  • Title: Bayesian Non-Exhaustive Classification A Case Study: Online Name Disambiguation using Temporal Record Streams
    Presenter: Baichuan Zhang
    Venue: ACM CIKM' 16, Indianapolis, IN
    Presentation Type: Oral

  • Title: Rare Jarosite Detection in CRISM Imagery by Non-Parametric Bayesian Clustering
    Presenter: Murat Dundar
    Venue: WHISPERS' 16, Los Angeles, CA
    Presentation Type: Oral

  • Title: Simplicity of Kmeans versus Deepness of Deep Learning: A Case of Unsupervised Feature Learning with Limited Data
    Presenter: Murat Dundar
    Venue: ICMLA' 15, Miami, FL
    Presentation Type: Oral

  • Title: The Infinite Mixture of Infinite Gaussian Mixtures
    Presenter: Halid Yerebakan
    Venue: NIPS' 14, Montreal, Canada
    Presentation Type: Poster

  • Title: A non-parametric Bayesian model for joint cell clustering and cluster matching under random effects
    Presenter: Murat Dundar
    Venue: GLIIFCA'14, Oconomowoc, WI
    Presentation Type: Invited

  • Title: Batch Discovery of Recurring Rare Classes toward Identifying Anomalous Samples
    Presenter: Murat Dundar
    Venue: KDD' 14, New York, NY
    Presentation Type: Oral (Video)

Other Products:

News Releases:

Software:

  • ASPIRE: A software package implemented in C++ for identifying, in a completely unsupervised way, recurring classes (both normal and rare) across a batch of samples that are significantly perturbed by random effects.

  • I2GMM: A software package implemented in C++ for clustering data sets with well-defined but skewed or multi-modal clusters. It uses a two-level non-parametric Bayesian hierarchy of Gaussian mixture models.

Invention Disclosures:

  • Methods for Discovering Rare Cell Populations and Anomalous Samples in Flow Cytometry.
    Inventors: Murat Dundar and Bartek Rajwa

In-class Kaggle Contests:

Graduate Students:

  • Halid Ziya Yerebakan, PhD student, 2011-2017
    Thesis Topic: Non-parametric Bayesian Inference using Partially-observed Hierarchical Data Sets

  • Yicheng Cheng, PhD student, 2015-

  • Sarkhan Badirli, PhD student, 2016-

  • Abdulmecit Gungor, MS student, 2016

  • Hossein Karimy, MS student, 2014-2015
    Thesis Topic: Automated Image Classification via Unsupervised Feature Learning by K-means

  • Nathan Hammes, MS Student, Summer 2014
    Special Study: Applied Machine Learning
    Nathan Hammes ranked 3rd in the DecMed2014: Decoding the Human Brain challenge among 267 teams. Congratulations Nathan!

Undergraduate Students:

Spring 2014:

  • Jordyn Kramer
    Preparing wrapper files for running ASPIRE software in R

  • Brandon Upp
    Implementing an XML parser for parsing clinical records collected as part of The Cancer Genome Atlas (TCGA) initiative

  • Nhan Do
    Modeling stripe noise in hyperspectral images acquired by CRISM (Compact Reconnaissance Imaging Spectrometer for Mars)

Fall 2014:

  • Nhan Do
    Preprocessing PubMed Abstracts for Automated MeSH Indexing

Spring 2015:

  • Kelly De Waal
    Literature Review on Large Scale Medical Informatics

  • Yicheng He
    Unsupervised Feature Learning using Optical Scatters of Bacterial Cultures

  • Nhan Do
    Automated MeSH Indexing for PubMed Abstracts
    The IUPUI Center for Research and Learning chose Nhan Do to receive a $1,500 RISE award grant to help him complete this research project with Dr. Dundar.

Fall 2015:

  • Yicheng He
    Testing Theano Deep Learning Algorithms on the Diabetic Retinopathy Data set

Fall 2016:

  • Andrew Swineheart
    Identifying Interest Groups using Movie Ratings

Spring 2017:

  • Uladzimir Kasacheuski

  • Venkata Arza

  • Blake Conrad