CAREER: Self-adjusting Models as a New Direction in Machine Learning
Award number: 1252648
Project period: 3/1/2013 - 2/28/2018
Abstract:
Machine learning algorithms are now
routinely used to build predictive models from data in
a wide range of applications. However, current approaches
to machine learning have an important limitation: They
assume that the set of classes observed in a training
data set is exhaustive and that new data samples
originate from one of the existing classes represented
in the training data set. This assumption is unrealistic
in many real-world applications in which previously
unobserved classes of interest emerge.
This study explores a new class of machine learning
algorithms that produce self-adjusting models that can
accommodate new classes observed in data in offline as
well as online learning scenarios. The project aims to (i)
use non-parametric models to dynamically incorporate the
changing number of classes; (ii) develop new online and
offline inference techniques to accommodate new classes
as they emerge; (iii) automatically associate newly
discovered classes with higher-level groups of classes
in an attempt to identify potentially interesting class
formations; and (iv) develop partially-observed tree
models containing observed and unobserved nodes, where
observed nodes represent existing classes and unobserved
nodes are introduced online to fill the gaps in the
existing data hierarchy that become evident only with
the arrival of new data.
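As a rough illustration of aim (i), the sketch below shows how a non-parametric prior lets the number of classes grow as data arrive. It is not the project's method: it assumes a Chinese restaurant process prior, one-dimensional Gaussian classes with known variance, and a greedy (most probable) assignment instead of full Bayesian inference. The class name `OnlineClassDiscovery` and all parameter names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


class OnlineClassDiscovery:
    """Greedy online assignment under a Chinese restaurant process prior.

    Assumptions (illustrative only): 1-D data, Gaussian classes with known
    variance `var`, and a normal prior N(prior_mean, prior_var) on class means.
    """

    def __init__(self, alpha=1.0, var=1.0, prior_mean=0.0, prior_var=100.0):
        self.alpha = alpha          # concentration: larger values favor new classes
        self.var = var
        self.prior_mean = prior_mean
        self.prior_var = prior_var
        self.counts = []            # number of samples assigned to each class
        self.sums = []              # sum of samples assigned to each class

    def _predictive(self, x, n, s):
        # Conjugate normal-normal update of the class mean from n samples with sum s,
        # followed by the posterior predictive density of x.
        post_var = 1.0 / (1.0 / self.prior_var + n / self.var)
        post_mean = post_var * (self.prior_mean / self.prior_var + s / self.var)
        return gaussian_pdf(x, post_mean, post_var + self.var)

    def observe(self, x):
        """Assign x to the most probable existing class, or open a new one."""
        scores = [n * self._predictive(x, n, s)
                  for n, s in zip(self.counts, self.sums)]
        # The last entry scores a previously unobserved class.
        scores.append(self.alpha * self._predictive(x, 0, 0.0))
        k = max(range(len(scores)), key=scores.__getitem__)
        if k == len(self.counts):
            self.counts.append(0)
            self.sums.append(0.0)
        self.counts[k] += 1
        self.sums[k] += x
        return k


if __name__ == "__main__":
    model = OnlineClassDiscovery()
    stream = [0.1, -0.2, 0.0, 10.2, 9.9, 0.3]
    print([model.observe(x) for x in stream])  # -> [0, 0, 0, 1, 1, 0]
```

In this toy stream, the fourth sample lies far from the first class, so the prior mass reserved for an unseen class outweighs the existing class and a second class is created without retraining.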
The broader impacts of this work could extend to several
real-world applications, including bio-security and
bio-surveillance, information retrieval, and remote
sensing, in settings where not all of the
classes are known a priori. The educational plan
includes outreach to K-12 students and enhanced research
opportunities for undergraduate and graduate students in
computer science as well as at the intersection of
computational and life sciences. All the software,
publications, and data sets resulting from the project
will be freely disseminated to the larger research and
educational community.
Publications:
-
Sarkhan Badirli, Zeynep Akata, Murat Dundar, “Bayesian Zero-shot Learning” under review.
-
Baichuan Zhang, Murat Dundar, Vachik Dave, Muhammad Al Hasan, “Dirichlet Process Gaussian Mixture for Active Online Name Disambiguation by Particle Filter,” in Proceedings of Joint Conference on Digital Library, 2019.
PDF
-
Ellen Leask, Bethany Ehlmann, Murat Dundar, “Investigating Hydrated Mineral Deposits in Terra Sirenum Mars,” Lunar and Planetary Science Conference 50, 2019.
-
Murat Dundar, Bethany Ehlmann, Ellen Leask, “Rare Phase Detections in CRISM Data at Pixel-scale by Machine Learning Generate New Discoveries about Geology at Mars Rover Landing Areas: Jezero and NE Syrtis,” Lunar and Planetary Science Conference 50, 2019.
-
Yicheng Cheng, Bartek Rajwa, Murat Dundar, "Bayesian Nonparametrics for Non-exhaustive Learning," Advances in Neural Information Processing Systems (NIPS), Bayesian Nonparametrics Workshop, 2018.
PDF
-
Ellen Leask, Bethany Ehlmann, Murat Dundar, Scott Murchie, Frank Seelos, “Challenges in the Search for Perchlorate
and Other Hydrated Minerals with 2.1μm Absorptions on Mars,” Geophysical Research Letters, 45(22), 2018.
PDF
-
Ellen Leask, Bethany Ehlmann, Murat Dundar, Scott Murchie, Frank Seelos, “New Possible CRISM Artifact at 2.1 Micrometers and Implications for Orbital Mineral Detections,” Lunar and Planetary Science Conference 49, 2018.
-
Yicheng Cheng*, Murat Dundar, George Mohler, “A Coupled ETAS-I2GMM Point Process with Applications to Seismic Fault Detection,” Annals of Applied Statistics, 12(3), pp. 1853-1870, 2018.
-
Halid Z. Yerebakan* and Murat Dundar, “Partially Collapsed Parallel Gibbs Sampler for Dirichlet Process Mixture Models,” Pattern Recognition Letters, 90, pp. 22-27, 2017.
-
Bethany Ehlmann and Murat Dundar, "Acidic Conditions During Open System Weathering on Late Noachian/Early Hesperian Mars? Newly Identified Outcrops of Alunite and Jarosite from Orbital CRISM Data," AAS/Division for Planetary Sciences Meeting Abstracts, 2016.
-
Baichuan Zhang, Murat Dundar, Muhammed Hasan, "Bayesian Non-Exhaustive Classification A Case Study: Online Name Disambiguation using Temporal Record Streams," in Proceedings of ACM CIKM, Indianapolis, US, Oct 2016.
PDF
-
Murat Dundar and Bethany Ehlmann, "Rare Jarosite Detection in CRISM Imagery by Non-Parametric Bayesian Clustering," in Proceedings of IEEE WHISPERS'16, Los Angeles, US, Aug 2016.
PDF
-
Bartek Rajwa, Paul Wallace, Elizabeth Griffiths, Murat Dundar,
"Automated Assessment of Disease Progression in Acute Myeloid Leukemia by Probabilistic Analysis of Flow Cytometry
Data," IEEE Transactions on Biomedical Engineering, 64(5), 2017.
Online
-
Murat Dundar, Qiang Kou, Baichuan Zhang, Yicheng He,
and Bartek Rajwa, “Simplicity of Kmeans versus Deepness of Deep Learning: A Case of Unsupervised Feature Learning with Limited Data,”
In Proceedings of IEEE International Conference on Machine Learning Applications,
Miami, FL, USA, December 11-13, 2015.
PDF
-
Bethany Ehlmann and Murat Dundar,
"Are Noachian/Hesperian Acidic Waters Key to Generating Mars' Regional Scale Aluminum Phyllosilicates? The Importance of Jarosite Co-occurrences with Al-Phyllosilicate Units,"
46th Lunar and Planetary Science Conference, The
Woodlands, TX, March 16-20, 2015 (oral presentation).
PDF
-
Halid Z. Yerebakan, Bartek Rajwa, Murat Dundar, "The
Infinite Mixture of Infinite Gaussian Mixtures,"
Advances in Neural Information Processing Systems
(NIPS'14), Montreal, Canada, December 8-13, 2014. (acceptance rate: 24.6%)
PDF
-
Murat Dundar, Ferit Akova, Halid Z. Yerebakan, Bartek
Rajwa, "A Non-parametric Bayesian Model for Joint Cell
Clustering and Cluster Matching: Identification of
Anomalous Sample Phenotypes with Random Effects," BMC
Bioinformatics 15 (1), 314, 2014.
Online
-
Murat Dundar, Halid Z. Yerebakan, Bartek Rajwa,
"Batch Discovery of Recurring Rare Classes toward
Identifying Anomalous Samples," In Proceedings of the 20th Annual SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD'14), New York, USA, Aug 24-27 2014. (acceptance rate: 15%)
PDF
Video Lecture
-
Murat Dundar, Bartek Rajwa, Lin Li, “Partially-observed Models for Classifying Minerals on Mars,”
In Proceedings of WHISPERS'13, Gainesville, FL, June
25-28, 2013. PDF
Presentations:
-
Title: Rare Phase Detections in CRISM Data at Pixel-scale by Machine Learning Generate New Discoveries about Geology at Mars Rover Landing Areas: Jezero and NE Syrtis
Presenter: Murat Dundar
Venue: Lunar and Planetary Science Conference 50,
The Woodlands, TX
Presentation Type: Oral
-
Title: Bayesian Nonparametrics
for Non-exhaustive Learning
Presenter: Murat Dundar
Venue: NIPS'18 Bayesian Nonparametrics Workshop, Montreal, Canada
Presentation Type: Poster
-
Title: Bayesian Non-Exhaustive
Classification A Case Study: Online Name
Disambiguation using Temporal Record Streams
Presenter: Baichuan Zhang
Venue: ACM CIKM' 16, Indianapolis, IN
Presentation Type: Oral
-
Title: Rare Jarosite Detection in CRISM
Imagery by Non-Parametric Bayesian Clustering
Presenter: Murat Dundar
Venue: WHISPERS' 16, Los Angeles, CA
Presentation Type: Oral
-
Title: Simplicity of Kmeans
versus Deepness of Deep Learning: A Case of
Unsupervised Feature Learning with Limited Data
Presenter: Murat Dundar
Venue: ICMLA' 15, Miami, FL
Presentation Type: Oral
-
Title: The Infinite Mixture of
Infinite Gaussian Mixtures
Presenter: Halid Yerebakan
Venue: NIPS'14, Montreal, Canada
Presentation Type: Poster
-
Title: A non-parametric Bayesian model for joint cell clustering and cluster matching under random effects
Presenter: Murat Dundar
Venue: GLIIFCA'14, Oconomowoc, WI
Presentation Type: Invited
-
Title: Batch Discovery of
Recurring Rare Classes toward Identifying Anomalous
Samples
Presenter: Murat Dundar
Venue: KDD' 14, New York, NY
Presentation Type: Oral (Video)
Other Products:
News Releases:
Software:
-
ASPIRE:
This is a software package implemented in C++ for identifying
recurring classes (both normal and rare) across a
batch of samples that are significantly perturbed by
random effects in a completely unsupervised way.
-
I2GMM:
This is a software package implemented in C++ for clustering
data sets with well-defined albeit skewed/multi-mode
clusters. It uses a two-level non-parametric
Bayesian hierarchy of Gaussian mixture models.
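To make the idea of a two-level hierarchy concrete, the following generative sketch draws data in which an outer Chinese restaurant process picks a cluster and an inner one picks a Gaussian component within that cluster, so a single cluster can be skewed or multi-mode. This is an illustration under stated assumptions, not the I2GMM implementation: the data are one-dimensional, and the function names (`crp_draw`, `sample_two_level_mixture`) and all parameters are hypothetical.

```python
import random


def crp_draw(counts, concentration, rng):
    """Pick index k with probability proportional to counts[k];
    return len(counts) (meaning 'new') with probability proportional to concentration."""
    r = rng.random() * (sum(counts) + concentration)
    acc = 0.0
    for k, n in enumerate(counts):
        acc += n
        if r < acc:
            return k
    return len(counts)


def sample_two_level_mixture(n, alpha=1.0, gamma=1.0, cluster_sd=10.0,
                             component_sd=1.5, noise_sd=0.3, seed=0):
    """Draw n points from a two-level mixture (illustrative assumptions only).

    alpha controls how readily new clusters appear (outer level);
    gamma controls how readily new components appear inside a cluster (inner level).
    Component means are scattered around their cluster's center.
    """
    rng = random.Random(seed)
    clusters = []   # each: {"center": float, "count": int, "components": [...]}
    data = []       # each: (value, cluster index, component index)
    for _ in range(n):
        c = crp_draw([cl["count"] for cl in clusters], alpha, rng)
        if c == len(clusters):
            clusters.append({"center": rng.gauss(0.0, cluster_sd),
                             "count": 0, "components": []})
        cluster = clusters[c]
        comps = cluster["components"]
        j = crp_draw([cm["count"] for cm in comps], gamma, rng)
        if j == len(comps):
            comps.append({"mean": rng.gauss(cluster["center"], component_sd),
                          "count": 0})
        cluster["count"] += 1
        comps[j]["count"] += 1
        data.append((rng.gauss(comps[j]["mean"], noise_sd), c, j))
    return data, clusters


if __name__ == "__main__":
    data, clusters = sample_two_level_mixture(200, seed=1)
    for idx, cl in enumerate(clusters):
        print(idx, cl["count"], len(cl["components"]))
```

Clustering with such a model would invert this process, inferring the cluster and component labels from the values alone; that inference step is not shown here.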
Invention Disclosures:
In-class Kaggle Contests:
-
CSCI 590 Machine Learning 2014.
Automated Indexing of PubMed Abstracts using MeSH
Terms
-
CSCI 590 Machine Learning 2015.
Bacteria Classification at the Genus Level
-
CSCI 590 Machine Learning 2016.
MARS Mineral Discovery Challenge
-
CSCI 590 Machine Learning 2017.
Authorship Attribution Challenge
-
CSCI 590 Machine Learning 2018.
MARS Mineral Discovery Challenge 2
-
CSCI 573 Statistical Machine Learning 2019.
Automated Indexing of PubMed Abstracts using MeSH
Terms 2
Graduate Students:
-
Halid Ziya Yerebakan, PhD student, 2011-2017
Thesis Topic: Non-parametric Bayesian Inference using Partially-observed Hierarchical Data Sets
-
Yicheng Cheng, PhD student, 2015-
-
Sarkhan Badirli, PhD student, 2016-
-
Abdulmecit Gungor, MS student, 2016-2018
Abdulmecit and Sarkhan were part of the team that took 2nd place in the Roche Global Code4Life University Challenge.
-
Hossein Karimy, MS student, 2014-2015
Thesis Topic: Automated Image Classification via Unsupervised Feature Learning by K-means
-
Nathan Hammes, MS student, Summer 2014
Special Study: Applied Machine Learning
Nathan Hammes ranked 3rd in the DecMed2014: Decoding the Human Brain challenge among 267 teams. Congratulations Nathan!
Undergraduate Students:
Spring 2014:
-
Jordyn Kramer: Preparing wrapper files for running ASPIRE software in R
-
Brandon Upp: Implementing an XML parser for parsing clinical records collected as part of The Cancer Genome Atlas (TCGA) initiative
-
Nhan Do: Modeling stripe noise in hyperspectral images acquired by CRISM (Compact Reconnaissance Imaging Spectrometer for Mars)
Fall 2014:
Spring 2015:
-
Kelly De Waal: Literature Review on Large Scale Medical Informatics
-
Yicheng He: Unsupervised Feature Learning using Optical Scatters of Bacterial Cultures
-
Nhan Do: Automated MeSH Indexing for PubMed Abstracts
The IUPUI Center for Research and Learning selected Nhan Do to receive a $1,500 RISE award grant to help him complete this research project with Dr. Dundar.
Fall 2015:
Fall 2016:
Spring 2017:
Fall 2017:
Spring 2018:
Spring 2019: