Yuni Xia is an Associate Professor in the Computer and Information Science Department at Indiana University - Purdue University Indianapolis (IUPUI). She received her PhD and MS in Computer Science from Purdue University, and her B.S. in Computer Science from Central China (HuaZhong) University of Science and Technology. She worked as an intern at the IBM T.J. Watson Research Center before joining IUPUI. Xia's research is on data mining and databases, focusing on the mining and management of big data and data streams, including biomedical data, sensor data and moving object data. She also works on managing uncertainty in the decision support process.

Office Hours, Spring 2015: Monday and Wednesday, 1-3pm


Contact Info
Address:  723 W. Michigan St, SL280E, Indianapolis, IN 46202
Phone:     (317) 274-9738
Fax:         (317) 274-9742


Research

        Data Mining: Big Data, Uncertain Data Mining, Data Stream Mining, Biomedical Informatics
        Databases: Data Streams, Data Uncertainty Management

    Research Projects (we gratefully acknowledge the support of funding agencies):
Health-Terrain: Visualizing Large Scale Health Data, Supported by US Department of the Army, Co-PI (PI: Fang): This project aims to design a framework and new techniques for mining and visualizing large scale health data. The Notifiable Condition Detector (NCD) system is an automated electronic lab reporting (ELR) and case-notification system which has been used in Indiana for over ten years to report laboratory results for the detection of notifiable conditions such as novel H1N1 influenza, sexually transmitted diseases, lead poisoning, and salmonella [1]. In this project, we analyze and visualize dimensions of the NCD data. We identify the most common conditions, find the distribution of the diseases across gender and race, study the co-occurrence of diseases and investigate the associations among different diseases.

Development of Key Technologies for Big Data Analysis and Management Software Based on Next Generation Memory, Supported by ETRI, Co-PI (Institute PI: Lee): This collaborative project seeks to develop a big-data main-memory database management system and a distributed stream processing system using hardware acceleration techniques including FPGA (programmable/reconfigurable chips) and GPGPU (general purpose graphics processing unit).

Large Scale Sensor Stream Analysis and Mining for Geriatric Care, Supported by IBM
  This project aims to design and develop a real-time distributed sensor stream monitoring and analysis system for geriatric care. This enables effective home-based continuous geriatric care, which not only saves costs but also improves the quality of life of the elderly and their families.


DisProt Database: A Central Repository of Information on Intrinsically Disordered Proteins, Supported by NSF,  Co-PI (PI: Dunker):
The goal of this project is to fully develop DisProt, a database that provides an essential repository of information about intrinsically disordered proteins (IDPs). DisProt will be not only a collection of data on intrinsically disordered proteins and their functions, but also a unique research tool for conducting computational studies on these proteins and for helping design better research strategies for studying individual IDPs in the laboratory. DisProt is expected to see widespread use, both for carrying out bioinformatics experiments and across the wider community working to understand cell and molecular biology.

TrafficAnalyzer: A Real-time Traffic Stream Processing and Analyzing System, Supported by IBM:
   Modern traffic monitoring systems are required to perform real-time processing and analysis of petabit-scale continuous data streams. In this project, we propose to design and develop a real-time traffic stream processing and analyzing system. The most important feature of TrafficAnalyzer is its real-time performance. Processing results need to be produced with virtually zero latency, because in a traffic monitoring system real-time response is crucial for reducing the accident rate and smoothing traffic flow. TrafficAnalyzer must support sophisticated time-windowed processing operations, since streaming data changes continually, often at high rates. These operations should be executed in a way that produces results incrementally as new data arrives, since the complete data set is never available. TrafficAnalyzer also provides careful management of historical data, as it needs to compare and combine present data with the past to study how traffic flow changes over time. TrafficAnalyzer is also resilient to inaccuracy and uncertainty in the data streams, because inherent variations, losses, or reordering can cause data to arrive in the wrong order or with variable delays.
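   As a rough illustration of the time-windowed, incremental processing described above, the following Python sketch maintains a running mean over a sliding time window and updates it one reading at a time. It is not taken from TrafficAnalyzer: the class name SlidingWindowMean, its parameters, and the use of vehicle speeds as the readings are all hypothetical, and the sketch assumes readings arrive in timestamp order, so it does not address the out-of-order or delayed data mentioned above.

```python
from collections import deque


class SlidingWindowMean:
    """Hypothetical example: mean of the readings seen in the last
    `window_seconds`, updated incrementally as each reading arrives."""

    def __init__(self, window_seconds: float):
        if window_seconds <= 0:
            raise ValueError("window_seconds must be positive")
        self.window_seconds = window_seconds
        self._readings = deque()  # (timestamp, value) pairs, oldest first
        self._total = 0.0         # running sum of the values in the window

    def add(self, timestamp: float, value: float) -> float:
        """Add one reading (assumed to be in timestamp order) and
        return the mean of the readings currently in the window."""
        self._readings.append((timestamp, value))
        self._total += value

        # Evict readings that are now older than the window.
        cutoff = timestamp - self.window_seconds
        while self._readings[0][0] <= cutoff:
            _, old_value = self._readings.popleft()
            self._total -= old_value

        # The newest reading is always inside the window, so the
        # deque is never empty at this point.
        return self._total / len(self._readings)


if __name__ == "__main__":
    window = SlidingWindowMean(window_seconds=10.0)
    # Speeds (e.g. in mph) reported at t = 0 s, 5 s and 12 s.
    for t, speed in [(0.0, 60.0), (5.0, 40.0), (12.0, 20.0)]:
        print(t, window.add(t, speed))
```

   Each update costs amortized constant time because every reading is appended once and evicted at most once; the full stream is never stored or rescanned.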


Development of SYMBIOTE: A Reconfigurable Logic Assisted Data Stream Management System for Multimedia Sensor Networks, Supported by NSF, Co-PI (PI: Lee):
Numerous emerging applications require real-time processing of high bandwidth multimedia data streams. In this project, we propose a novel class of data stream management systems called Reconfigurable Logic Assisted DSMS (RLADSMS) that will provide one of the first comprehensive and demonstrative approaches to using Reconfigurable Logic coprocessors as data stream accelerators in the prototype RLADSMS called SYMBIOTE. This project will investigate key issues such as data models, query languages, hardware DSMS operators, corresponding cost models of query execution, considering hardware complexity of database operators, run-time complexity of hardware and software operators, interconnect latencies, bandwidth, resource allocation as well as optimization techniques for this new class of data stream management system.

Invention of a Consumer-Side Geriatric Health Care Knowledge Management and Decision Support System, Supported by 21st Century Research and Development Fund, State of Indiana, Co-PI (Institute PI: Palakal): This project proposes to build an innovative Knowledge Management system unique in the Geriatric Care Management Industry. This system will accelerate the adoption of standards of care and provide the accumulation of knowledge from current Social Science, Psychology, and Health disciplines. It will also build a basis, comparable to the Health Care Industry model, for evidence-based outcomes validation.



Publications
  • Mathew Palakal, Shiaofen Fang, Yuni Xia, Anand Krishnan, Sam Bloomquist, Thanh Nguyen, Roland Gamache and Shaun Grannis, Detecting Comorbidity of Chlamydia from Clinical Reports for Health Terrain Visualization, 2013 Workshop on Visual Analytics in Healthcare, In conjunction with AMIA 2013.
  • Jeremy Keiper, Yuni Xia, Shiaofen Fang, Mathew Palakal, Shaun Grannis, Roland Gamache, Thanh Minh Nguyen, Sam Bloomquist and Anand Krishnan, Use Cases for Public Health Data Visualization, 2013 Workshop on Visual Analytics in Healthcare, In conjunction with AMIA 2013.
  • Yuni Xia, Shiaofen Fang, Mathew Palakal, Roland Gamache Jr, Thanh Minh Nguyen, Sam Bloomquist, Anand Krishnan, Jeremy Keiper, Shaun Grannis, Data Exploration of a Notifiable Condition Detector System, 2013 Workshop on Visual Analytics in Healthcare, In conjunction with AMIA 2013.
  • Biao Qin, Yuni Xia, Fang Li and Jiaqi Ge, EMU: An expectation maximization based approach for clustering uncertain data, Journal of Intelligent & Fuzzy Systems, 1067-1083, 2013.
  • Chandima Hewa Nadungodage, Yuni Xia, John Lee, Myungcheol Lee, Choon Seo Park, GPU Accelerated Item-Based Collaborative Filtering for Big-Data Applications, Proceedings of the IEEE International Conference on Big Data (IEEE BigData) 2013.
  • Brian E. Dixon, Marc B. Rosenman, Yuni Xia, Shaun J. Grannis,  A Vision for the Systematic Monitoring and Improvement of the Quality of Electronic Health Data. MedInfo 2013: 884-888.
  • Chandima Hewa Nadungodage, Jaehwan John Lee, Yuni Xia, Miyoung Lee, Myungcheol Lee, GPU-based Memory Efficient Recommendation System for Big Data Applications, Poster, the International Conference on GPU technology, 2013.
  • Chandima Hewa Nadungodage, Yuni Xia, Jaehwan John Lee, Yi-cheng Tu, Hyper-Structure Mining of Frequent Patterns in Uncertain Data Streams, Journal of Knowledge and Information Systems (KAIS), 2013.
  • Shaun Grannis, Brian Dixon, Yuni Xia, Jianmin Wu, Using Information Entropy to Monitor Chief Complaint Characteristics and Quality,  International Society for Disease Surveillance Conference, 2012.
  • Chandima H. Nadungodage, Yuni Xia, Pranav S. Vaidya, Yu Chen, Jaehwan Lee, Online Multidimensional Regression Analysis on Concept-drifting Data Streams, International Journal of Data Mining, Modeling and Management (IJDMMM),  Accepted.
  • Biao Qin, Yuni Xia, Shan Wang, Xiaoyong Du, A Novel Bayesian Classification Method for Uncertain Data, Journal of Knowledge-Based Systems, Volume 24, Issue 8, 1151-1158, 2011.
  • Omkar Tilak, Andrew Hoblitzell, Snehasis Mukhopadhyay, Qian You, Shiaofen Fang, Yuni Xia, Joseph Bidwell, Multi-Level Text Mining for Bone Biology, Concurrency and Computation: Practice and Experience, 23(17): 2355-2364, 2011
  • Yu Chen, Pranav Vaidya, Jaehwan John Lee, Chandima Hewa Nadungodage, Yuni Xia, Renfa Li, Qiang Wu, A New Hardware/Software Partitioning Methodology Combining Search Space Smoothing and Discrete Particle Swarm Optimization, International Conference on Engineering of Reconfigurable Systems and Algorithms (ERSA), 2011.
  • Chandima Hewa Nadungodage, Yuni Xia, Fang Li, Jaehwan John Lee, Jiaqi Ge, StreamFitter: A Real Time Linear Regression Analysis System for Continuous Data Streams, Demo, International Conference on Database Systems for Advanced Applications (DASFAA) 2011.
  • Biao Qin, Yuni Xia, Rakesh Sathyesh, Jiaqi Ge, Sunil Prabhakar, Classify Uncertain Data with Decision Tree, Demo, International Conference on Database Systems for Advanced Applications (DASFAA) 2011.
  • Sandeep Raghuram, Yuni Xia, Jiaqi Ge, Mathew Palakal, Josette Jones, Dave Pecenka, Eric Tinsley, Jean Bandos, and Jerry Geesaman. AutoBayesian: Developing Bayesian Networks Based on Text Mining, Demo, International Conference on Database Systems for Advanced Applications (DASFAA) 2011.  (Best Demo Award)
  • Biao Qin, Yuni Xia, Sunil Prabhakar, Rule Induction for Uncertain Data, Knowledge and Information Systems (KAIS), 23(17): 2355-2364, 2011
  • Shaoping Chen , Yi-Cheng Tu , Yuni Xia,  Performance Analysis of a Dual-tree Algorithm for Computing Spatial Distance Histograms, VLDB Journal, 20(4): 471-494, 2011
  • Pranav Vaidya, Y. Chen, Jaehwan John Lee, Chandima Hewa Nadungodage, and Yuni Xia, A General Purpose FPGA Data Filter For Data Stream Processing, International Conference on Engineering of Reconfigurable Systems and Algorithms (ERSA), pp. 247-250, 2010.
  • Jiaqi Ge, Yuni Xia, Yicheng Tu, A Discretization Algorithm for Uncertain Data, the 21st International Conference on Database and Expert Systems Applications (DEXA), 2010. (Acceptance Rate: 22.7%)
  • Andrew Hoblitzell, Snehasis Mukhopadhyay, Qian You, Shiaofen Fang, Yuni Xia, Joseph Bidwell, Text Mining for Bone Biology, Proceeding of the Workshop on Emerging Computational Methods for the Life Sciences, 2010.
  • Pranav S. Vaidya, Jaehwan John Lee, Francis Bowen, Yingzi Du, Chandima H. Nadungodage, Yuni Xia, Symbiote - A Reconfigurable Logic Assisted Data Stream Management System (RLADSMS), Demo, the ACM Conference on Management of Data (SIGMOD), 2010.
  • Biao Qin, Yuni Xia, Fang Li. A Bayesian Classifier for Uncertain Data,  the 25th ACM Symposium on Applied Computing (SAC), 2010.  (Acceptance Rate: 25%)
  • Jiaqi Ge, Yuni Xia, Chandima Nadungodage, Classify Uncertain Data with Neural Network,  the 14th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2010. (Acceptance Rate: 10.2%)
  • Biao Qin, Yuni Xia, Rakesh Sathyesh, Sunil Prabhakar, Yicheng Tu, uRule: A Rule Based Classifier for Data with Uncertainty, Demo, the IEEE International Conference on Data Mining (ICDM), 2009.
  • Sandeep Raghuram, Yuni Xia, Mathew Palakal, Josette Jones, Dave Pecenka, Eric Tinsley, Jean Bandos, and Jerry Geesaman. Bridging Text Mining and Bayesian Networks, Proc. of the Workshop on Intelligent Biomedical Information Systems (IBIS), 2009.
  • Biao Qin, Yuni Xia, Fang Li, DTU: A Decision Tree for Classifying Uncertain Data, the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD),  2009 (Acceptance Rate: 11.5%).
  • Biao Qin, Yuni Xia, Sunil Prabhakar, Yicheng Tu, A Rule-Based Classification Algorithm for Uncertain Data, the IEEE Workshop on Management and Mining of Uncertain Data (MOUND), in conjunction with the International Conference on Data Engineering, 2009.
  • Jiangang Liu, Andrew Campen, Shuguang Huang, Sheng-Bin Peng, Xiang Ye, Mathew Palakal, A. Keith Dunker, Yuni Xia and Shuyu Li, Identification of a gene signature in cell cycle pathway for breast cancer prognosis using gene expression profiling data, BMC Medical Genomics, 2008, 1:39.
  • Yuni Xia, Sunil Prabhakar, Shan Lei, Reynold Cheng and Rahul Shah, Indexing Continuously Changing Data with Mean Variance Tree, International Journal of High Performance Computing and Networking, Vol. 5, No. 4, pages 263-272, 2008.
  • Biao Qin, Yuni Xia, Generating Efficient Safe Query Plans for Probabilistic Databases, Journal of Data and Knowledge Engineering (DKE), Volume 67, Issue 3, Pages 485-503, 2008.
  • Andrew Campen, Yuni Xia, Dan Rigsby, Ying Guo, Xingdong Feng, Eric Su, Mathew Palakal and Shuyu Li, Mining Gene Expression Database for Primary Human Disease Tissues, Demo, the IEEE 24th International Conference on Data Engineering (ICDE), 2008.
  • Yuni Xia, Bowei Xi, Conceptual Clustering Categorical Data with Uncertainty, the IEEE 19th International Conference on Tools with Artificial Intelligence (ICTAI), Patras, Greece, 2007. (Acceptance Rate: 28%)
  • Yuni Xia, Andrew Campen, Dan Rigsby, Ying Guo, Xingdong Feng, Eric Su, Mathew Palakal, Shuyu Li, DGEM - a Microarray Gene Expression Database for Primary Human Disease Tissues, Molecular Diagnosis and Therapy, Issue 3, 2007.
  • Yuni Xia, Yicheng Tu, Mikhail Atallah, Sunil Prabhakar, Reducing Data Redundancy in Location-based Services, the International Conference on Geosensor Networks (GeoSensor), pp. 30-35, Boston, USA, 2006.
  • Reynold Cheng, Sarvjeet Singh, Sunil Prabhakar, Rahul Shah, Jeffrey Scott Vitter, Yuni Xia, Efficient Join Processing over Uncertain Data, the ACM 15th Conference on Information and Knowledge Management (CIKM), pp. 738-747, Arlington, USA, 2006. (Acceptance Rate: 15%)
  • Yicheng Tu, Mohamed Hefeeda, Yuni Xia, Sunil Prabhakar, Song Liu, Control-Based Quality Adaptation in Data Stream Management Systems, the International Conference of Database and Expert Systems Applications (DEXA), pp.746 - 755, Copenhagen, Denmark, 2005. (Acceptance Rate: 23%)
  • Yuni Xia, Sunil Prabhakar, Shan Lei, Reynold Cheng, Rahul Shah, Indexing Continuously Changing Data with Mean Variance Tree, the 20th ACM Symposium on Applied Computing (SAC), pp. 1125 - 1132, Santa Fe, New Mexico, USA, 2005. (Acceptance Rate: 30%)
  • Reynold Cheng, Yuni Xia, Sunil Prabhakar, Rahul Shah, Change Tolerant Indexing for Constantly Evolving Data, the International Conference on Data Engineering (ICDE), pp. 391-402, Tokyo, Japan, 2005. (Acceptance Rate: 13%)
  • Yuni Xia, Sunil Prabhakar, Jiangzhong Sun, Shan Lei, Indexing and Query Constantly Evolving Data Using Time Series Analysis, the 10th International Conference on Database Systems for Advanced Applications (DASFAA), pp.637-648, Beijing, China 2005. (Acceptance Rate: 22%)
  • Reynold Cheng, Yuni Xia, Sunil Prabhakar, Rahul Shah, Jeffrey Scott Vitter, Efficient Indexing Methods for Probabilistic Threshold Queries over Uncertain Data, the 30th International Conference on Very Large Data Bases (VLDB), pp.876 - 887, Toronto, Canada, 2004. (Acceptance Rate: 16%)
  • Yuni Xia, Sunil Prabhakar, Efficient VNG Indexing in Location-aware Services, the International Workshop on Mobile and Distributed Computing (MDC), pp.414 - 419, Providence, Rhode Island, USA, 2003.
  • Yuni Xia, Sunil Prabhakar, Q+Rtree: Efficient Indexing for Moving Object Databases, the 8th International Conference on Database Systems for Advanced Applications (DASFAA), pp.175 - 182, Kyoto, Japan, 2003. (Acceptance Rate: 25%)
  • Sunil Prabhakar, Yuni Xia, Dmitri Kalashnikov, Walid Aref, Susanne Hambrusch, Query Indexing and Velocity Constrained Indexing: Scalable Techniques for Continuous Queries on Moving Objects, IEEE Transactions on Computers, Vol.51, No.10, pp.1124 - 1140, 2002.

Book Chapters

  • Yuni Xia, Jonathon Munson, David Wood, Alan Cole, Location-based Service System (LBS) Analysis and Design,  Handbook of Research on Modern Systems Analysis and Design Technologies and Applications, ISBN: 978-1-59904-887-1; 698 pp, 2008.
  • Meeta Pradhan and Yuni Xia, Bioterrorism and Biosecurity, Handbook of Research on Information Security and Assurance, ISBN: 978-1-59904-855-0, 586 pp, 2008.
  • Sunil Prabhakar, Dmitri V. Kalashnikov, and Yuni Xia, Query Indexing and Velocity Constrained Indexing, Encyclopedia of GIS, Springer Science, 2008.


Teaching

Please log into Oncourse for lecture notes, readings, assignments, projects, etc.
CSCI340: Discrete Computational Structures
CSCI441: Client Server Databases
CSCI443: Database Systems
CSCI481: Data Mining
CSCI541: Database Management Systems
CSCI573: Data Mining
CSCI590: Advanced Database Systems


Awards

         Best Demo Award, International Conference on Database Systems for Advanced Applications (DASFAA), 2011
         Scalable Data Analytics Innovation Award, IBM Research, 2010
         TechPoint Mira Award, with Senior Care Navigation System development team at My Health Care Manager LLC, 2010
         Trustees Teaching Award, IUPUI, 2009
         Research Venture Award, IUPUI, 2009
         Real Time Innovation Award, IBM Research, 2008
         TechPoint MIRA Award, with Purdue University Knowledge Projection Team, 2005
         Leading Light Award, Indiana TechPoint Organization, 2004
         IBM Grace Hopper /Anita Borg Scholarship, 2004


Professional Services
Program Committee:
        The International Conference on Collaborative Computing (CollaborateCom), 2010 to present
        The IEEE International Conference on Computer and Information Technology (CIT)
        The International Conference on Frontier Computing (FC), 2010
        The International Workshop on Smart Homes for Tele-Health (SmarTel), 2010
        The IEEE 12th International Conference on Computational Science and Engineering (CSE), 2009
        The International Workshop on Smart Homes for Tele-Health (SmarTel), 2009
        The International Workshop on Information Fusion and Dissemination in Wireless Sensor Networks (SensorFusion), 2009
        The International Conference on Intelligent Pervasive Computing (IPC), 2008
        The IEEE 11th International Conference on Computational Science and Engineering (CSE), 2008
        The IEEE 21st International Conference on Advanced Information Networking and Applications (AINA), 2007
        The IEEE/ACS 5th International Conference on Computer Systems and Applications (AICCSA), 2007
        The Third International Conference on Intelligent Environments (IE), 2007
        The International Workshop on Information Fusion and Dissemination in Wireless Sensor Networks (SensorFusion), 2007
        The International Workshop on Knowledge Management and Discovery for Ubiquitous and Pervasive Applications (KUPA), 2007

Local Chair:
       ACM SIGMOD/PODS Conference, 2010

Journal Review:
        IEEE Transactions on Knowledge and Data Engineering
        IEEE Transactions on Parallel and Distributed Systems
        ACM Transactions on Database Systems
        ACM Transactions on Knowledge Discovery from Data
        Knowledge and Information Systems
        Data and Knowledge Engineering
        Information Systems
        Information Sciences
        The International Journal of Telemedicine and Applications
        The Information Fusion Journal
        The Journal of Systems and Software
        The International Journal of Data Mining and Bioinformatics
        The Electronics and Telecommunication Research Institute Journal
        The International Journal of Computer Science and Technology
        The Journal of Ubiquitous Computing and Intelligence

Panelist:
        Panelist, National Science Foundation, CISE, 2007, 2009, 2011

Program Co-Chair:
        The ACM workshop on Health Information and Knowledge Management (HIKM) 2006