Hasan

Mohammad Al Hasan

Full Professor, Computer Science



I am a full Professor at IUPUI CS. I received my PhD from the Computer Science department at RPI. Here is a copy of my PhD dissertation, which won the SIGKDD Doctoral Dissertation Award for 2010. Until August 2010, I was a senior research scientist at eBay Research Labs, San Jose, CA. Earlier, I obtained an MS degree from the Department of Computer Science at the University of Minnesota, Twin Cities, and a BS degree from BUET.

Graph mining is my core research interest, but I am also broadly interested in data mining, bioinformatics, biomedical informatics, machine learning, information retrieval, and social network analysis. My current research is supported by NSF, NIH, and eBay Inc., San Jose. In 2012, I received the NSF CAREER Award.

A list of my research publications is available from DBLP and Google Scholar.

Latest News

Current Doctoral Students

Research

  • Representative Pattern Mining by Randomized Algorithm
      Poor scalability and very large output sizes have led recent research in frequent pattern mining (FPM) to shift from obtaining the complete set of frequent patterns to generating only a representative (summary) subset. Most existing approaches to this problem adopt a two-step solution: in the first step, they obtain all the frequent patterns, and in the second step, they use some form of clustering to obtain the summary pattern set. However, the two-step method is inefficient and sometimes infeasible, since the first step itself may fail to finish in a reasonable amount of time. We propose several algorithms to obtain representative frequent patterns. Our algorithms are based on a random walk over the frequent pattern partial order. The salient feature of these algorithms is that they avoid enumerating all frequent patterns, so they scale to databases of large real-life graphs.

    • Output Space Sampling for Graph Patterns
      Mohammad Al Hasan, and Mohammed Zaki. To appear in Proceedings of the 35th International Conference on Very Large Data Bases (VLDB-2009), Lyon, France, August 2009.
    • MUSK: Uniform Sampling of k Maximal Patterns
      Mohammad Al Hasan, and Mohammed Zaki. To appear in Proceedings of the SIAM International Conference on Data Mining (SDM-2009), Sparks, April 2009.
    • ORIGAMI: A Novel and Effective Approach for Mining Representative Graph Patterns
      Vineet Chaoji, Mohammad Al Hasan, Saeed Salem, and Mohammed J. Zaki. in Statistical Analysis and Data Mining, vol 1(2), 2008, pp. 67-84
    • Summarization in Pattern Mining
      Mohammad Al Hasan in Encyclopedia of Data Warehousing and Mining (2nd Ed), Information Science Reference, 2008
    • ORIGAMI: Mining Representative Orthogonal Graph Patterns
      Mohammad Al Hasan, Vineet Chaoji, Saeed Salem, Jeremy Besson, and Mohammed J. Zaki. in Proceedings of IEEE International Conference on Data Mining, Omaha, NE, 2007, pp 153. (full paper acceptance rate = 7.5%)
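
      The random-walk idea described above can be illustrated with a small sketch. It is a simplification, not the published algorithms: it walks over frequent itemsets rather than graph patterns, moves to a uniformly chosen neighbor at every step, and uses hypothetical names (`support`, `frequent_neighbors`, `sample_maximal_patterns`). The point it shows is that only the neighborhood of the current pattern is ever examined, never the full set of frequent patterns.

```python
import random

def support(pattern, transactions):
    """Count the transactions that contain every item of `pattern`."""
    return sum(1 for t in transactions if pattern <= t)

def frequent_neighbors(pattern, items, transactions, min_sup):
    """Frequent patterns one edge away from `pattern` in the partial order."""
    # Removing an item from a frequent pattern always yields a frequent pattern.
    down = [pattern - {i} for i in sorted(pattern)]
    # Adding an item requires a support check against the database.
    up = [pattern | {i} for i in sorted(items - pattern)
          if support(pattern | {i}, transactions) >= min_sup]
    return down, up

def sample_maximal_patterns(transactions, min_sup, steps, seed=0):
    """Walk randomly over frequent itemsets; record the maximal ones visited."""
    rng = random.Random(seed)
    transactions = [frozenset(t) for t in transactions]
    items = frozenset().union(*transactions)
    current = frozenset()            # the walk starts at the empty pattern
    maximal = set()
    for _ in range(steps):
        down, up = frequent_neighbors(current, items, transactions, min_sup)
        if current and not up:       # no frequent super-pattern: maximal
            maximal.add(current)
        if not down and not up:      # nothing is frequent, nowhere to go
            break
        current = rng.choice(down + up)
    return maximal

# Toy database (an assumption for illustration only).
db = [{1, 2, 3}, {1, 2}, {2, 3}, {1, 2, 3}]
print(sample_maximal_patterns(db, min_sup=2, steps=200))
```

      A uniform choice among neighbors does not by itself give a uniform sample of patterns; the sketch only demonstrates how a walk explores the partial order locally.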

  • Protein Docking by Shape Complementarity
      We describe an efficient method, named ContextShape, for partial complementary shape matching in rigid-body protein-protein docking. ContextShape represents the local shape features of a protein using boolean data structures. It uses precalculated lookup tables to search efficiently for relative orientations of the receptor and ligand surfaces. It derives energetic quantities from shape complementarity and buried surface area computations, using efficient boolean operations. Results indicate that ContextShape outperforms popular docking algorithms, such as ZDock and PatchDock, on benchmark datasets for rigid-body docking. We are currently extending this work to flexible docking.

    • Context shapes: Efficient complementary shape matching for protein-protein docking
      Zujun Shentu, Mohammad Al Hasan, Chris Bystroff, and Mohammed J. Zaki in PROTEINS: Structure, Function, and Bioinformatics, vol 70(3), 2008, pp. 1056-1073
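
      To give a flavor of why boolean operations make shape scoring cheap, here is a toy sketch. It is not the ContextShape data structure or scoring function: it assumes that each shape is a bit vector over a shared voxel grid, and the names `popcount`, `docking_score`, and `clash_penalty` are hypothetical. Contact and overlap are each obtained from one bitwise AND followed by a bit count.

```python
def popcount(bits):
    """Number of set bits in a non-negative integer used as a bit vector."""
    return bin(bits).count("1")

def docking_score(receptor_core, receptor_shell, ligand, clash_penalty=10):
    """Reward ligand voxels on the receptor's surface shell, penalize overlap.

    Each argument is an integer whose bit i is 1 when voxel i is occupied.
    """
    contact = popcount(receptor_shell & ligand)   # buried-surface proxy
    clash = popcount(receptor_core & ligand)      # steric overlap
    return contact - clash_penalty * clash

# A one-dimensional strip of 8 voxels (toy data, for illustration only).
core = 0b11100000     # receptor interior
shell = 0b00010000    # voxel layer just outside the receptor
print(docking_score(core, shell, 0b00011000))   # ligand touches the shell
print(docking_score(core, shell, 0b00110000))   # ligand penetrates the core
```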

  • Clustering
      In clustering, I have several marginally related works. The most recent one concerns semi-supervised clustering, where we find a set of representative objects that cover the rest of the objects subject to a lower bound on the similarity value. From the representative objects, a clustering of the entire dataset can be obtained by assigning each object x to a cluster represented by some object y, where y covers x. The intuition for using a similarity lower bound instead of a fixed k (cluster count) is that, when the clusters are irregular (for example, an embedded manifold in low dimension), a predefined number of clusters cannot produce a good clustering; the number of clusters should adapt to the irregularity of the cluster boundaries.

      • Clustering with Lower Bound on Similarity (Best Paper Award)
        Mohammad Al Hasan, Saeed Salem, Benjarath Pupakdi and Mohammed J. Zaki in Proceedings of Pacific-Asia Conference on Knowledge Discovery and Data Mining, Thailand, 2009
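
      The covering formulation can be sketched as follows. This is an illustration under stated assumptions, not the algorithm from the paper: representatives are chosen with a plain greedy set-cover heuristic, every object is assumed to cover itself, and the names `cover_clustering`, `similarity`, and `threshold` are hypothetical.

```python
def cover_clustering(objects, similarity, threshold):
    """Pick representatives so every object has similarity >= threshold to one."""
    n = len(objects)
    # covers[i] = indices of the objects that object i can represent.
    covers = [{j for j in range(n)
               if similarity(objects[i], objects[j]) >= threshold}
              for i in range(n)]
    for i in range(n):
        covers[i].add(i)             # assumption: an object covers itself

    uncovered = set(range(n))
    representatives = []
    while uncovered:
        # Greedy step: take the object that covers the most uncovered objects.
        best = max(range(n), key=lambda i: len(covers[i] & uncovered))
        representatives.append(best)
        uncovered -= covers[best]

    # Assign each object to the most similar representative that covers it.
    assignment = {}
    for j in range(n):
        candidates = [r for r in representatives if j in covers[r]]
        assignment[j] = max(
            candidates, key=lambda r: similarity(objects[r], objects[j]))
    return representatives, assignment

# Toy one-dimensional data; the similarity function is an assumption.
points = [0.0, 0.1, 0.2, 5.0, 5.1]
sim = lambda a, b: 1.0 / (1.0 + abs(a - b))
reps, clusters = cover_clustering(points, sim, threshold=0.8)
print(reps, clusters)
```

      Note that the number of clusters (two here) is not given as input; it follows from the similarity lower bound.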

      Another recent clustering work that I took part in is arbitrary-shape clustering, which has numerous applications, such as spatial clustering and image segmentation. In this work, we propose an algorithm named SPARCL that runs in time linear in the number of objects. The method exploits the linear time complexity of the k-Means algorithm by first finding K > k small clusters. These small clusters are then carefully combined to obtain the final arbitrary-shape clusters.

  • Generic Pattern Mining
      Generic programming is an efficient software development practice that builds on concepts and abstractions. Its advantage is that a single generic algorithm can replace a large collection of specific programs and is much easier to maintain. The most successful example of generic software is the C++ Standard Template Library (STL). The objective of our research in this direction is to develop a generic algorithm for pattern mining that works for different kinds of patterns, different input formats, different back-end databases, etc. To this end, we developed DMTL (Data Mining Template Library), a generic pattern mining library. It implements four major pattern mining algorithms (for sets, sequences, trees, and graphs) under a unified framework to show that they admit generic programming. It also allows users to define and mine their own patterns with very little effort.

    • An Integrated, Generic Approach to Pattern Mining: Data Mining Template Library
      Vineet Chaoji, Mohammad Al Hasan, Saeed Salem, and Mohammed J. Zaki in Data Mining and Knowledge Discovery Journal, 17(3), 2008, pp. 457-495
    • DMTL: A generic Data Mining Template Library
      Mohammad Al Hasan, Vineet Chaoji, Saeed Salem, Nagender Parimi, Mohammed Zaki in Workshop on Library-Centric Software Design (LCSD05) with Object-Oriented Programming, Systems, Languages and Applications, San Diego, CA, 2005.
    • Towards Generic Pattern Mining
      Mohammed Zaki, Nilanjana De, Nagender Parimi, Feng Gao, Benjarath Phoophakdee, Joe Urban, Vineet Chaoji, Mohammad Al Hasan, Saeed Salem in International Conference on Formal Concept Analysis (ICFCA'05), France, 2005. A shorter version of this paper was published as an invited paper in Pattern Recognition and Machine Intelligence (PReMI '05).
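
      The generic-programming idea can be illustrated in a few lines. The sketch below is not DMTL and does not reflect its C++ interface: it is a Python illustration in which one level-wise mining routine is parameterized by how a pattern is extended and how containment is tested. All names (`mine`, `seeds`, `extend`, `contains`) are hypothetical. The same routine is then instantiated for itemsets and for contiguous substrings.

```python
def mine(database, seeds, extend, contains, min_sup):
    """Generic level-wise miner; returns {pattern: support} for frequent patterns.

    seeds    -- the smallest candidate patterns (must be hashable)
    extend   -- function mapping a pattern to its one-step extensions
    contains -- function (record, pattern) -> bool
    """
    def support(pattern):
        return sum(1 for record in database if contains(record, pattern))

    frequent = {}
    level = {p: support(p) for p in seeds}
    while level:
        next_level = {}
        for pattern, sup in level.items():
            if sup < min_sup:
                continue             # infrequent patterns are not extended
            frequent[pattern] = sup
            for candidate in extend(pattern):
                if candidate not in frequent and candidate not in next_level:
                    next_level[candidate] = support(candidate)
        level = next_level
    return frequent

alphabet = "abc"

# Instantiation 1: itemsets, encoded as sorted tuples of items.
itemsets = mine(
    database=[{"a", "b", "c"}, {"a", "b"}, {"b", "c"}],
    seeds=[(x,) for x in alphabet],
    extend=lambda p: [p + (x,) for x in alphabet if x > p[-1]],
    contains=lambda record, p: set(p) <= record,
    min_sup=2,
)

# Instantiation 2: contiguous substrings of character sequences.
substrings = mine(
    database=["abcab", "cab", "abb"],
    seeds=list(alphabet),
    extend=lambda p: [p + x for x in alphabet],
    contains=lambda record, p: p in record,
    min_sup=2,
)
print(itemsets)
print(substrings)
```

      Supporting a new pattern type in this sketch only requires supplying new `extend` and `contains` functions; the mining loop itself is unchanged.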

  • Patent Ranking (with IBM Almaden Service Research)
      I was involved in this work in the summers of 2006 and 2007, while I was an intern at IBM Almaden Research Center. The objective is to find an algorithm that ranks patents based on their text. To solve it, we extract key phrases from the patent text and compute two metrics for each key phrase: novelty and impact. The novelty of a phrase in a patent is defined as its recency within the set of patents that have the same class label as the subject patent. Impact denotes the usage of those phrases in patents issued later than the subject patent. A simple weighting scheme involving these metrics yields a value that is used to rank a set of patents.

    • US patent: No 7881937 Method for Analyzing Patent Claims
    • COA: Finding Novel Patents through Text Analysis
      Mohammad Al Hasan, W. Scott Spangler, Thomas Griffin, Alfredo Alba. in Fifteenth ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), 2009.
    • Assessing Patent Value through Advanced Text Analytics
      Mohammad Al Hasan, W. Scott Spangler. in Eleventh International Conference of Artificial Intelligence and Law, 2007. A longer version is also published as IBM Research, Technical Report no-RJ10402, 2007.
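
      A toy version of the two phrase-level metrics and their combination might look like the sketch below. The exact formulas are assumptions made for illustration and are not taken from the papers or the patent: novelty is taken as 1 / (1 + number of earlier same-class patents that use the phrase), impact as the number of later patents that use it, and the weights `w_novelty` and `w_impact` are hypothetical parameters.

```python
def novelty(phrase, patent, corpus):
    """Higher when fewer earlier patents of the same class use the phrase."""
    earlier_uses = sum(
        1 for p in corpus
        if p["cls"] == patent["cls"]
        and p["year"] < patent["year"]
        and phrase in p["phrases"])
    return 1.0 / (1 + earlier_uses)

def impact(phrase, patent, corpus):
    """Number of later-issued patents that use the phrase."""
    return sum(1 for p in corpus
               if p["year"] > patent["year"] and phrase in p["phrases"])

def score(patent, corpus, w_novelty=1.0, w_impact=1.0):
    """Weighted sum of novelty and impact over the patent's key phrases."""
    return sum(w_novelty * novelty(ph, patent, corpus)
               + w_impact * impact(ph, patent, corpus)
               for ph in patent["phrases"])

def rank(corpus):
    """Patents ordered from highest to lowest score."""
    return sorted(corpus, key=lambda p: score(p, corpus), reverse=True)

# Made-up corpus for illustration only.
corpus = [
    {"id": "P1", "year": 2000, "cls": "A", "phrases": {"neural net"}},
    {"id": "P2", "year": 2002, "cls": "A",
     "phrases": {"neural net", "graph kernel"}},
    {"id": "P3", "year": 2004, "cls": "A", "phrases": {"graph kernel"}},
    {"id": "P4", "year": 2005, "cls": "B", "phrases": {"graph kernel"}},
]
print([p["id"] for p in rank(corpus)])
```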

  • Miscellaneous

Personal
In my leisure time, I enjoy spending time with my sons, Junaid and Rezwan.
Contact Info

Email alhasan AT cs DOT iupui DOT edu
   
Phone 317-274-3862 (work)
   
Postal Department of Computer Science
Indiana University - Purdue University
Indianapolis, IN 46202
USA