Full Professor, Computer Science
I am a full professor in the Computer Science department at IUPUI.
I received my PhD from the Computer Science department at
RPI. Here is a copy of my PhD dissertation,
which won the SIGKDD Doctoral Dissertation
Award for 2010. Until August 2010, I was a senior research scientist at eBay Research Labs, San Jose, CA. Earlier,
I obtained an MS degree from the Department of Computer Science
at the University of Minnesota, Twin Cities, and a BS degree from
BUET.
Graph mining is my core research interest, but I am also broadly interested in
data mining, bioinformatics, biomedical informatics, machine learning, information retrieval, and
social network analysis. My current research is supported by NSF, NIH, and eBay Inc., San Jose. In 2012,
I received the NSF CAREER Award.
A list of my research publications is available from DBLP and
Google Scholar.
- Representative Pattern Mining by Randomized Algorithm
Poor scalability and very large output size have led
recent research in frequent pattern mining (FPM) to shift from
obtaining the complete set of frequent patterns to generating only a
representative (summary) subset of frequent patterns. Most
existing approaches to this problem adopt a two-step solution: in
the first step, they obtain all the frequent patterns, and in the
second step, they use some form of clustering to obtain the summary
pattern set. However, the two-step method is inefficient and
sometimes infeasible, since the first step itself may fail to finish
in a reasonable amount of time. We propose several algorithms to
obtain representative frequent patterns. Our algorithms are based
on random walks over the frequent pattern partial order. The salient
feature of these algorithms is that they avoid enumerating all frequent
patterns, so they scale to databases of large real-life graphs.
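To make the idea of walking the partial order concrete, here is a minimal sketch that uses itemsets instead of graphs to keep it short. It is not the algorithm from the papers listed below: the function names, the toy data, and the plain uniform choice among neighbors are all assumptions made for illustration. Each step moves to a frequent pattern that differs from the current one by a single item, so only the patterns actually visited are ever evaluated.

```python
import random


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_neighbors(pattern, items, transactions, minsup):
    """Frequent patterns adjacent to `pattern` in the partial order.

    A neighbor is obtained by adding one item (a super-pattern) or
    dropping one item (a sub-pattern).
    """
    neighbors = []
    for item in items:
        candidate = pattern - {item} if item in pattern else pattern | {item}
        if support(candidate, transactions) >= minsup:
            neighbors.append(frozenset(candidate))
    return neighbors


def random_walk_sample(transactions, minsup, steps, seed=0):
    """Visit frequent patterns by a simple random walk (hypothetical sketch).

    Assumption: the next pattern is chosen uniformly among the frequent
    neighbors; this does NOT guarantee a uniform sample of all patterns.
    """
    rng = random.Random(seed)
    items = sorted(set().union(*transactions))
    current = frozenset()  # the empty pattern is contained in every transaction
    visited = []
    for _ in range(steps):
        neighbors = frequent_neighbors(current, items, transactions, minsup)
        if not neighbors:
            break
        current = rng.choice(neighbors)
        visited.append(current)
    return visited


if __name__ == "__main__":
    toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for pattern in random_walk_sample(toy, minsup=3, steps=5):
        print(sorted(pattern))
```

The walk only calls `support` on the neighbors of the current pattern, which is the property that lets this family of methods avoid full enumeration.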
- Output Space Sampling for Graph Patterns
Mohammad Al Hasan, and Mohammed Zaki.
To appear in Proceedings of the 35th International Conference on Very Large Data Bases
(VLDB-2009), Lyon, France, August 2009.
- MUSK: Uniform Sampling of k Maximal Patterns
Mohammad Al Hasan, and Mohammed Zaki.
To appear in Proceedings of the SIAM International Conference on Data Mining
(SDM-2009), Sparks, April 2009.
- ORIGAMI: A Novel and Effective Approach for Mining Representative Graph Patterns
Vineet Chaoji, Mohammad Al Hasan, Saeed Salem, and Mohammed J. Zaki in
Statistical Analysis and Data Mining, vol 1(2), 2008, pp. 67-84
- Summarization in Pattern Mining
Mohammad Al Hasan in
Encyclopedia of Data Warehousing and Mining (2nd Ed.),
Information Science Reference, 2008
- ORIGAMI: Mining Representative Orthogonal Graph Patterns
Mohammad Al Hasan, Vineet Chaoji, Saeed Salem, Jeremy Besson, and Mohammed J. Zaki in
Proceedings of IEEE International Conference on Data Mining, Omaha, NE, 2007, pp 153. (full paper acceptance rate = 7.5%)
- Protein Docking by Shape Complementarity
We describe an efficient method, named ContextShape, for partial complementary shape
matching in rigid-body protein-protein docking. ContextShape represents the local shape
features of a protein using boolean data structures. It uses precalculated lookup tables
to search efficiently for relative orientations of the receptor and ligand surfaces. It derives
the energetic quantities from shape complementarity and buried surface area computations,
using efficient boolean operations. Results indicate that ContextShape outperforms popular
docking algorithms such as ZDock and PatchDock on benchmark datasets for rigid-body docking.
We are currently extending this work to flexible docking.
- Context shapes: Efficient complementary shape matching for protein-protein docking
Zujun Shentu, Mohammad Al Hasan, Chris Bystroff, and Mohammed J. Zaki in
PROTEINS: Structure, Function, and Bioinformatics, vol 70(3), 2008, pp. 1056-1073
- Clustering
In clustering, I have several loosely related works. The most recent one concerns
semi-supervised clustering, where we find a set of representative objects that cover the rest
of the objects subject to a lower bound on the similarity value. From the representative objects, a clustering
of the entire dataset can be obtained by assigning each object x to a cluster represented
by some object y, where y covers x. The intuition for using a similarity
lower bound instead of a fixed k (cluster count) is that, when the clusters are irregular
(for example, a manifold embedded in low dimension), a predefined number of clusters cannot produce
a good clustering; the number of clusters should adapt to the irregularity of the cluster
boundary.
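The covering idea can be sketched in a few lines. The version below is only an illustration under stated assumptions: it picks representatives with a plain greedy set-cover heuristic, which is a stand-in and not necessarily the selection procedure used in the paper, and the function names and the toy similarity are hypothetical. An object y covers x when similarity(x, y) is at least the lower bound `min_sim`.

```python
def select_representatives(objects, similarity, min_sim):
    """Greedily pick representatives until every object is covered.

    Assumption: a greedy set-cover heuristic stands in for the paper's
    selection step. Every object is treated as covering itself.
    """
    n = len(objects)
    covers = [
        {i} | {j for j in range(n) if similarity(objects[i], objects[j]) >= min_sim}
        for i in range(n)
    ]
    uncovered = set(range(n))
    representatives = []
    while uncovered:
        # Choose the object that covers the most still-uncovered objects.
        best = max(range(n), key=lambda i: len(covers[i] & uncovered))
        representatives.append(best)
        uncovered -= covers[best]
    return representatives


def assign_clusters(objects, representatives, similarity):
    """Label each object with the index of its most similar representative."""
    return [
        max(representatives, key=lambda r: similarity(objects[i], objects[r]))
        for i in range(len(objects))
    ]


if __name__ == "__main__":
    points = [0.0, 0.1, 0.2, 5.0, 5.1]

    def sim(x, y):
        return 1.0 / (1.0 + abs(x - y))  # toy similarity in (0, 1]

    reps = select_representatives(points, sim, min_sim=0.8)
    print(reps, assign_clusters(points, reps, sim))
```

Note that the number of clusters is an output here, determined by `min_sim`, rather than an input as in k-Means.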
- Clustering with Lower Bound on Similarity (Best Paper Award)
Mohammad Al Hasan, Saeed Salem, Benjarath Pupakdi and Mohammed J. Zaki in
Proceedings of Pacific-Asia Conference on Knowledge Discovery and Data Mining, Thailand, 2009
Another recent work in clustering that I took part in is arbitrary-shape clustering. It has
numerous applications, such as spatial clustering and image segmentation.
In this work, we propose a linear-time (with respect to the number of objects) algorithm, named
SPARCL, for this task. The method exploits the linear time complexity of the k-Means
algorithm by first finding K > k small clusters. These small clusters are then combined
carefully to obtain the final shape clusters.
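The two-phase structure described above can be sketched as follows. This is a simplified illustration, not SPARCL itself: the first phase is a plain k-Means with K centers, and the second phase merges the small clusters by single-linkage distance between their centers, which is an assumed stand-in for the paper's more careful merging criterion. All function names are hypothetical.

```python
import math
import random


def kmeans(points, num_centers, iterations=20, seed=0):
    """Plain Lloyd's k-Means on a list of coordinate tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, num_centers)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center for every point.
        labels = [
            min(range(num_centers), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: move each non-empty center to the mean of its members.
        for c in range(num_centers):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, labels


def merge_small_clusters(centers, k):
    """Merge the closest centers first until only k groups remain.

    Assumption: single-linkage on center distance replaces the paper's
    merging criterion.
    """
    group = list(range(len(centers)))
    pairs = sorted(
        (math.dist(centers[a], centers[b]), a, b)
        for a in range(len(centers))
        for b in range(a + 1, len(centers))
    )
    num_groups = len(centers)
    for _, a, b in pairs:
        if num_groups <= k:
            break
        ga, gb = group[a], group[b]
        if ga != gb:
            group = [ga if g == gb else g for g in group]
            num_groups -= 1
    return group


def two_phase_clustering(points, big_k, k, seed=0):
    """Phase 1: K small clusters. Phase 2: merge them into k groups."""
    centers, labels = kmeans(points, big_k, seed=seed)
    group = merge_small_clusters(centers, k)
    return [group[lab] for lab in labels]
```

For example, `two_phase_clustering(points, big_k=20, k=2)` returns one group label per point, where `points` is a list of coordinate tuples.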
- SPARCL: An Effective and Efficient Algorithm for Mining Arbitrary Shape-based Clusters
Vineet Chaoji, Mohammad Al Hasan, Saeed Salem, and Mohammed J. Zaki in
Knowledge and Information Systems Journal (accepted)
- SPARCL: Efficient and Effective Shape-based Clustering
Vineet Chaoji, Mohammad Al Hasan, Saeed Salem, and Mohammed J. Zaki in
Proceedings of IEEE International Conference on Data Mining, Pisa, Italy, 2008 (full paper acceptance rate = 9.67%)
- Robust partitional clustering by outlier and density insensitive seeding
Mohammad Al Hasan, Vineet Chaoji, Saeed Salem, Mohammed J. Zaki.
Pattern Recognition Letters, 30(11), 994-1002
- Generic Pattern Mining
Generic programming is an efficient software development practice that builds algorithms on concepts and abstractions.
Its advantage is that one generic algorithm can replace a large collection of concrete programs and is much easier to maintain.
The most successful example of generic software is the C++ Standard Template Library (STL).
The objective of our research in this direction is to develop a generic algorithm for pattern mining
that works for different kinds of patterns, different input formats, different back-end databases, etc.
To this end, we developed DMTL (Data Mining Template Library),
a generic pattern mining library. It implements mining algorithms for four major pattern types (set, sequence, tree, and graph)
under a unified framework to show that they admit generic programming. It also
allows users to define and mine their own patterns with very little effort.
- An Integrated, Generic Approach to Pattern Mining: Data Mining Template Library
Vineet Chaoji, Mohammad Al Hasan, Saeed Salem, and Mohammed J. Zaki in
Data Mining and Knowledge Discovery Journal, 17(3), 2008, pp. 457-495
- DMTL: A generic Data Mining Template Library
Mohammad Al Hasan, Vineet Chaoji, Saeed Salem, Nagender Parimi, Mohammed Zaki
in Workshop on Library-Centric Software Design (LCSD05) with Object-Oriented Programming, Systems, Languages and Applications,
San Diego, CA, 2005.
- Towards Generic Pattern Mining
Mohammed Zaki, Nilanjana De, Nagender Parimi, Feng Gao, Benjarath Phoophakdee, Joe Urban,
Vineet Chaoji, Mohammad Al Hasan, Saeed Salem
in International Conference on Formal Concept Analysis (ICFCA'05), France, 2005.
A shorter version of this paper was published as an invited paper in
Pattern Recognition and Machine Intelligence (PReMI '05)
- Patent Ranking (with IBM Almaden Service Research)
I was involved in this work in the summers of 2006 and 2007, while I was an intern at the IBM Almaden Research Center.
The objective is to find an algorithm that ranks patents based on their text. To solve it, we extract key phrases from
the patent text and compute two metrics for each key phrase: novelty and impact. The
novelty of a phrase in a patent is defined as its recency with respect to the set of patents
that have the same class label as the subject patent. Impact measures the usage of those phrases in patents that
were issued later than the subject patent. A simple weighting scheme involving these metrics yields a value
used to rank a set of patents.
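As a rough illustration of how two per-phrase metrics can be combined into a ranking, consider the sketch below. The exact formulas are not given on this page, so everything concrete here is an assumption: novelty is taken to be 1 / (1 + number of earlier same-class patents containing the phrase), impact is taken to be the number of later patents containing the phrase, and the score is a weighted sum over a patent's key phrases. The record layout (`id`, `year`, `cls`, `phrases`) and all names are hypothetical.

```python
def novelty(phrase, patent, corpus):
    """Assumed novelty: high when few earlier same-class patents use the phrase."""
    earlier_uses = sum(
        1
        for p in corpus
        if p["cls"] == patent["cls"]
        and p["year"] < patent["year"]
        and phrase in p["phrases"]
    )
    return 1.0 / (1 + earlier_uses)


def impact(phrase, patent, corpus):
    """Assumed impact: number of later patents that use the phrase."""
    return sum(
        1 for p in corpus if p["year"] > patent["year"] and phrase in p["phrases"]
    )


def rank_patents(corpus, w_novelty=1.0, w_impact=1.0):
    """Sort patents by a weighted sum of per-phrase novelty and impact."""

    def score(patent):
        return sum(
            w_novelty * novelty(ph, patent, corpus)
            + w_impact * impact(ph, patent, corpus)
            for ph in patent["phrases"]
        )

    return sorted(corpus, key=score, reverse=True)


if __name__ == "__main__":
    corpus = [
        {"id": "A", "year": 2000, "cls": "X", "phrases": {"graph index"}},
        {"id": "B", "year": 2005, "cls": "X", "phrases": {"graph index"}},
        {"id": "C", "year": 2010, "cls": "X", "phrases": {"graph index"}},
    ]
    print([p["id"] for p in rank_patents(corpus)])
```

In this toy corpus the earliest patent ranks first, because it introduced the phrase to its class and the phrase was reused by both later patents.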
- US patent: No 7881937 Method for Analyzing Patent Claims
- COA: Finding Novel Patents through Text Analysis
Mohammad Al Hasan, W. Scott Spangler, Thomas Griffin, Alfredo Alba in
Fifteenth ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), 2009.
- Assessing Patent Value through Advanced Text Analytics
Mohammad Al Hasan, W. Scott Spangler in
Eleventh International Conference on Artificial Intelligence and Law, 2007. A longer version was also published as IBM Research
Technical Report no. RJ10402, 2007.
- Miscellaneous
- Optimal Placement of Stereo Sensors
Mohammad Al Hasan, Krishna K. Ramachandran, and John E. Mitchell in
Optimization Letters, vol 2(1), 2008, pp. 99-111
- Quantification of Spatial Parameters in 3D Cellular Constructs Using Graph Theory
Amanda Waite Lund, Cagatay C. Bilgin, Mohammad Al Hasan, Lindsey M. McKeen, Jan P. Stegemann, Bulent Yener, Mohammed Zaki, George E. Plopper.
in Journal of Biomedicine and Biotechnology, 2009
- Link Prediction using Supervised Learning
Mohammad Al Hasan, Vineet Chaoji, Saeed Salem, Mohammed Zaki.
in SIAM Workshop on Link Analysis, Counterterrorism and Security with SIAM Data Mining Conference, Bethesda, MD, 2006
In my leisure time I enjoy spending time with my sons, Junaid and Rezwan.
Email: alhasan AT cs DOT iupui DOT edu
Phone: 317-274-3862 (work)
Postal:
Department of Computer Science
Indiana University - Purdue University
Indianapolis, IN 46202
USA