HPDM 2008



Workshop History

After several editions held in conjunction with the International Parallel & Distributed Processing Symposium (1998-2001) and with the SIAM Data Mining conference (2002-2006), this year HPDM is organized in conjunction with the IEEE International Conference on Data Mining.
The upcoming theme of the workshop is support and exploitation of emerging computer architectures such as multi-core CPUs and streaming GPUs. We are looking forward to a very special and exciting 10th edition in Pisa!

10th International Workshop on
High Performance Data Mining
(in conjunction with ICDM)
Pisa, December 15th 2008


09:00 - 09:30
09:30 - 10:30
  Keynote Talk by Prof. S. Parthasarathy
10:30 - 11:00
  Coffee Break
11:00 - 13:00
  1st session: Grid, SOA and parallel DM
   Service Oriented KDD: A Framework for GRID Data Mining Workflows
   Distributed Linear Programming and Resource Management for Data Mining in Distributed Environments
   Distributed Data Mining Models as Services on the Grid
   Parallel Hierarchical Clustering on Market Basket Data
13:00 - 14:30
  Lunch Break
14:30 - 16:00
  2nd session: Parallel and Distributed DM
   Chi-Square Test Based Decision Trees Induction in Distributed Environment
   Investigation of Various Matrix Factorization Methods for Large Recommender Systems
   Efficient Distance Computation Using SQL Queries and UDFs
16:00 - 16:30
  Coffee Break
16:30 - 18:00
  3rd session: High performance DM
   Stream-Close: Fast Mining of Closed Frequent Itemsets in High Speed Data Streams
   Mining Unstructured Text at Gigabytes per Second Speeds
   Exploiting Graphic Card Processor Technology to Accelerate Data Mining Queries in SAP NetWeaver BIA

Each speaker will have 25 minutes for the presentation, plus 5 minutes for questions. A laptop will be provided, so feel free to send us your presentation in advance.

Keynote Talk

Speaker: Prof. Srinivasan Parthasarathy, Ohio State University
Title: Mining and Managing Tree Structured Data Efficiently


Recently the use of structured and semi-structured data has been increasing at a tremendous pace. Applications ranging from bioinformatics to XML databases, and from the World Wide Web to Computational Linguistics, are generating and processing large amounts of structured and semi-structured data. There is a clear and present need to develop efficient algorithms to index, process, manage, and analyze such data.

In this talk I will cover our recent efforts on mining and managing tree structured data -- a subclass of problems in this field with applications in Linguistic data analysis and XML data management. Specifically, we will discuss the role of "succinct" encodings and data structures when developing memory and I/O efficient solutions for indexing XML data. Additionally, in the context of mining tree structured data, we will present similar ideas and solutions for multicore systems. Here, in addition to algorithm designs that reduce the latency to memory, we also consider mechanisms to reduce the off-chip memory traffic or bandwidth. We also explore an adaptive task and data parallel algorithm design that facilitates effective parallelization in the presence of data and workload skew. A detailed analysis and performance benchmarking study validates the utility of the proposed algorithms. We will conclude with a discussion of the general purpose utility of the techniques developed and outline directions of future work.

This is joint work with my graduate student, Shirish Tatikonda.



Dr. Srinivasan Parthasarathy received his PhD in Computer Science from the University of Rochester, New York, USA. He is currently an Associate Professor in the Computer Science and Engineering Department at the Ohio State University (OSU). Currently he is spending a delightful sabbatical year visiting the Database group at the National University of Singapore. His research interests are broadly in the areas of Data Mining, Databases, Bioinformatics and High Performance Computing.

He received an Ameritech Faculty Fellowship in 2001, an NSF CAREER Award in 2003, a DOE Early Career Award in 2004, and an IBM Faculty Award in 2007. His papers have received several best paper awards from leading conferences in the field, including the SIAM International Conference on Data Mining (SDM), the IEEE International Conference on Data Mining (ICDM), the Very Large Databases Conference (VLDB) and, most recently, ACM Knowledge Discovery and Data Mining (SIGKDD).

He is a member of the ACM and the IEEE and has served on the program committees of leading conferences in the fields of data mining, databases, and high performance computing. He currently serves on the editorial boards of several journals, including IEEE Intelligent Systems, Distributed and Parallel Databases, and Data Mining and Knowledge Discovery: An International Journal. He served as one of the program chairs of SIAM Data Mining in 2007 and is currently serving as one of the general chairs for the 2009 edition.



