Efficient streaming detection of hidden clusters in big data using subspace stream clustering

M. Hassani, T. Seidl

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Recently, many data mining techniques have been revisited to cope with new big data challenges. Nearly all of these algorithms focus on the efficiency of the mining algorithm so that it can keep up with the increasing size of the data. However, as the dimensionality of the data increases, not only the efficiency but also the effectiveness of traditional mining algorithms is compromised. For instance, clusters hidden in some subspaces become hard to detect with traditional clustering algorithms as the dimensionality increases. In this paper, we address both the huge size and the high dimensionality of big data with a novel three-phase model for subspace stream clustering algorithms. In its first phase, the model copes with the huge size of big data by continuously applying a streaming concept to the incoming data objects and summarizing them into micro-clusters. Then, after each batch of data or upon a user request, the second phase is applied to the micro-cluster summaries to reconstruct the current distribution of the data. In the third phase, a subspace clustering algorithm is applied to overcome the high dimensionality of the data and to find clusters hidden within subspaces. We perform an extensive evaluation of different scenarios that follow our model on a big data set.
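The three phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Every name in it (`MicroCluster`, `summarize`, `reconstruct`, `dense_units`, `run_pipeline`) and every parameter (`radius`, `width`, `min_pts`, `max_dims`) is hypothetical. The sketch assumes that micro-clusters are additive count/sum/squared-sum summaries, that the distribution is reconstructed by sampling from a per-dimension Gaussian, and that a brute-force grid-density search stands in for a real subspace clustering algorithm.

```python
import math
import random
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass
class MicroCluster:
    """Additive summary: point count, per-dimension linear sum and squared sum."""
    n: int
    ls: list
    ss: list

    def add(self, p):
        self.n += 1
        for i, x in enumerate(p):
            self.ls[i] += x
            self.ss[i] += x * x

    def center(self):
        return [s / self.n for s in self.ls]

    def std(self):
        # Per-dimension standard deviation; max(..., 0.0) guards against rounding.
        return [math.sqrt(max(q / self.n - (s / self.n) ** 2, 0.0))
                for s, q in zip(self.ls, self.ss)]


def summarize(stream, radius):
    """Phase 1 (assumed rule): absorb each point into the nearest
    micro-cluster within `radius`, otherwise open a new micro-cluster."""
    mcs = []
    for p in stream:
        best = min(mcs, key=lambda m: math.dist(m.center(), p), default=None)
        if best is not None and math.dist(best.center(), p) <= radius:
            best.add(p)
        else:
            mcs.append(MicroCluster(1, list(p), [x * x for x in p]))
    return mcs


def reconstruct(mcs, rng):
    """Phase 2 (assumed strategy): regenerate n points per micro-cluster
    from a Gaussian with that micro-cluster's mean and spread."""
    points = []
    for m in mcs:
        c, s = m.center(), m.std()
        points.extend([rng.gauss(ci, si) for ci, si in zip(c, s)]
                      for _ in range(m.n))
    return points


def dense_units(points, width, min_pts, max_dims=2):
    """Phase 3 stand-in: for every subspace of up to `max_dims` dimensions,
    report the grid cells of side `width` that hold at least `min_pts` points."""
    if not points:
        return {}
    result = {}
    for k in range(1, max_dims + 1):
        for dims in combinations(range(len(points[0])), k):
            cells = Counter(tuple(math.floor(p[i] / width) for i in dims)
                            for p in points)
            dense = {c: n for c, n in cells.items() if n >= min_pts}
            if dense:
                result[dims] = dense
    return result


def run_pipeline(stream, radius, width, min_pts, seed=0):
    """Chain the three phases; in a streaming setting, phases 2 and 3 would
    run only after a batch of data or on a user request."""
    mcs = summarize(stream, radius)
    points = reconstruct(mcs, random.Random(seed))
    return dense_units(points, width, min_pts)
```

Calling `run_pipeline(stream, radius=12.0, width=1.0, min_pts=10)` on an iterable of equal-length numeric tuples returns a dictionary that maps each subspace (a tuple of dimension indices) to its dense grid cells. Only the micro-cluster summaries are kept between requests, so memory depends on the number of micro-clusters rather than on the length of the stream.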
Original language: English
Title of host publication: Database Systems for Advanced Applications - 19th International Conference, DASFAA 2014, International Workshops: BDMA, DaMEN, SIM³, UnCrowd; Bali, Indonesia, April 21-24, 2014, Revised Selected Papers
Editors: W.-S. Han, M.L. Lee, A. Muliantara, N.A. Sanjaya, B. Thalheim, S. Zhou
Place of Publication: Heidelberg
Publisher: Springer
Pages: 146-160
Number of pages: 15
ISBN (Electronic): 978-3-662-43984-5
ISBN (Print): 978-3-662-43983-8
Publication status: Published - 2014
Externally published: Yes
Event: 2nd International DASFAA Workshop on Big Data Management and Analytics (BDMA) - Bali, Indonesia
Duration: 21 Apr 2014 to 24 Apr 2014

Conference

Conference: 2nd International DASFAA Workshop on Big Data Management and Analytics (BDMA)
Country/Territory: Indonesia
City: Bali
Period: 21/04/14 to 24/04/14
