Abstract
Many data mining techniques have recently been revisited to cope with the challenges of big data. Nearly all of this work focuses on the efficiency of the mining algorithm, so that it can keep up with the growing size of the data. However, as the dimensionality of the data increases, not only the efficiency but also the effectiveness of traditional mining algorithms is compromised. For instance, clusters hidden in subspaces become hard to detect with traditional clustering algorithms as the dimensionality grows. In this paper, we address both the huge size and the high dimensionality of big data with a novel three-phase model for subspace stream clustering algorithms. In the first phase, the model copes with the size of the data by processing the data objects continuously as a stream and summarizing them into micro-clusters. In the second phase, which is triggered after each batch of data or upon a user request, the current distribution of the data is reconstructed from the current micro-cluster summaries. In the third phase, a subspace clustering algorithm is applied to cope with the high dimensionality of the data and to find clusters hidden within subspaces. We perform an extensive evaluation of different scenarios that follow our model on a big data set.
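The three phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how micro-clusters are maintained, how the distribution is reconstructed, or which subspace clustering algorithm is used. The sketch assumes cluster-feature summaries (count, linear sum, squared sum), a per-dimension Gaussian for reconstruction, and a toy grid-based density check in one-dimensional subspaces as a stand-in for a real subspace clustering algorithm. All names (`MicroCluster`, `summarize`, `reconstruct`, `dense_units_1d`) and parameters (`radius`, `bins`, `min_frac`) are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Assumed summary: count, linear sum and squared sum per dimension."""
    n: int
    ls: list
    ss: list

    @classmethod
    def from_point(cls, p):
        return cls(1, list(p), [x * x for x in p])

    def centroid(self):
        return [s / self.n for s in self.ls]

    def variance(self):
        # Per-dimension variance; clamp tiny negatives caused by rounding.
        return [max(q / self.n - (s / self.n) ** 2, 0.0)
                for s, q in zip(self.ls, self.ss)]

    def absorb(self, p):
        self.n += 1
        for i, x in enumerate(p):
            self.ls[i] += x
            self.ss[i] += x * x


def summarize(stream, radius):
    """Phase 1: a single pass over the stream that keeps only micro-clusters."""
    mcs = []
    for p in stream:
        best = min(mcs, key=lambda m: math.dist(m.centroid(), p), default=None)
        if best is not None and math.dist(best.centroid(), p) <= radius:
            best.absorb(p)
        else:
            mcs.append(MicroCluster.from_point(p))
    return mcs


def reconstruct(mcs, rng):
    """Phase 2: regenerate n points per micro-cluster (Gaussian assumption)."""
    points = []
    for m in mcs:
        mu = m.centroid()
        sd = [math.sqrt(v) for v in m.variance()]
        for _ in range(m.n):
            points.append([rng.gauss(a, s) for a, s in zip(mu, sd)])
    return points


def dense_units_1d(points, bins, min_frac):
    """Phase 3 stand-in: dense grid cells in each one-dimensional subspace."""
    result = {}
    for d in range(len(points[0])):
        vals = [p[d] for p in points]
        lo, hi = min(vals), max(vals)
        width = (hi - lo) / bins or 1.0  # avoid zero width for constant dims
        counts = [0] * bins
        for v in vals:
            counts[min(int((v - lo) / width), bins - 1)] += 1
        result[d] = [b for b, c in enumerate(counts)
                     if c >= min_frac * len(points)]
    return result


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stream: dimension 0 is concentrated, dimension 1 is noise.
    stream = [[rng.gauss(0, 0.1), rng.uniform(0, 10)] for _ in range(500)]
    mcs = summarize(stream, radius=1.0)
    points = reconstruct(mcs, rng)
    print(len(mcs), "micro-clusters summarize", len(points), "objects")
    print(dense_units_1d(points, bins=10, min_frac=0.5))
```

In this sketch only the micro-clusters are kept in memory during the first phase, so the second and third phases can run on demand without revisiting the raw stream.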
Original language | English |
---|---|
Title of host publication | Database Systems for Advanced Applications - 19th International Conference, DASFAA 2014, International Workshops: BDMA, DaMEN, SIM³, UnCrowd; Bali, Indonesia, April 21-24, 2014, Revised Selected Papers |
Editors | W.-S. Han, M.L. Lee, A. Muliantara, N.A. Sanjaya, B. Thalheim, S. Zhou |
Place of Publication | Heidelberg |
Publisher | Springer |
Pages | 146-160 |
Number of pages | 15 |
ISBN (Electronic) | 978-3-662-43984-5 |
ISBN (Print) | 978-3-662-43983-8 |
Publication status | Published - 2014 |
Externally published | Yes |
Event | 2nd International DASFAA Workshop on Big Data Management and Analytics (BDMA) - Bali, Indonesia; Duration: 21 Apr 2014 → 24 Apr 2014 |
Conference
Conference | 2nd International DASFAA Workshop on Big Data Management and Analytics (BDMA) |
---|---|
Country/Territory | Indonesia |
City | Bali |
Period | 21/04/14 → 24/04/14 |