TY - CHAP
T1 - Overview of efficient clustering methods for high-dimensional big data streams
AU - Hassani, M.
PY - 2019/1/1
Y1 - 2019/1/1
N2 - The majority of clustering approaches have focused on static data. However, a wide variety of recent applications and research issues in big data mining require dealing with continuous, possibly infinite streams of data arriving at high velocity. Web traffic data, surveillance data, sensor measurements, and stock trading are only some examples of these daily-increasing applications. Additionally, as the growth of data volumes is accompanied by a similar expansion in their dimensionality, clusters cannot be expected to appear completely when all attributes are considered together. Subspace clustering is a general approach that solves this issue by automatically finding the hidden clusters within different subsets of the attributes rather than considering all attributes together. In this chapter, novel methods for efficient subspace clustering of high-dimensional big data streams are presented. Approaches that efficiently combine the anytime clustering concept with the stream subspace clustering paradigm are discussed. Additionally, efficient and adaptive density-based clustering algorithms are presented for high-dimensional data streams. A novel open-source assessment framework and evaluation measures are also presented for subspace stream clustering.
AB - The majority of clustering approaches have focused on static data. However, a wide variety of recent applications and research issues in big data mining require dealing with continuous, possibly infinite streams of data arriving at high velocity. Web traffic data, surveillance data, sensor measurements, and stock trading are only some examples of these daily-increasing applications. Additionally, as the growth of data volumes is accompanied by a similar expansion in their dimensionality, clusters cannot be expected to appear completely when all attributes are considered together. Subspace clustering is a general approach that solves this issue by automatically finding the hidden clusters within different subsets of the attributes rather than considering all attributes together. In this chapter, novel methods for efficient subspace clustering of high-dimensional big data streams are presented. Approaches that efficiently combine the anytime clustering concept with the stream subspace clustering paradigm are discussed. Additionally, efficient and adaptive density-based clustering algorithms are presented for high-dimensional data streams. A novel open-source assessment framework and evaluation measures are also presented for subspace stream clustering.
U2 - 10.1007/978-3-319-97864-2_2
DO - 10.1007/978-3-319-97864-2_2
M3 - Chapter
SN - 978-3-319-97863-5
T3 - Unsupervised and Semi-Supervised Learning
SP - 25
EP - 42
BT - Clustering Methods for Big Data Analytics
A2 - Nasraoui, Olfa
A2 - Ben N'Cir, Chiheb-Eddine
PB - Springer
CY - Cham
ER -