TY - JOUR
T1 - Cluster-preserving sampling from fully-dynamic streaming graphs
AU - Zhang, Jianpeng
AU - Zhu, Kaijie
AU - Pei, Yulong
AU - Fletcher, George
AU - Pechenizkiy, Mykola
PY - 2019/5/1
Y1 - 2019/5/1
N2 - Current sampling techniques on graphs (i.e., network-structured data) mainly study static graphs and suppose that the whole graph is available at all times. However, a surge of graphs are becoming too large-scale and/or fully-dynamic (i.e., nodes and edges are added or deleted arbitrarily) to be handled with small memory footprint. Moreover, the intrinsic property (i.e., clustering structure) has been ignored and is not preserved well when the sampling performs. To solve these issues, we propose a Cluster-preserving Partially Induced Edge Sampling (CPIES) algorithm that dynamically keep up-to-date samples in an online fashion and preserve the inherent clustering structure in streaming graphs. The experiments on various synthetic and real-world graphs demonstrated that the proposed CPIES algorithm is capable of preserving the inherent clustering structure while sampling from the fully-dynamic streaming graphs, and performs better than other competing algorithms in cluster-related properties.
AB - Current sampling techniques on graphs (i.e., network-structured data) mainly study static graphs and suppose that the whole graph is available at all times. However, a surge of graphs are becoming too large-scale and/or fully-dynamic (i.e., nodes and edges are added or deleted arbitrarily) to be handled with small memory footprint. Moreover, the intrinsic property (i.e., clustering structure) has been ignored and is not preserved well when the sampling performs. To solve these issues, we propose a Cluster-preserving Partially Induced Edge Sampling (CPIES) algorithm that dynamically keep up-to-date samples in an online fashion and preserve the inherent clustering structure in streaming graphs. The experiments on various synthetic and real-world graphs demonstrated that the proposed CPIES algorithm is capable of preserving the inherent clustering structure while sampling from the fully-dynamic streaming graphs, and performs better than other competing algorithms in cluster-related properties.
KW - Clustering structure
KW - Fully-dynamic streaming graph
KW - Graph sampling
KW - Isolated nodes
KW - Reservoir sampling
UR - http://www.scopus.com/inward/record.url?scp=85059937266&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2019.01.011
DO - 10.1016/j.ins.2019.01.011
M3 - Article
AN - SCOPUS:85059937266
SN - 0020-0255
VL - 482
SP - 279
EP - 300
JO - Information Sciences
JF - Information Sciences
ER -