Cluster-preserving sampling from fully-dynamic streaming graphs

Jianpeng Zhang (Corresponding author), Kaijie Zhu, Yulong Pei, George Fletcher, Mykola Pechenizkiy

Research output: Contribution to journalArticleAcademicpeer-review

Abstract

Current sampling techniques on graphs (i.e., network-structured data) mainly study static graphs and suppose that the whole graph is available at all times. However, a surge of graphs are becoming too large-scale and/or fully-dynamic (i.e., nodes and edges are added or deleted arbitrarily) to be handled with small memory footprint. Moreover, the intrinsic property (i.e., clustering structure) has been ignored and is not preserved well when the sampling performs. To solve these issues, we propose a Cluster-preserving Partially Induced Edge Sampling (CPIES) algorithm that dynamically keep up-to-date samples in an online fashion and preserve the inherent clustering structure in streaming graphs. The experiments on various synthetic and real-world graphs demonstrated that the proposed CPIES algorithm is capable of preserving the inherent clustering structure while sampling from the fully-dynamic streaming graphs, and performs better than other competing algorithms in cluster-related properties.

LanguageEnglish
Pages279-300
Number of pages22
JournalInformation Sciences
Volume482
DOIs
StatePublished - 1 May 2019

Fingerprint

Streaming
Sampling
Graph in graph theory
Clustering
Surge
Graph
Data storage equipment
Experiments
Vertex of a graph
Experiment

Keywords

  • Clustering structure
  • Fully-dynamic streaming graph
  • Graph sampling
  • Isolated nodes
  • Reservoir sampling

Cite this

@article{96e948d0f5684a46b53d25a654627eb5,
title = "Cluster-preserving sampling from fully-dynamic streaming graphs",
abstract = "Current sampling techniques on graphs (i.e., network-structured data) mainly study static graphs and suppose that the whole graph is available at all times. However, a surge of graphs are becoming too large-scale and/or fully-dynamic (i.e., nodes and edges are added or deleted arbitrarily) to be handled with small memory footprint. Moreover, the intrinsic property (i.e., clustering structure) has been ignored and is not preserved well when the sampling performs. To solve these issues, we propose a Cluster-preserving Partially Induced Edge Sampling (CPIES) algorithm that dynamically keep up-to-date samples in an online fashion and preserve the inherent clustering structure in streaming graphs. The experiments on various synthetic and real-world graphs demonstrated that the proposed CPIES algorithm is capable of preserving the inherent clustering structure while sampling from the fully-dynamic streaming graphs, and performs better than other competing algorithms in cluster-related properties.",
keywords = "Clustering structure, Fully-dynamic streaming graph, Graph sampling, Isolated nodes, Reservoir sampling",
author = "Jianpeng Zhang and Kaijie Zhu and Yulong Pei and George Fletcher and Mykola Pechenizkiy",
year = "2019",
month = "5",
day = "1",
doi = "10.1016/j.ins.2019.01.011",
language = "English",
volume = "482",
pages = "279--300",
journal = "Information Sciences",
issn = "0020-0255",
publisher = "Elsevier",

}

Cluster-preserving sampling from fully-dynamic streaming graphs. / Zhang, Jianpeng (Corresponding author); Zhu, Kaijie; Pei, Yulong; Fletcher, George; Pechenizkiy, Mykola.

In: Information Sciences, Vol. 482, 01.05.2019, p. 279-300.

Research output: Contribution to journalArticleAcademicpeer-review

TY - JOUR

T1 - Cluster-preserving sampling from fully-dynamic streaming graphs

AU - Zhang,Jianpeng

AU - Zhu,Kaijie

AU - Pei,Yulong

AU - Fletcher,George

AU - Pechenizkiy,Mykola

PY - 2019/5/1

Y1 - 2019/5/1

N2 - Current sampling techniques on graphs (i.e., network-structured data) mainly study static graphs and suppose that the whole graph is available at all times. However, a surge of graphs are becoming too large-scale and/or fully-dynamic (i.e., nodes and edges are added or deleted arbitrarily) to be handled with small memory footprint. Moreover, the intrinsic property (i.e., clustering structure) has been ignored and is not preserved well when the sampling performs. To solve these issues, we propose a Cluster-preserving Partially Induced Edge Sampling (CPIES) algorithm that dynamically keep up-to-date samples in an online fashion and preserve the inherent clustering structure in streaming graphs. The experiments on various synthetic and real-world graphs demonstrated that the proposed CPIES algorithm is capable of preserving the inherent clustering structure while sampling from the fully-dynamic streaming graphs, and performs better than other competing algorithms in cluster-related properties.

AB - Current sampling techniques on graphs (i.e., network-structured data) mainly study static graphs and suppose that the whole graph is available at all times. However, a surge of graphs are becoming too large-scale and/or fully-dynamic (i.e., nodes and edges are added or deleted arbitrarily) to be handled with small memory footprint. Moreover, the intrinsic property (i.e., clustering structure) has been ignored and is not preserved well when the sampling performs. To solve these issues, we propose a Cluster-preserving Partially Induced Edge Sampling (CPIES) algorithm that dynamically keep up-to-date samples in an online fashion and preserve the inherent clustering structure in streaming graphs. The experiments on various synthetic and real-world graphs demonstrated that the proposed CPIES algorithm is capable of preserving the inherent clustering structure while sampling from the fully-dynamic streaming graphs, and performs better than other competing algorithms in cluster-related properties.

KW - Clustering structure

KW - Fully-dynamic streaming graph

KW - Graph sampling

KW - Isolated nodes

KW - Reservoir sampling

UR - http://www.scopus.com/inward/record.url?scp=85059937266&partnerID=8YFLogxK

U2 - 10.1016/j.ins.2019.01.011

DO - 10.1016/j.ins.2019.01.011

M3 - Article

VL - 482

SP - 279

EP - 300

JO - Information Sciences

T2 - Information Sciences

JF - Information Sciences

SN - 0020-0255

ER -