Cluster-preserving sampling from fully-dynamic streaming graphs

Onderzoeksoutput: Bijdrage aan tijdschriftTijdschriftartikelAcademicpeer review

Uittreksel

Current sampling techniques on graphs (i.e., network-structured data) mainly study static graphs and suppose that the whole graph is available at all times. However, a surge of graphs are becoming too large-scale and/or fully-dynamic (i.e., nodes and edges are added or deleted arbitrarily) to be handled with small memory footprint. Moreover, the intrinsic property (i.e., clustering structure) has been ignored and is not preserved well when the sampling performs. To solve these issues, we propose a Cluster-preserving Partially Induced Edge Sampling (CPIES) algorithm that dynamically keep up-to-date samples in an online fashion and preserve the inherent clustering structure in streaming graphs. The experiments on various synthetic and real-world graphs demonstrated that the proposed CPIES algorithm is capable of preserving the inherent clustering structure while sampling from the fully-dynamic streaming graphs, and performs better than other competing algorithms in cluster-related properties.

TaalEngels
Pagina's279-300
Aantal pagina's22
TijdschriftInformation Sciences
Volume482
DOI's
StatusGepubliceerd - 1 mei 2019

Vingerafdruk

Streaming
Sampling
Graph in graph theory
Clustering
Surge
Graph
Data storage equipment
Experiments
Vertex of a graph
Experiment

Trefwoorden

    Citeer dit

    @article{96e948d0f5684a46b53d25a654627eb5,
    title = "Cluster-preserving sampling from fully-dynamic streaming graphs",
    abstract = "Current sampling techniques on graphs (i.e., network-structured data) mainly study static graphs and suppose that the whole graph is available at all times. However, a surge of graphs are becoming too large-scale and/or fully-dynamic (i.e., nodes and edges are added or deleted arbitrarily) to be handled with small memory footprint. Moreover, the intrinsic property (i.e., clustering structure) has been ignored and is not preserved well when the sampling performs. To solve these issues, we propose a Cluster-preserving Partially Induced Edge Sampling (CPIES) algorithm that dynamically keep up-to-date samples in an online fashion and preserve the inherent clustering structure in streaming graphs. The experiments on various synthetic and real-world graphs demonstrated that the proposed CPIES algorithm is capable of preserving the inherent clustering structure while sampling from the fully-dynamic streaming graphs, and performs better than other competing algorithms in cluster-related properties.",
    keywords = "Clustering structure, Fully-dynamic streaming graph, Graph sampling, Isolated nodes, Reservoir sampling",
    author = "Jianpeng Zhang and Kaijie Zhu and Yulong Pei and George Fletcher and Mykola Pechenizkiy",
    year = "2019",
    month = "5",
    day = "1",
    doi = "10.1016/j.ins.2019.01.011",
    language = "English",
    volume = "482",
    pages = "279--300",
    journal = "Information Sciences",
    issn = "0020-0255",
    publisher = "Elsevier",

    }

    Cluster-preserving sampling from fully-dynamic streaming graphs. / Zhang, Jianpeng (Corresponding author); Zhu, Kaijie; Pei, Yulong; Fletcher, George; Pechenizkiy, Mykola.

    In: Information Sciences, Vol. 482, 01.05.2019, blz. 279-300.

    Onderzoeksoutput: Bijdrage aan tijdschriftTijdschriftartikelAcademicpeer review

    TY - JOUR

    T1 - Cluster-preserving sampling from fully-dynamic streaming graphs

    AU - Zhang,Jianpeng

    AU - Zhu,Kaijie

    AU - Pei,Yulong

    AU - Fletcher,George

    AU - Pechenizkiy,Mykola

    PY - 2019/5/1

    Y1 - 2019/5/1

    N2 - Current sampling techniques on graphs (i.e., network-structured data) mainly study static graphs and suppose that the whole graph is available at all times. However, a surge of graphs are becoming too large-scale and/or fully-dynamic (i.e., nodes and edges are added or deleted arbitrarily) to be handled with small memory footprint. Moreover, the intrinsic property (i.e., clustering structure) has been ignored and is not preserved well when the sampling performs. To solve these issues, we propose a Cluster-preserving Partially Induced Edge Sampling (CPIES) algorithm that dynamically keep up-to-date samples in an online fashion and preserve the inherent clustering structure in streaming graphs. The experiments on various synthetic and real-world graphs demonstrated that the proposed CPIES algorithm is capable of preserving the inherent clustering structure while sampling from the fully-dynamic streaming graphs, and performs better than other competing algorithms in cluster-related properties.

    AB - Current sampling techniques on graphs (i.e., network-structured data) mainly study static graphs and suppose that the whole graph is available at all times. However, a surge of graphs are becoming too large-scale and/or fully-dynamic (i.e., nodes and edges are added or deleted arbitrarily) to be handled with small memory footprint. Moreover, the intrinsic property (i.e., clustering structure) has been ignored and is not preserved well when the sampling performs. To solve these issues, we propose a Cluster-preserving Partially Induced Edge Sampling (CPIES) algorithm that dynamically keep up-to-date samples in an online fashion and preserve the inherent clustering structure in streaming graphs. The experiments on various synthetic and real-world graphs demonstrated that the proposed CPIES algorithm is capable of preserving the inherent clustering structure while sampling from the fully-dynamic streaming graphs, and performs better than other competing algorithms in cluster-related properties.

    KW - Clustering structure

    KW - Fully-dynamic streaming graph

    KW - Graph sampling

    KW - Isolated nodes

    KW - Reservoir sampling

    UR - http://www.scopus.com/inward/record.url?scp=85059937266&partnerID=8YFLogxK

    U2 - 10.1016/j.ins.2019.01.011

    DO - 10.1016/j.ins.2019.01.011

    M3 - Article

    VL - 482

    SP - 279

    EP - 300

    JO - Information Sciences

    T2 - Information Sciences

    JF - Information Sciences

    SN - 0020-0255

    ER -