Apriori versions based on MapReduce for mining frequent patterns on big data

J.M. Luna, F. Padillo, M. Pechenizkiy, S. Ventura

Research output: Contribution to journal › Article › Academic › peer-review

5 Citations (Scopus)

Abstract

Pattern mining is one of the most important tasks for extracting meaningful and useful information from raw data. This task aims to extract item-sets that represent any type of homogeneity and regularity in data. Although many efficient algorithms have been developed in this regard, the growing interest in data has caused the performance of existing pattern mining techniques to drop. The goal of this paper is to propose new efficient pattern mining algorithms that work on big data. To this end, a series of algorithms based on the MapReduce framework and its open-source Hadoop implementation is proposed. The proposed algorithms can be divided into three main groups. First, two algorithms [Apriori MapReduce (AprioriMR) and iterative AprioriMR] with no pruning strategy are proposed, which extract any existing item-set in data. Second, two algorithms (space pruning AprioriMR and top AprioriMR) that prune the search space by means of the well-known anti-monotone property are proposed. Finally, a last algorithm (maximal AprioriMR) is proposed for mining condensed representations of frequent patterns. To test the performance of the proposed algorithms, a varied collection of big data datasets has been considered, comprising up to 3 · 10¹⁸ transactions and more than 5 million distinct single items. The experimental stage includes comparisons against highly efficient and well-known pattern mining algorithms. Results reveal the interest of applying MapReduce versions when complex problems are considered, and also the unsuitability of this paradigm when dealing with small data.
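The ideas named in the abstract can be illustrated with a small sketch. It is not the authors' Hadoop implementation and does not reproduce any of the AprioriMR variants. It only simulates, in plain single-process Python, a map step that emits item-sets per transaction, a reduce step that sums their supports, a level-wise search that uses the anti-monotone property (an item-set can only be frequent if all of its subsets are frequent) to avoid emitting hopeless candidates, and a final filter that keeps maximal frequent item-sets. All function names, the toy transactions, and the `min_support` threshold are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def map_phase(transaction, size, previous=None):
    """Mapper (simulated): emit (item-set, 1) for each item-set of `size` items.

    If `previous` (the frequent item-sets of size - 1) is given, an item-set is
    emitted only when all of its (size - 1)-subsets are frequent. This is the
    anti-monotone pruning step.
    """
    for itemset in combinations(sorted(set(transaction)), size):
        if previous is not None and not all(
            subset in previous for subset in combinations(itemset, size - 1)
        ):
            continue
        yield itemset, 1


def reduce_phase(pairs):
    """Reducer (simulated): sum the partial counts emitted for each item-set."""
    support = Counter()
    for itemset, count in pairs:
        support[itemset] += count
    return support


def frequent_itemsets(transactions, min_support):
    """Level-wise search: one simulated map/reduce round per item-set size."""
    frequent = {}
    previous = None
    size = 1
    while True:
        pairs = (
            pair
            for transaction in transactions
            for pair in map_phase(transaction, size, previous)
        )
        counts = reduce_phase(pairs)
        level = {s: c for s, c in counts.items() if c >= min_support}
        if not level:
            break
        frequent.update(level)
        previous = level
        size += 1
    return frequent


def maximal_itemsets(frequent):
    """Keep only frequent item-sets that have no frequent proper superset."""
    as_sets = [frozenset(s) for s in frequent]
    return {
        s: c
        for s, c in frequent.items()
        if not any(frozenset(s) < other for other in as_sets)
    }


# Toy data, hypothetical and far smaller than anything in the paper.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
    ["a", "b", "c"],
]
frequent = frequent_itemsets(transactions, min_support=3)
maximal = maximal_itemsets(frequent)
print(sorted(frequent.items()))
print(sorted(maximal.items()))
```

In a real MapReduce deployment the map and reduce functions would run on separate nodes over partitions of the data. The sketch keeps everything in one process so that the control flow is easy to follow.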

Original language: English
Pages (from-to): 2851-2865
Journal: IEEE Transactions on Cybernetics
Volume: 48
Issue number: 10
DOI: 10.1109/TCYB.2017.2751081
Publication status: Published - 2018

Keywords

  • Algorithm design and analysis
  • Big Data
  • Computer science
  • Data mining
  • Hadoop
  • MapReduce
  • Open source software
  • pattern mining
  • Proposals

Cite this

Luna, J.M. ; Padillo, F. ; Pechenizkiy, M. ; Ventura, S. / Apriori versions based on MapReduce for mining frequent patterns on big data. In: IEEE Transactions on Cybernetics. 2018 ; Vol. 48, No. 10. pp. 2851-2865.
@article{9a7723e9d4654f42a2089dc09a3387d7,
title = "Apriori versions based on MapReduce for mining frequent patterns on big data",
keywords = "Algorithm design and analysis, Big Data, Computer science, Data mining, Hadoop, MapReduce, Open source software, pattern mining, Proposals",
author = "J.M. Luna and F. Padillo and M. Pechenizkiy and S. Ventura",
year = "2018",
doi = "10.1109/TCYB.2017.2751081",
language = "English",
volume = "48",
pages = "2851--2865",
journal = "IEEE Transactions on Cybernetics",
issn = "2168-2267",
publisher = "Institute of Electrical and Electronics Engineers",
number = "10",

}

TY - JOUR

T1 - Apriori versions based on MapReduce for mining frequent patterns on big data

AU - Luna, J.M.

AU - Padillo, F.

AU - Pechenizkiy, M.

AU - Ventura, S.

PY - 2018

Y1 - 2018

KW - Algorithm design and analysis

KW - Big Data

KW - Computer science

KW - Data mining

KW - Hadoop

KW - MapReduce

KW - Open source software

KW - pattern mining

KW - Proposals

UR - http://www.scopus.com/inward/record.url?scp=85030756215&partnerID=8YFLogxK

U2 - 10.1109/TCYB.2017.2751081

DO - 10.1109/TCYB.2017.2751081

M3 - Article

C2 - 28961134

AN - SCOPUS:85030756215

VL - 48

SP - 2851

EP - 2865

JO - IEEE Transactions on Cybernetics

JF - IEEE Transactions on Cybernetics

SN - 2168-2267

IS - 10

ER -