Weak factor automata: Comparing (failure) oracles and storacles

Loek Cleophas, Derrick G. Kourie, Bruce W. Watson

Research output: Chapter in Book/Report/Conference proceedingConference contributionAcademicpeer-review

3 Citations (Scopus)

Abstract

The factor oracle [3] is a data structure for weak factor recognition. It is a deterministic finite automaton (DFA) built on a string p of length m that is acyclic, recognizes at least all factors of p, has m+ 1 states which are all final, is homogeneous, and has m to 2m-1 transitions. The factor storacle [6] is an alternative automaton that satisfies the same properties, except that its number of transitions may be larger than 2m-1, although it is conjectured to be linear with an upper bound of at most 3m. In [14] (among others), we described the concept of a failure automaton i.e. a failure DFA (FDFA), in which so-called failure transitions are used to reduce the total number of transitions and thus reduce representation space compared to the use of a DFA. We modify factor oracle and storacle construction algorithms to introduce failure arcs during the respective automata's construction. We thus end up with four deterministic automata types for weak factor recognition: factor oracle, factor storacle, failure factor oracle, and failure factor storacle. We compare them empirically in terms of size. The results show that despite the relative simplicity of (failure) factor (st)oracles, the failure versions show additional savings of 2-7% in number of transitions, for generated keywords of length 5-9, and of e.g. 5-9% for English words of lengths around 9-15. This may already be substantial in memory-restricted settings such as hardware implementations of automata. The results indicate the gains increase for longer keywords, which seems promising for applications in DNA processing and intrusion detection. Furthermore, our results provide a rather negative result on storacles: apart from rare cases, factor storacles do not have fewer transitions than factor oracles, and similarly for failure factor storacles versus failure factor oracles.

Original languageEnglish
Title of host publicationProceedings of the Prague Stringology Conference 2013, PSC 2013, 2-4 September 2013, Prague, Czech Republic
Pages176-190
Number of pages15
Publication statusPublished - 2013
EventPrague Stringology Conference 2013, PSC 2013 - Prague, Czech Republic
Duration: 2 Sep 20134 Sep 2013

Conference

ConferencePrague Stringology Conference 2013, PSC 2013
CountryCzech Republic
CityPrague
Period2/09/134/09/13

Fingerprint

Automata
Deterministic Finite Automata
Hardware Implementation
Intrusion Detection
Simplicity
Data Structures
Arc of a curve
Strings
Upper bound

Keywords

  • Approximate automaton
  • Factor oracle
  • Failure automaton
  • Pattern matching
  • Weak factor recognition

Cite this

Cleophas, L., Kourie, D. G., & Watson, B. W. (2013). Weak factor automata: Comparing (failure) oracles and storacles. In Proceedings of the Prague Stringology Conference 2013, PSC 2013, 2-4 September 2013, Prague, Czech Republic (pp. 176-190)
Cleophas, Loek ; Kourie, Derrick G. ; Watson, Bruce W. / Weak factor automata : Comparing (failure) oracles and storacles. Proceedings of the Prague Stringology Conference 2013, PSC 2013, 2-4 September 2013, Prague, Czech Republic. 2013. pp. 176-190
@inproceedings{eb2d00a4280f419691fd1464d34ef2d5,
title = "Weak factor automata: Comparing (failure) oracles and storacles",
abstract = "The factor oracle [3] is a data structure for weak factor recognition. It is a deterministic finite automaton (DFA) built on a string p of length m that is acyclic, recognizes at least all factors of p, has m+ 1 states which are all final, is homogeneous, and has m to 2m-1 transitions. The factor storacle [6] is an alternative automaton that satisfies the same properties, except that its number of transitions may be larger than 2m-1, although it is conjectured to be linear with an upper bound of at most 3m. In [14] (among others), we described the concept of a failure automaton i.e. a failure DFA (FDFA), in which so-called failure transitions are used to reduce the total number of transitions and thus reduce representation space compared to the use of a DFA. We modify factor oracle and storacle construction algorithms to introduce failure arcs during the respective automata's construction. We thus end up with four deterministic automata types for weak factor recognition: factor oracle, factor storacle, failure factor oracle, and failure factor storacle. We compare them empirically in terms of size. The results show that despite the relative simplicity of (failure) factor (st)oracles, the failure versions show additional savings of 2-7{\%} in number of transitions, for generated keywords of length 5-9, and of e.g. 5-9{\%} for English words of lengths around 9-15. This may already be substantial in memory-restricted settings such as hardware implementations of automata. The results indicate the gains increase for longer keywords, which seems promising for applications in DNA processing and intrusion detection. Furthermore, our results provide a rather negative result on storacles: apart from rare cases, factor storacles do not have fewer transitions than factor oracles, and similarly for failure factor storacles versus failure factor oracles.",
keywords = "Approximate automaton, Factor oracle, Failure automaton, Pattern matching, Weak factor recognition",
author = "Loek Cleophas and Kourie, {Derrick G.} and Watson, {Bruce W.}",
year = "2013",
language = "English",
isbn = "9788001053300",
pages = "176--190",
booktitle = "Proceedings of the Prague Stringology Conference 2013, PSC 2013, 2-4 September 2013, Prague, Czech Republic",

}

Cleophas, L, Kourie, DG & Watson, BW 2013, Weak factor automata: Comparing (failure) oracles and storacles. in Proceedings of the Prague Stringology Conference 2013, PSC 2013, 2-4 September 2013, Prague, Czech Republic. pp. 176-190, Prague Stringology Conference 2013, PSC 2013, Prague, Czech Republic, 2/09/13.

Weak factor automata : Comparing (failure) oracles and storacles. / Cleophas, Loek; Kourie, Derrick G.; Watson, Bruce W.

Proceedings of the Prague Stringology Conference 2013, PSC 2013, 2-4 September 2013, Prague, Czech Republic. 2013. p. 176-190.

Research output: Chapter in Book/Report/Conference proceedingConference contributionAcademicpeer-review

TY - GEN

T1 - Weak factor automata

T2 - Comparing (failure) oracles and storacles

AU - Cleophas, Loek

AU - Kourie, Derrick G.

AU - Watson, Bruce W.

PY - 2013

Y1 - 2013

N2 - The factor oracle [3] is a data structure for weak factor recognition. It is a deterministic finite automaton (DFA) built on a string p of length m that is acyclic, recognizes at least all factors of p, has m+ 1 states which are all final, is homogeneous, and has m to 2m-1 transitions. The factor storacle [6] is an alternative automaton that satisfies the same properties, except that its number of transitions may be larger than 2m-1, although it is conjectured to be linear with an upper bound of at most 3m. In [14] (among others), we described the concept of a failure automaton i.e. a failure DFA (FDFA), in which so-called failure transitions are used to reduce the total number of transitions and thus reduce representation space compared to the use of a DFA. We modify factor oracle and storacle construction algorithms to introduce failure arcs during the respective automata's construction. We thus end up with four deterministic automata types for weak factor recognition: factor oracle, factor storacle, failure factor oracle, and failure factor storacle. We compare them empirically in terms of size. The results show that despite the relative simplicity of (failure) factor (st)oracles, the failure versions show additional savings of 2-7% in number of transitions, for generated keywords of length 5-9, and of e.g. 5-9% for English words of lengths around 9-15. This may already be substantial in memory-restricted settings such as hardware implementations of automata. The results indicate the gains increase for longer keywords, which seems promising for applications in DNA processing and intrusion detection. Furthermore, our results provide a rather negative result on storacles: apart from rare cases, factor storacles do not have fewer transitions than factor oracles, and similarly for failure factor storacles versus failure factor oracles.

AB - The factor oracle [3] is a data structure for weak factor recognition. It is a deterministic finite automaton (DFA) built on a string p of length m that is acyclic, recognizes at least all factors of p, has m+ 1 states which are all final, is homogeneous, and has m to 2m-1 transitions. The factor storacle [6] is an alternative automaton that satisfies the same properties, except that its number of transitions may be larger than 2m-1, although it is conjectured to be linear with an upper bound of at most 3m. In [14] (among others), we described the concept of a failure automaton i.e. a failure DFA (FDFA), in which so-called failure transitions are used to reduce the total number of transitions and thus reduce representation space compared to the use of a DFA. We modify factor oracle and storacle construction algorithms to introduce failure arcs during the respective automata's construction. We thus end up with four deterministic automata types for weak factor recognition: factor oracle, factor storacle, failure factor oracle, and failure factor storacle. We compare them empirically in terms of size. The results show that despite the relative simplicity of (failure) factor (st)oracles, the failure versions show additional savings of 2-7% in number of transitions, for generated keywords of length 5-9, and of e.g. 5-9% for English words of lengths around 9-15. This may already be substantial in memory-restricted settings such as hardware implementations of automata. The results indicate the gains increase for longer keywords, which seems promising for applications in DNA processing and intrusion detection. Furthermore, our results provide a rather negative result on storacles: apart from rare cases, factor storacles do not have fewer transitions than factor oracles, and similarly for failure factor storacles versus failure factor oracles.

KW - Approximate automaton

KW - Factor oracle

KW - Failure automaton

KW - Pattern matching

KW - Weak factor recognition

UR - http://www.scopus.com/inward/record.url?scp=84884598718&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84884598718

SN - 9788001053300

SP - 176

EP - 190

BT - Proceedings of the Prague Stringology Conference 2013, PSC 2013, 2-4 September 2013, Prague, Czech Republic

ER -

Cleophas L, Kourie DG, Watson BW. Weak factor automata: Comparing (failure) oracles and storacles. In Proceedings of the Prague Stringology Conference 2013, PSC 2013, 2-4 September 2013, Prague, Czech Republic. 2013. p. 176-190