### Abstract

The factor oracle [3] is a data structure for weak factor recognition. It is a deterministic finite automaton (DFA) built on a string p of length m that is acyclic, recognizes at least all factors of p, has m+ 1 states which are all final, is homogeneous, and has m to 2m-1 transitions. The factor storacle [6] is an alternative automaton that satisfies the same properties, except that its number of transitions may be larger than 2m-1, although it is conjectured to be linear with an upper bound of at most 3m. In [14] (among others), we described the concept of a failure automaton i.e. a failure DFA (FDFA), in which so-called failure transitions are used to reduce the total number of transitions and thus reduce representation space compared to the use of a DFA. We modify factor oracle and storacle construction algorithms to introduce failure arcs during the respective automata's construction. We thus end up with four deterministic automata types for weak factor recognition: factor oracle, factor storacle, failure factor oracle, and failure factor storacle. We compare them empirically in terms of size. The results show that despite the relative simplicity of (failure) factor (st)oracles, the failure versions show additional savings of 2-7% in number of transitions, for generated keywords of length 5-9, and of e.g. 5-9% for English words of lengths around 9-15. This may already be substantial in memory-restricted settings such as hardware implementations of automata. The results indicate the gains increase for longer keywords, which seems promising for applications in DNA processing and intrusion detection. Furthermore, our results provide a rather negative result on storacles: apart from rare cases, factor storacles do not have fewer transitions than factor oracles, and similarly for failure factor storacles versus failure factor oracles.

Original language | English |
---|---|

Title of host publication | Proceedings of the Prague Stringology Conference 2013, PSC 2013, 2-4 September 2013, Prague, Czech Republic |

Pages | 176-190 |

Number of pages | 15 |

Publication status | Published - 2013 |

Event | Prague Stringology Conference 2013, PSC 2013 - Prague, Czech Republic Duration: 2 Sep 2013 → 4 Sep 2013 |

### Conference

Conference | Prague Stringology Conference 2013, PSC 2013 |
---|---|

Country | Czech Republic |

City | Prague |

Period | 2/09/13 → 4/09/13 |

### Fingerprint

### Keywords

- Approximate automaton
- Factor oracle
- Failure automaton
- Pattern matching
- Weak factor recognition

### Cite this

*Proceedings of the Prague Stringology Conference 2013, PSC 2013, 2-4 September 2013, Prague, Czech Republic*(pp. 176-190)

}

*Proceedings of the Prague Stringology Conference 2013, PSC 2013, 2-4 September 2013, Prague, Czech Republic.*pp. 176-190, Prague Stringology Conference 2013, PSC 2013, Prague, Czech Republic, 2/09/13.

**Weak factor automata : Comparing (failure) oracles and storacles.** / Cleophas, Loek; Kourie, Derrick G.; Watson, Bruce W.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

TY - GEN

T1 - Weak factor automata

T2 - Comparing (failure) oracles and storacles

AU - Cleophas, Loek

AU - Kourie, Derrick G.

AU - Watson, Bruce W.

PY - 2013

Y1 - 2013

N2 - The factor oracle [3] is a data structure for weak factor recognition. It is a deterministic finite automaton (DFA) built on a string p of length m that is acyclic, recognizes at least all factors of p, has m+ 1 states which are all final, is homogeneous, and has m to 2m-1 transitions. The factor storacle [6] is an alternative automaton that satisfies the same properties, except that its number of transitions may be larger than 2m-1, although it is conjectured to be linear with an upper bound of at most 3m. In [14] (among others), we described the concept of a failure automaton i.e. a failure DFA (FDFA), in which so-called failure transitions are used to reduce the total number of transitions and thus reduce representation space compared to the use of a DFA. We modify factor oracle and storacle construction algorithms to introduce failure arcs during the respective automata's construction. We thus end up with four deterministic automata types for weak factor recognition: factor oracle, factor storacle, failure factor oracle, and failure factor storacle. We compare them empirically in terms of size. The results show that despite the relative simplicity of (failure) factor (st)oracles, the failure versions show additional savings of 2-7% in number of transitions, for generated keywords of length 5-9, and of e.g. 5-9% for English words of lengths around 9-15. This may already be substantial in memory-restricted settings such as hardware implementations of automata. The results indicate the gains increase for longer keywords, which seems promising for applications in DNA processing and intrusion detection. Furthermore, our results provide a rather negative result on storacles: apart from rare cases, factor storacles do not have fewer transitions than factor oracles, and similarly for failure factor storacles versus failure factor oracles.

AB - The factor oracle [3] is a data structure for weak factor recognition. It is a deterministic finite automaton (DFA) built on a string p of length m that is acyclic, recognizes at least all factors of p, has m+ 1 states which are all final, is homogeneous, and has m to 2m-1 transitions. The factor storacle [6] is an alternative automaton that satisfies the same properties, except that its number of transitions may be larger than 2m-1, although it is conjectured to be linear with an upper bound of at most 3m. In [14] (among others), we described the concept of a failure automaton i.e. a failure DFA (FDFA), in which so-called failure transitions are used to reduce the total number of transitions and thus reduce representation space compared to the use of a DFA. We modify factor oracle and storacle construction algorithms to introduce failure arcs during the respective automata's construction. We thus end up with four deterministic automata types for weak factor recognition: factor oracle, factor storacle, failure factor oracle, and failure factor storacle. We compare them empirically in terms of size. The results show that despite the relative simplicity of (failure) factor (st)oracles, the failure versions show additional savings of 2-7% in number of transitions, for generated keywords of length 5-9, and of e.g. 5-9% for English words of lengths around 9-15. This may already be substantial in memory-restricted settings such as hardware implementations of automata. The results indicate the gains increase for longer keywords, which seems promising for applications in DNA processing and intrusion detection. Furthermore, our results provide a rather negative result on storacles: apart from rare cases, factor storacles do not have fewer transitions than factor oracles, and similarly for failure factor storacles versus failure factor oracles.

KW - Approximate automaton

KW - Factor oracle

KW - Failure automaton

KW - Pattern matching

KW - Weak factor recognition

UR - http://www.scopus.com/inward/record.url?scp=84884598718&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84884598718

SN - 9788001053300

SP - 176

EP - 190

BT - Proceedings of the Prague Stringology Conference 2013, PSC 2013, 2-4 September 2013, Prague, Czech Republic

ER -