Abstract
The factor oracle [3] is a data structure for weak factor recognition. It is a deterministic finite automaton (DFA) built on a string p of length m that is acyclic, recognizes at least all factors of p, has m+ 1 states which are all final, is homogeneous, and has m to 2m-1 transitions. The factor storacle [6] is an alternative automaton that satisfies the same properties, except that its number of transitions may be larger than 2m-1, although it is conjectured to be linear with an upper bound of at most 3m. In [14] (among others), we described the concept of a failure automaton i.e. a failure DFA (FDFA), in which so-called failure transitions are used to reduce the total number of transitions and thus reduce representation space compared to the use of a DFA. We modify factor oracle and storacle construction algorithms to introduce failure arcs during the respective automata's construction. We thus end up with four deterministic automata types for weak factor recognition: factor oracle, factor storacle, failure factor oracle, and failure factor storacle. We compare them empirically in terms of size. The results show that despite the relative simplicity of (failure) factor (st)oracles, the failure versions show additional savings of 2-7% in number of transitions, for generated keywords of length 5-9, and of e.g. 5-9% for English words of lengths around 9-15. This may already be substantial in memory-restricted settings such as hardware implementations of automata. The results indicate the gains increase for longer keywords, which seems promising for applications in DNA processing and intrusion detection. Furthermore, our results provide a rather negative result on storacles: apart from rare cases, factor storacles do not have fewer transitions than factor oracles, and similarly for failure factor storacles versus failure factor oracles.
Original language | English |
---|---|
Title of host publication | Proceedings of the Prague Stringology Conference 2013, PSC 2013, 2-4 September 2013, Prague, Czech Republic |
Pages | 176-190 |
Number of pages | 15 |
Publication status | Published - 2013 |
Event | Prague Stringology Conference 2013, PSC 2013 - Prague, Czech Republic Duration: 2 Sept 2013 → 4 Sept 2013 |
Conference
Conference | Prague Stringology Conference 2013, PSC 2013 |
---|---|
Country/Territory | Czech Republic |
City | Prague |
Period | 2/09/13 → 4/09/13 |
Keywords
- Approximate automaton
- Factor oracle
- Failure automaton
- Pattern matching
- Weak factor recognition