### Abstract

| Original language | Undefined |
|---|---|
| Pages (from-to) | 1-14 |
| Number of pages | 14 |
| Journal | South African Computer Journal |
| Volume | 53 |
| DOIs | https://doi.org/10.18489/sacj.v53i0.199 |
| Publication status | Published - 2014 |
| Externally published | Yes |

### Cite this

Cleophas, L.G., Kourie, D.G., & Watson, B.W. (2014). Weak factor automata: the failure of failure factor oracles? *South African Computer Journal*, *53*, 1-14. https://doi.org/10.18489/sacj.v53i0.199

Research output: Contribution to journal › Article › Academic › peer-review

TY - JOUR

T1 - Weak factor automata: the failure of failure factor oracles?

AU - Cleophas, L.G.

AU - Kourie, D.G.

AU - Watson, B.W.

PY - 2014

Y1 - 2014

N2 - In indexing of, and pattern matching on, DNA and text sequences, it is often important to represent all factors of a sequence. One efficient, compact representation is the factor oracle (FO). At the same time, any classical deterministic finite automaton (DFA) can be transformed to a so-called failure one (FDFA), which may use failure transitions to replace multiple symbol transitions, potentially yielding a more compact representation. We combine the two ideas and directly construct a failure factor oracle (FFO) from a given sequence, in contrast to ex post facto transformation to an FDFA. The algorithm is suitable for both short and long sequences. We empirically compared the resulting FFOs and FOs on number of transitions for many DNA sequences of lengths 4–512, showing gains of up to 10% in total number of transitions, with failure transitions also taking up less space than symbol transitions. The resulting FFOs can be used for indexing, as well as in a variant of the FO-using backward oracle matching algorithm. We discuss and classify this pattern matching algorithm in terms of the keyword pattern matching taxonomies of Watson, Cleophas and Zwaan. We also empirically compared the use of FOs and FFOs in such backward reading pattern matching algorithms, using both DNA and natural language (English) data sets. The results indicate that the decrease in pattern matching performance of an algorithm using an FFO instead of an FO may outweigh the gain in representation space by using an FFO instead of an FO.

U2 - 10.18489/sacj.v53i0.199

DO - 10.18489/sacj.v53i0.199

M3 - Journal article

VL - 53

SP - 1

EP - 14

JO - South African Computer Journal

JF - South African Computer Journal

SN - 1015-7999

ER -
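
The abstract in the record above builds on the factor oracle (FO). For orientation, here is a minimal sketch of the plain FO, assuming the well-known online construction from the literature. It is not the failure factor oracle (FFO) construction described in the article, which this record does not specify. The names `build_factor_oracle`, `accepts`, `trans`, and `supply` are illustrative choices, not taken from the article.

```python
def build_factor_oracle(word):
    """Build a factor oracle for `word` (illustrative sketch).

    Returns (trans, supply):
      trans[q]  maps a symbol to the target state of q's symbol transition
      supply[q] is the supply link of state q (-1 for the initial state)
    States are numbered 0..len(word); every state is accepting.
    """
    m = len(word)
    trans = [dict() for _ in range(m + 1)]
    supply = [-1] * (m + 1)
    for i, c in enumerate(word):
        # Internal transition along the word itself: i --c--> i+1.
        trans[i][c] = i + 1
        # Walk the supply links, adding external transitions to the
        # new state wherever symbol c has no transition yet.
        k = supply[i]
        while k > -1 and c not in trans[k]:
            trans[k][c] = i + 1
            k = supply[k]
        # Supply link of the new state.
        supply[i + 1] = 0 if k == -1 else trans[k][c]
    return trans, supply


def accepts(trans, s):
    """Return True if the oracle has a path from state 0 spelling `s`."""
    state = 0
    for c in s:
        if c not in trans[state]:
            return False
        state = trans[state][c]
    return True
```

For example, the oracle built from `"abbbaab"` accepts every factor of that word, but it also accepts `"abba"`, which is not a factor. This over-acceptance is the sense in which an FO is a weak factor automaton.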