TY - GEN

T1 - Automaton-based sublinear keyword pattern matching

AU - Cleophas, L.G.W.A.

AU - Watson, B.W.

AU - Zwaan, G.

PY - 2004

Y1 - 2004

N2 - We show how automaton-based sublinear keyword pattern matching (skpm) algorithms appearing in the literature can be seen as different instantiations of a general automaton-based skpm algorithm skeleton. Such algorithms use finite automata (FA) for efficient computation of string membership in a certain language. The algorithms were formally derived as part of a new skpm algorithm taxonomy, based on an earlier suffix-based skpm algorithm taxonomy [1]. Such a taxonomy is based on deriving the algorithms from a common starting point by successively adding algorithm and problem details and has a number of advantages. It provides correctness arguments, clarifies the working of the algorithms and their interrelationships, helps in implementing the algorithms, and may lead to new algorithms being discovered by finding gaps in the taxonomy. We show how to arrive at the general algorithm skeleton and derive some instantiations, leading to well-known factor- and factor oracle-based algorithms. In doing so, we show the shift functions used for them can be (strengthenings of) shift functions used for suffix-based algorithms. This also results in a number of previously undescribed factor-based skpm algorithm variants, whose performance remains to be investigated.

AB - We show how automaton-based sublinear keyword pattern matching (skpm) algorithms appearing in the literature can be seen as different instantiations of a general automaton-based skpm algorithm skeleton. Such algorithms use finite automata (FA) for efficient computation of string membership in a certain language. The algorithms were formally derived as part of a new skpm algorithm taxonomy, based on an earlier suffix-based skpm algorithm taxonomy [1]. Such a taxonomy is based on deriving the algorithms from a common starting point by successively adding algorithm and problem details and has a number of advantages. It provides correctness arguments, clarifies the working of the algorithms and their interrelationships, helps in implementing the algorithms, and may lead to new algorithms being discovered by finding gaps in the taxonomy. We show how to arrive at the general algorithm skeleton and derive some instantiations, leading to well-known factor- and factor oracle-based algorithms. In doing so, we show the shift functions used for them can be (strengthenings of) shift functions used for suffix-based algorithms. This also results in a number of previously undescribed factor-based skpm algorithm variants, whose performance remains to be investigated.

U2 - 10.1007/978-3-540-30213-1_3

DO - 10.1007/978-3-540-30213-1_3

M3 - Conference contribution

SN - 3-540-23210-9

T3 - Lecture Notes in Computer Science

SP - 18

EP - 29

BT - String Processing and Information Retrieval (Proceedings 11th International Conference, SPIRE 2004, Padova, Italy, October 5-8, 2004)

A2 - Apostolico, A.

A2 - Melucci, M.

PB - Springer

CY - Berlin

ER -