Tree algorithms : two taxonomies and a toolkit

Research output: ThesisPhd Thesis 1 (Research TU/e / Graduation TU/e)

513 Downloads (Pure)

Abstract

The subject of this dissertation is the construction of two taxonomies and a toolkit of algorithms solving two different but closely related problems from the domain of regular tree languages. In our context, a taxonomy is a classification of algorithms according to their essential details. Its starting point is a high-level algorithm whose correctness is easily shown. By adding details, one obtains refinements or variations of that algorithm and of algorithms obtained in such a way. This may lead to algorithms from the literature or to new ones. Every detail has correctness arguments associated with it, so that each algorithm’s correctness follows from the correctness of the starting point and details added. The sequence of details from the starting point to an algorithm shows how the algorithm can be obtained from the starting point and can be used to characterize the particular algorithm. An algorithm taxonomy may be depicted as a directed acyclic graph, in which nodes refer to algorithms and branches refer to details. We construct taxonomies for two algorithmic problems from the domain of regular tree languages on ordered, ranked trees. This domain has a rich theory, and parts of this theory have broad applicability in areas such as code generation in compilers—particularly for instruction selection or optimization—and term rewriting. Underlying the practical applications are a number of algorithmic problems, of which tree acceptance and tree pattern matching are two. Many algorithms solving these problems have been described and used in practice. Unfortunately, the domain suffers from a number of deficiencies: 1. Inaccessibility of the theory and algorithms, as they are scattered over the literature and few or no overview publications—particularly algorithm oriented ones—exist. 2. Difficulty of comparing the algorithms due to differences in presentation style and level of formality. 3. Lack of reference to the theory and of correctness arguments in publications of practical algorithms. 4. Lack of a large and coherent collection of implementations of the algorithms. 5. Difficulty of choosing between different algorithms for practical applications. The construction of taxonomies aims to solve the first three deficiencies. The construction process is part of TABASCO, a method for domain modeling and domain engineering for a particular kind of domain in computer science. As a first step, an overview of relevant parts of the theory is created. A literature survey is then performed to gather algorithms solving the algorithmic problem or problems. Based on the results, the algorithms are rephrased in a common presentation style and a model in the form of an algorithm taxonomy for the algorithmic problem or problems is constructed. The taxonomy makes the commonalities and variations between the algorithms as well as the correctness arguments for the algorithms explicit. The problems of tree acceptance and tree pattern matching are related and the algorithms solving them involve many of the same algorithmic ingredients. The commonalities lead to similarities between the taxonomies constructed for each of the two problems. Furthermore, references to and discussions of the algorithms’ occurrences in the literature are included. The last two of the five deficiencies mentioned above can be solved using TABASCO’s domain engineering steps, which include the creation of a toolkit of algorithm implementations and of a domain specific language to allow more effective use of such a toolkit. The design of a coherent toolkit is simplified by the availability of a taxonomy, which indicates the commonalities and differences between the various algorithms. The high-level design choices are guided by the taxonomy structure, and the choice of which language constructs to use to implement various design parts can be made based on standard design techniques. We consider the domain of keyword pattern matching as a brief case study for all of the method’s domain engineering steps. For the domain of regular tree languages, we consider the design of a toolkit of tree acceptance and tree pattern matching algorithms based on the two taxonomies. The taxonomy-based toolkit design results in a coherent and easily extendable toolkit.
Original languageEnglish
QualificationDoctor of Philosophy
Awarding Institution
  • Department of Mathematics and Computer Science
Supervisors/Advisors
  • van den Brand, Mark G.J., Promotor
  • Watson, Bruce William, Promotor
  • Hemerik, C. (Kees), Copromotor
Award date1 Apr 2008
Place of PublicationEindhoven
Publisher
Print ISBNs978-90-386-1228-7
DOIs
Publication statusPublished - 2008

Fingerprint

Taxonomies
Pattern matching

Cite this

Cleophas, L.G.W.A.. / Tree algorithms : two taxonomies and a toolkit. Eindhoven : Technische Universiteit Eindhoven, 2008. 294 p.
@phdthesis{da4ca1e5b12748c7829416a4c21fc834,
title = "Tree algorithms : two taxonomies and a toolkit",
abstract = "The subject of this dissertation is the construction of two taxonomies and a toolkit of algorithms solving two different but closely related problems from the domain of regular tree languages. In our context, a taxonomy is a classification of algorithms according to their essential details. Its starting point is a high-level algorithm whose correctness is easily shown. By adding details, one obtains refinements or variations of that algorithm and of algorithms obtained in such a way. This may lead to algorithms from the literature or to new ones. Every detail has correctness arguments associated with it, so that each algorithm’s correctness follows from the correctness of the starting point and details added. The sequence of details from the starting point to an algorithm shows how the algorithm can be obtained from the starting point and can be used to characterize the particular algorithm. An algorithm taxonomy may be depicted as a directed acyclic graph, in which nodes refer to algorithms and branches refer to details. We construct taxonomies for two algorithmic problems from the domain of regular tree languages on ordered, ranked trees. This domain has a rich theory, and parts of this theory have broad applicability in areas such as code generation in compilers—particularly for instruction selection or optimization—and term rewriting. Underlying the practical applications are a number of algorithmic problems, of which tree acceptance and tree pattern matching are two. Many algorithms solving these problems have been described and used in practice. Unfortunately, the domain suffers from a number of deficiencies: 1. Inaccessibility of the theory and algorithms, as they are scattered over the literature and few or no overview publications—particularly algorithm oriented ones—exist. 2. Difficulty of comparing the algorithms due to differences in presentation style and level of formality. 3. Lack of reference to the theory and of correctness arguments in publications of practical algorithms. 4. Lack of a large and coherent collection of implementations of the algorithms. 5. Difficulty of choosing between different algorithms for practical applications. The construction of taxonomies aims to solve the first three deficiencies. The construction process is part of TABASCO, a method for domain modeling and domain engineering for a particular kind of domain in computer science. As a first step, an overview of relevant parts of the theory is created. A literature survey is then performed to gather algorithms solving the algorithmic problem or problems. Based on the results, the algorithms are rephrased in a common presentation style and a model in the form of an algorithm taxonomy for the algorithmic problem or problems is constructed. The taxonomy makes the commonalities and variations between the algorithms as well as the correctness arguments for the algorithms explicit. The problems of tree acceptance and tree pattern matching are related and the algorithms solving them involve many of the same algorithmic ingredients. The commonalities lead to similarities between the taxonomies constructed for each of the two problems. Furthermore, references to and discussions of the algorithms’ occurrences in the literature are included. The last two of the five deficiencies mentioned above can be solved using TABASCO’s domain engineering steps, which include the creation of a toolkit of algorithm implementations and of a domain specific language to allow more effective use of such a toolkit. The design of a coherent toolkit is simplified by the availability of a taxonomy, which indicates the commonalities and differences between the various algorithms. The high-level design choices are guided by the taxonomy structure, and the choice of which language constructs to use to implement various design parts can be made based on standard design techniques. We consider the domain of keyword pattern matching as a brief case study for all of the method’s domain engineering steps. For the domain of regular tree languages, we consider the design of a toolkit of tree acceptance and tree pattern matching algorithms based on the two taxonomies. The taxonomy-based toolkit design results in a coherent and easily extendable toolkit.",
author = "L.G.W.A. Cleophas",
year = "2008",
doi = "10.6100/IR633481",
language = "English",
isbn = "978-90-386-1228-7",
publisher = "Technische Universiteit Eindhoven",
school = "Department of Mathematics and Computer Science",

}

Cleophas, LGWA 2008, 'Tree algorithms : two taxonomies and a toolkit', Doctor of Philosophy, Department of Mathematics and Computer Science, Eindhoven. https://doi.org/10.6100/IR633481

Tree algorithms : two taxonomies and a toolkit. / Cleophas, L.G.W.A.

Eindhoven : Technische Universiteit Eindhoven, 2008. 294 p.

Research output: ThesisPhd Thesis 1 (Research TU/e / Graduation TU/e)

TY - THES

T1 - Tree algorithms : two taxonomies and a toolkit

AU - Cleophas, L.G.W.A.

PY - 2008

Y1 - 2008

N2 - The subject of this dissertation is the construction of two taxonomies and a toolkit of algorithms solving two different but closely related problems from the domain of regular tree languages. In our context, a taxonomy is a classification of algorithms according to their essential details. Its starting point is a high-level algorithm whose correctness is easily shown. By adding details, one obtains refinements or variations of that algorithm and of algorithms obtained in such a way. This may lead to algorithms from the literature or to new ones. Every detail has correctness arguments associated with it, so that each algorithm’s correctness follows from the correctness of the starting point and details added. The sequence of details from the starting point to an algorithm shows how the algorithm can be obtained from the starting point and can be used to characterize the particular algorithm. An algorithm taxonomy may be depicted as a directed acyclic graph, in which nodes refer to algorithms and branches refer to details. We construct taxonomies for two algorithmic problems from the domain of regular tree languages on ordered, ranked trees. This domain has a rich theory, and parts of this theory have broad applicability in areas such as code generation in compilers—particularly for instruction selection or optimization—and term rewriting. Underlying the practical applications are a number of algorithmic problems, of which tree acceptance and tree pattern matching are two. Many algorithms solving these problems have been described and used in practice. Unfortunately, the domain suffers from a number of deficiencies: 1. Inaccessibility of the theory and algorithms, as they are scattered over the literature and few or no overview publications—particularly algorithm oriented ones—exist. 2. Difficulty of comparing the algorithms due to differences in presentation style and level of formality. 3. Lack of reference to the theory and of correctness arguments in publications of practical algorithms. 4. Lack of a large and coherent collection of implementations of the algorithms. 5. Difficulty of choosing between different algorithms for practical applications. The construction of taxonomies aims to solve the first three deficiencies. The construction process is part of TABASCO, a method for domain modeling and domain engineering for a particular kind of domain in computer science. As a first step, an overview of relevant parts of the theory is created. A literature survey is then performed to gather algorithms solving the algorithmic problem or problems. Based on the results, the algorithms are rephrased in a common presentation style and a model in the form of an algorithm taxonomy for the algorithmic problem or problems is constructed. The taxonomy makes the commonalities and variations between the algorithms as well as the correctness arguments for the algorithms explicit. The problems of tree acceptance and tree pattern matching are related and the algorithms solving them involve many of the same algorithmic ingredients. The commonalities lead to similarities between the taxonomies constructed for each of the two problems. Furthermore, references to and discussions of the algorithms’ occurrences in the literature are included. The last two of the five deficiencies mentioned above can be solved using TABASCO’s domain engineering steps, which include the creation of a toolkit of algorithm implementations and of a domain specific language to allow more effective use of such a toolkit. The design of a coherent toolkit is simplified by the availability of a taxonomy, which indicates the commonalities and differences between the various algorithms. The high-level design choices are guided by the taxonomy structure, and the choice of which language constructs to use to implement various design parts can be made based on standard design techniques. We consider the domain of keyword pattern matching as a brief case study for all of the method’s domain engineering steps. For the domain of regular tree languages, we consider the design of a toolkit of tree acceptance and tree pattern matching algorithms based on the two taxonomies. The taxonomy-based toolkit design results in a coherent and easily extendable toolkit.

AB - The subject of this dissertation is the construction of two taxonomies and a toolkit of algorithms solving two different but closely related problems from the domain of regular tree languages. In our context, a taxonomy is a classification of algorithms according to their essential details. Its starting point is a high-level algorithm whose correctness is easily shown. By adding details, one obtains refinements or variations of that algorithm and of algorithms obtained in such a way. This may lead to algorithms from the literature or to new ones. Every detail has correctness arguments associated with it, so that each algorithm’s correctness follows from the correctness of the starting point and details added. The sequence of details from the starting point to an algorithm shows how the algorithm can be obtained from the starting point and can be used to characterize the particular algorithm. An algorithm taxonomy may be depicted as a directed acyclic graph, in which nodes refer to algorithms and branches refer to details. We construct taxonomies for two algorithmic problems from the domain of regular tree languages on ordered, ranked trees. This domain has a rich theory, and parts of this theory have broad applicability in areas such as code generation in compilers—particularly for instruction selection or optimization—and term rewriting. Underlying the practical applications are a number of algorithmic problems, of which tree acceptance and tree pattern matching are two. Many algorithms solving these problems have been described and used in practice. Unfortunately, the domain suffers from a number of deficiencies: 1. Inaccessibility of the theory and algorithms, as they are scattered over the literature and few or no overview publications—particularly algorithm oriented ones—exist. 2. Difficulty of comparing the algorithms due to differences in presentation style and level of formality. 3. Lack of reference to the theory and of correctness arguments in publications of practical algorithms. 4. Lack of a large and coherent collection of implementations of the algorithms. 5. Difficulty of choosing between different algorithms for practical applications. The construction of taxonomies aims to solve the first three deficiencies. The construction process is part of TABASCO, a method for domain modeling and domain engineering for a particular kind of domain in computer science. As a first step, an overview of relevant parts of the theory is created. A literature survey is then performed to gather algorithms solving the algorithmic problem or problems. Based on the results, the algorithms are rephrased in a common presentation style and a model in the form of an algorithm taxonomy for the algorithmic problem or problems is constructed. The taxonomy makes the commonalities and variations between the algorithms as well as the correctness arguments for the algorithms explicit. The problems of tree acceptance and tree pattern matching are related and the algorithms solving them involve many of the same algorithmic ingredients. The commonalities lead to similarities between the taxonomies constructed for each of the two problems. Furthermore, references to and discussions of the algorithms’ occurrences in the literature are included. The last two of the five deficiencies mentioned above can be solved using TABASCO’s domain engineering steps, which include the creation of a toolkit of algorithm implementations and of a domain specific language to allow more effective use of such a toolkit. The design of a coherent toolkit is simplified by the availability of a taxonomy, which indicates the commonalities and differences between the various algorithms. The high-level design choices are guided by the taxonomy structure, and the choice of which language constructs to use to implement various design parts can be made based on standard design techniques. We consider the domain of keyword pattern matching as a brief case study for all of the method’s domain engineering steps. For the domain of regular tree languages, we consider the design of a toolkit of tree acceptance and tree pattern matching algorithms based on the two taxonomies. The taxonomy-based toolkit design results in a coherent and easily extendable toolkit.

U2 - 10.6100/IR633481

DO - 10.6100/IR633481

M3 - Phd Thesis 1 (Research TU/e / Graduation TU/e)

SN - 978-90-386-1228-7

PB - Technische Universiteit Eindhoven

CY - Eindhoven

ER -

Cleophas LGWA. Tree algorithms : two taxonomies and a toolkit. Eindhoven: Technische Universiteit Eindhoven, 2008. 294 p. https://doi.org/10.6100/IR633481