Creating software is a process of refining a concept to an implementation. This process consists of several stages represented by documents, models and plans at several levels of abstraction. Mostly, the refinement process requires creativity of the programmers, but sometimes the task is boring and repetitive. This repetitive work is an indication that the program is not written at the most suitable level of abstraction. The level of abstraction offered by the used programming language might be too low to remove the recurring code. Code generators can be used to raise the level of abstraction of program specifications and to automate the repetitive work. This thesis focuses on code generators based on templates. Templates are one of the techniques to implement a code generator. Templates allow extension of the syntax of a programming language, enabling generative programming without modifying the underlying compiler. Four artifacts are involved in a template based generator: templates, input data, a template evaluator and output code. The templates we consider are a concrete (incomplete) representation of the output document, i.e. object code, that contains holes, i.e. the meta code. These holes are filled by the template evaluator using information from the input data to obtain the output code. Templates are widely used to generate HTML code in web applications. They can be used for generating all kinds of text, like e-mails or (source) code. In this thesis we limit the scope to the generation of source code. The central research question is how the quality of template based code generators can be improved. Quality, in general, is a broad notion and our scope is limited to the technical quality of templates and generated code. We focused on improving the maintainability of template based code generators and the correctness of the generated code. This is facilitated by the three main contributions provided by this thesis. First, the maintainability of template based code generators is increased by specifying the following requirement for our metalanguage. Our metalanguage should not be rich enough to allow programming in templates, without being too restrictive to express some code generators. We used the theory of formal languages to specify our metalanguage. Second, we ensure correctness of the templates and generated code. Third, the presented theory and techniques are validated by case studies. These case studies show application of templates in real world applications, increased maintainability and syntactical correctness of generated code. Our metalanguage should not be rich enough to allow programming in templates, without being too restrictive to express some code generators. The theory of formal languages is used to specify the requirements for our metalanguage. As we only consider to generate programming languages, it is sufficient to support the generation of languages defined by context-free grammars. This assumption is used to derive a metalanguage, that is rich enough to specify code generators that are able to instantiate all possible sentences of a context-free language. A specific case of a code generator, the unparser, is a program that can instantiate all sentences of a context-free language. We proved that an unparser can be implemented using a linear deterministic topdown tree-to-string transducer. We call this property unparser-completeness. Our metalanguage is based on a linear deterministic top-down tree-to-string transducer. Recall that the goal of specifying the requirements of the metalanguage is to increase the maintainability of template based code generators, without being too restrictive. To validate that our metalanguage is not too restrictive and leads to better maintainable templates, we compared it with four off-the-shelf text template systems by implementing an unparser. We have observed that the industrial template evaluators provide a Turing complete metalanguage, but they do not contain a block scoping mechanism for the meta-variables. This results in undesired additional boilerplate meta code in their templates. The second contribution is guaranteeing the correctness of the generated code. Correctness of the generated code can be divided in two concerns: syntactical correctness and semantical correctness. We start with syntactical correctness of the generated code. The use of text templates implies that syntactical correctness of the generated code can only be detected at compilation time. This means that errors detected during the compilation are reported on the level of the generated code. The developer is required to trace back manually the errors to their origin in the template or input data. We believe that programs manipulating source code should not consider the object code as text to detect errors as early as possible. We present an approach where the grammars of the object language and metalanguage can be combined in a modular way. Combining both grammars allows parsing both languages simultaneously. Syntax errors in both languages of the template will be found while parsing it. Moreover, only parsing a template is not sufficient to ensure that the generated code will be free of syntax errors. The template evaluator must be equipped with a mechanism to guarantee its output will be syntactically correct. We discuss our mechanism in short. A parse tree is constructed during the parsing of the template. This tree contains subtrees for the object code and subtrees for the meta code. While evaluating the template, subtrees of the meta code are substituted by object code subtrees. The template evaluator checks whether the root nonterminal of the object code subtree is equal to the root nonterminal of the meta code subtree. When both are equal, it is allowed to substitute the meta code. When the root nonterminals are distinct an accurate error message is generated. The template evaluator terminates when all meta code subtrees are substituted. The result is a parse tree of the object language and thus syntactically correct. We call this process syntax safe code generation. In order to validate that the presented techniques increase maintainability and ensure syntactical correctness, we implemented our ideas in a syntax safe template evaluator called Repleo. Repleo has been applied in four case studies. The first case is a real world situation, where it is required to generate a three tier web application from a data model. This case showed that multiple layers of an applications defined in different programming languages can be generated from a single model. The second case and third case are used to show that our metalanguage results in a better maintainable code generator. Our metalanguage forces to use a two layer code generator with separation of concerns between the two layers, where the original implementations are less modular. The last case study shows that ensuring syntactical correctness results in the prevention of cross-site scripting attacks in dynamic generation of web pages. Recall that one of our goals was ensuring the correctness of the generated code. We also showed that is possible to check static semantic properties of templates. Static semantic checks are defined for the metalanguage, for the object language and checks for the situations where the object language is dependent on the metalanguage. We implemented a prototype of a static semantic checker for PicoJava templates using attribute grammars. The use of attribute grammars leads to re-use of the original PicoJava checker. Summarizing, in this thesis we have formulated the requirements for a metalanguage and discussed how to implement a syntax safe template evaluator. This results in better maintainable template based code generators and more reliable generated code.
|Qualification||Doctor of Philosophy|
|Award date||1 Feb 2011|
|Place of Publication||Eindhoven|
|Publication status||Published - 2011|