An evaluation of structured language modeling for automatic speech recognition

Johanna Bjorklund, Loek Cleophas, M. Karlsson

Research output: Contribution to journal › Article › Academic › peer-review


Abstract

We evaluated probabilistic lexicalized tree-insertion grammars (PLTIGs) on a classification task relevant to automatic speech recognition. The baseline is a family of n-gram models tuned with Witten-Bell smoothing. The language models were trained on unannotated corpora of 10,000 to 50,000 sentences collected from the English section of Wikipedia. For the evaluation, an additional 150 random sentences were selected from the same source, and for each of these, approximately 3,200 variations were generated. Each variant sentence was obtained by replacing an arbitrary word with a similar word, chosen to be at most two character edits away from the original. The evaluation task consisted of identifying the original sentence among the automatically constructed (and typically inferior) alternatives. In the experiments, the n-gram models outperformed the PLTIG model on the smaller data set, but as the amount of data grew, the PLTIG model gave comparable results. While PLTIGs are more demanding to train, they have the advantage that they assign a parse structure to their input sentences. This is valuable for further algorithmic processing, for example summarization or sentiment analysis.
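
To make the selection task concrete, the sketch below scores each candidate sentence with an interpolated Witten-Bell bigram model and keeps the highest-scoring one. It is a minimal illustration, not the authors' implementation: the class WittenBellBigramLM and the helper pick_original are hypothetical names, the paper's baseline uses higher-order n-grams, and the PLTIG model (with its grammar induction and parsing) is not shown.

```python
from collections import defaultdict
import math


class WittenBellBigramLM:
    """Bigram language model with interpolated Witten-Bell smoothing (illustrative sketch).

    P(w | h) = lam(h) * ML(w | h) + (1 - lam(h)) * P_uni(w), where
    lam(h) = c(h) / (c(h) + T(h)) and T(h) is the number of distinct
    word types observed after the history h.
    """

    def __init__(self):
        self.bigrams = defaultdict(lambda: defaultdict(int))  # history -> next word -> count
        self.unigrams = defaultdict(int)                      # word -> count
        self.total = 0                                        # running token total

    def train(self, sentences):
        for sent in sentences:
            tokens = ["<s>"] + sent.split() + ["</s>"]
            for prev, cur in zip(tokens, tokens[1:]):
                self.bigrams[prev][cur] += 1
                self.unigrams[cur] += 1
                self.total += 1

    def prob(self, prev, cur):
        followers = self.bigrams.get(prev, {})
        history_count = sum(followers.values())
        distinct_types = len(followers)
        # Add-one smoothed unigram estimate so unseen words keep non-zero mass.
        p_uni = (self.unigrams.get(cur, 0) + 1) / (self.total + len(self.unigrams) + 1)
        if history_count == 0:
            return p_uni  # unseen history: fall back to the unigram estimate
        lam = history_count / (history_count + distinct_types)
        return lam * followers.get(cur, 0) / history_count + (1 - lam) * p_uni

    def logprob(self, sentence):
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        return sum(math.log(self.prob(p, c)) for p, c in zip(tokens, tokens[1:]))


def pick_original(candidates, lm):
    """Return the candidate the language model considers most probable."""
    return max(candidates, key=lm.logprob)


if __name__ == "__main__":
    lm = WittenBellBigramLM()
    lm.train(["the cat sat on the mat", "the dog slept on the rug"])
    # One genuine sentence and two single-word corruptions within two character edits.
    candidates = ["the cat sat on the mat",
                  "the cab sat on the mat",
                  "the cat sat an the mat"]
    print(pick_original(candidates, lm))  # expected: "the cat sat on the mat"
```

Scoring of this kind is what the evaluation task amounts to: for each of the 150 held-out sentences, the language model ranks the original against its roughly 3,200 edit-distance-2 variants and is credited when the original comes out on top.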
Original language: English
Pages (from-to): 1019-1034
Journal: Journal of Universal Computer Science
Volume: 23
Issue number: 11
DOIs: 10.3217/jucs-023-11-1019
Publication status: Published - 2017
Externally published: Yes

Keywords

  • language modeling
  • automatic speech recognition
  • probabilistic lexicalized tree-insertion grammars

Cite this

@article{01269d604e31456eb4410c882bdfd3ac,
title = "An evaluation of structured language modeling for automatic speech recognition",
abstract = "We evaluated probabilistic lexicalized tree-insertion grammars (PLTIGs) on a classification task relevant for automatic speech recognition. The baseline is a family of n-gram models tuned with Witten-Bell smoothing. The language models are trained on unannotated corpora, consisting of 10,000 to 50,000 sentences collected from the English section of Wikipedia. For the evaluation, an additional 150 random sentences were selected from the same source, and for each of these, approximately 3,200 variations were generated. Each variant sentence was obtained by replacing an arbitrary word by a similar word, chosen to be at most 2 character edits from the original. The evaluation task consisted of identifying the original sentence among the automatically constructed (and typically inferior) alternatives. In the experiments, the n-gram models outperformed the PLTIG model on the smaller data set, but as the size of data grew, the PLTIG model gave comparable results. While PLTIGs are more demanding to train, they have the advantage that they assign a parse structure to their input sentences. This is valuable for continued algorithmic processing, for example, for summarization or sentiment analysis.",
keywords = "language modeling, automatic speech recognition, probabilistic lexicalized tree-insertion grammars",
author = "Johanna Bjorklund and Loek Cleophas and M. Karlsson",
year = "2017",
doi = "10.3217/jucs-023-11-1019",
language = "English",
volume = "23",
pages = "1019--1034",
journal = "Journal of Universal Computer Science",
issn = "0948-6912",
publisher = "Springer",
number = "11",

}
