Abstract
With the increasing demand for voice recognition services, more attention is being paid to simpler algorithms that can run locally on a hardware device. This paper demonstrates simpler speech features derived in the time domain for Keyword Spotting (KWS). The features are constrained-lag autocorrelations computed on overlapped speech frames to form a 2D map. We refer to this as Multi-Frame Shifted Time Similarity (MFSTS). MFSTS performance is compared against the widely known Mel-Frequency Cepstral Coefficients (MFCC), which are computed in the frequency domain. A Temporal Convolutional Network (TCN) is designed to classify keywords using both MFCC and MFSTS. This is done using an open-source dataset from Google Brain containing ~106,000 files of one-second recorded words such as 'Backward', 'Forward', and 'Stop'. Initial findings show that MFSTS can be used for KWS tasks without transforming to the frequency domain. Our experimental results show that classifications of the whole dataset (25 classes) based on MFCC and MFSTS are in very good agreement. We compare the performance of the TCN-based classifier with related work in the literature. The classification is performed using a small memory footprint (~90 KB) and low compute power (~5 MOPs) per inference. The achieved classification accuracies are 93.4% using MFCC and 91.2% using MFSTS. Furthermore, a case study is provided for a single-keyword spotting task. The case study demonstrates how MFSTS can be used as a simple preprocessing scheme with small classifiers while achieving accuracy as high as 98%. The computational simplicity of MFSTS makes it attractive for low-power KWS applications, paving the way for resource-aware solutions.
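The abstract describes MFSTS only at a high level: autocorrelations at a restricted set of lags, computed per overlapped frame and stacked into a 2D map. The Python sketch below illustrates that general idea and should not be read as the paper's exact definition. The function name `mfsts_sketch`, the frame length, the hop size, the number of lags, the energy normalization, and the 16 kHz sampling rate in the usage note are all assumptions made for illustration.

```python
import numpy as np


def mfsts_sketch(signal, frame_len=400, hop=160, num_lags=13):
    """Illustrative time-domain feature map (assumed parameters, not the paper's).

    Each row holds the energy-normalized autocorrelation of one overlapped
    frame at lags 1..num_lags, so the result has shape (num_frames, num_lags).
    """
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")

    num_frames = 1 + (len(signal) - frame_len) // hop
    feature_map = np.zeros((num_frames, num_lags))

    for i in range(num_frames):
        frame = signal[i * hop : i * hop + frame_len]
        energy = np.dot(frame, frame)
        if energy == 0.0:
            continue  # silent frame: leave the row as zeros

        for k in range(1, num_lags + 1):
            # Similarity between the frame and a copy of itself shifted by k samples.
            feature_map[i, k - 1] = np.dot(frame[:-k], frame[k:]) / energy

    return feature_map
```

Under these assumed settings, a one-second clip sampled at 16 kHz (16,000 samples) yields a 98 x 13 map using only multiply-accumulate operations, with no FFT or mel filterbank involved. That is the kind of compute saving the abstract points to, although the paper's actual frame and lag choices may differ.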
Original language | English |
---|---|
Title of host publication | Proceedings - Euromicro Conference on Digital System Design, DSD 2019 |
Editors | Nikos Konofaos, Paris Kitsos |
Place of Publication | Piscataway |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 313-319 |
Number of pages | 7 |
ISBN (Electronic) | 978-1-7281-2862-7 |
Publication status | Published - Aug 2019 |
Event | 22nd Euromicro Conference on Digital System Design, DSD 2019, Kallithea, Chalkidiki, Greece. Duration: 28 Aug 2019 → 30 Aug 2019. Conference number: 22. http://dsd-seaa2019.csd.auth.gr/ |
Conference
Conference | 22nd Euromicro Conference on Digital System Design, DSD 2019 |
---|---|
Abbreviated title | DSD 2019 |
Country/Territory | Greece |
City | Kallithea, Chalkidiki |
Period | 28/08/19 → 30/08/19 |
Internet address | http://dsd-seaa2019.csd.auth.gr/ |
Funding
This research has received funding from the Electronic Component Systems for European Leadership Joint Undertaking under grant agreement No 737487. This Joint Undertaking receives support from the European Union's Horizon 2020 research and innovation program.
Keywords
- Autocorrelation
- MFCC
- Speech Recognition
- Keyword Spotting (KWS)
- Temporal Convolutional Network (TCN)