Abstract
With the increasing demand for voice recognition services, more attention is being paid to simpler algorithms that are capable of running locally on a hardware device. This paper demonstrates simpler speech features derived in the time domain for Keyword Spotting (KWS). The features are computed as constrained-lag autocorrelations of overlapping speech frames, which together form a 2D map. We refer to this as Multi-Frame Shifted Time Similarity (MFSTS). MFSTS performance is compared against the widely known Mel-Frequency Cepstral Coefficients (MFCC), which are computed in the frequency domain. A Temporal Convolutional Network (TCN) is designed to classify keywords using both MFCC and MFSTS. This is done using an open-source dataset from Google Brain containing ~106,000 one-second recordings of words such as 'Backward', 'Forward' and 'Stop'. Initial findings show that MFSTS can be used for KWS tasks without visiting the frequency domain. Our experimental results show that the classification results on the whole dataset (25 classes) using MFCC and MFSTS are in very good agreement. We compare the performance of the TCN-based classifier with other related work in the literature. The classification is performed with a small memory footprint (~90 KB) and a low compute cost (~5 MOPs) per inference. The achieved classification accuracies are 93.4% using MFCC and 91.2% using MFSTS. Furthermore, a case study is provided for a single-keyword spotting task. The case study demonstrates how MFSTS can be used as a simple preprocessing scheme with small classifiers while achieving accuracy as high as 98%. The compute simplicity of MFSTS makes it attractive for low-power KWS applications, paving the way for resource-aware solutions.
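The abstract describes MFSTS only at a high level: constrained-lag autocorrelations of overlapping frames, stacked into a 2D map. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. The frame length (400 samples), hop (200 samples), maximum lag (32), the absence of any normalization, and the function names `frame_signal`, `constrained_lag_autocorr` and `mfsts_map` are all hypothetical choices made for illustration. The defaults assume 16 kHz audio, giving 25 ms frames with a 12.5 ms hop.

```python
"""Minimal sketch of an MFSTS-style 2D feature map (assumed definition, not the paper's code)."""
import math
from typing import List, Sequence


def frame_signal(x: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split x into overlapping frames of frame_len samples, hop samples apart.
    Trailing samples that do not fill a whole frame are dropped."""
    return [list(x[s:s + frame_len]) for s in range(0, len(x) - frame_len + 1, hop)]


def constrained_lag_autocorr(frame: Sequence[float], max_lag: int) -> List[float]:
    """Unnormalized autocorrelation r[k] = sum_n frame[n] * frame[n + k],
    evaluated only for lags k = 0..max_lag (the assumed 'constrained' lag range)."""
    n = len(frame)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must be in [0, frame length)")
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def mfsts_map(x: Sequence[float], frame_len: int = 400, hop: int = 200,
              max_lag: int = 32) -> List[List[float]]:
    """Stack per-frame autocorrelations into a num_frames x (max_lag + 1) map.
    Parameter defaults are hypothetical, not taken from the paper."""
    return [constrained_lag_autocorr(f, max_lag) for f in frame_signal(x, frame_len, hop)]


if __name__ == "__main__":
    fs = 16000  # assumed sample rate
    # One second of a 440 Hz test tone stands in for a keyword recording.
    tone = [math.sin(2 * math.pi * 440 * t / fs) for t in range(fs)]
    feats = mfsts_map(tone)
    print(len(feats), len(feats[0]))  # 79 frames x 33 lags
```

Every step above is a multiply-accumulate on raw samples, with no FFT or mel filterbank involved. This reflects the property the abstract credits for the compute simplicity of MFSTS compared with MFCC.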
Original language | English |
---|---|
Title of host publication | Proceedings - Euromicro Conference on Digital System Design, DSD 2019 |
Editors | Nikos Konofaos, Paris Kitsos |
Place of Publication | Piscataway |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 313-319 |
Number of pages | 7 |
ISBN (Electronic) | 978-1-7281-2862-7 |
DOIs | |
Publication status | Published - Aug 2019 |
Event | 22nd Euromicro Conference on Digital System Design, DSD 2019 - Kallithea, Chalkidiki, Greece; Duration: 28 Aug 2019 → 30 Aug 2019; Conference number: 22; http://dsd-seaa2019.csd.auth.gr/ |
Conference
Conference | 22nd Euromicro Conference on Digital System Design, DSD 2019 |
---|---|
Abbreviated title | DSD 2019 |
Country/Territory | Greece |
City | Kallithea, Chalkidiki |
Period | 28/08/19 → 30/08/19 |
Internet address | http://dsd-seaa2019.csd.auth.gr/ |
Keywords
- Autocorrelation
- MFCC
- Speech Recognition
- Keyword Spotting (KWS)
- Temporal Convolutional Network (TCN)