Keyword spotting using time-domain features in a temporal convolutional network

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-reviewed


Abstract

With the increasing demand for voice recognition services, more attention is being paid to simpler algorithms that are capable of running locally on a hardware device. This paper demonstrates simpler speech features derived in the time domain for Keyword Spotting (KWS). The features can be viewed as constrained-lag autocorrelations computed on overlapping speech frames to form a 2D map. We refer to this as Multi-Frame Shifted Time Similarity (MFSTS). MFSTS performance is compared against the widely known Mel-Frequency Cepstral Coefficients (MFCC), which are computed in the frequency domain. A Temporal Convolutional Network (TCN) is designed to classify keywords using both MFCC and MFSTS. This is done by employing an open-source dataset from Google Brain containing ~106,000 files of one-second recorded words such as 'Backward', 'Forward', and 'Stop'. Initial findings show that MFSTS can be used for KWS tasks without visiting the frequency domain. Our experimental results show that classification results for the whole dataset (25 classes) based on MFCC and MFSTS are in very good agreement. We compare the performance of the TCN-based classifier with other related work in the literature. The classification is performed with a small memory footprint (~90 KB) and low compute cost (~5 MOPs per inference). The achieved classification accuracies are 93.4% using MFCC and 91.2% using MFSTS. Furthermore, a case study is provided for a single-keyword spotting task. The case study demonstrates how MFSTS can be used as a simple preprocessing scheme with small classifiers while achieving accuracy as high as 98%. The computational simplicity of MFSTS makes it attractive for low-power KWS applications, paving the way for resource-aware solutions.
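The abstract describes the features only at a high level, as autocorrelations at a limited set of lags computed per overlapping frame and stacked into a 2D map. A minimal sketch of that general idea is shown below. It is not the paper's MFSTS definition: the function name `lagged_autocorr_map`, the frame length, hop size, number of lags, mean removal, and energy normalization are all assumptions chosen for illustration.

```python
import numpy as np


def lagged_autocorr_map(signal, frame_len=400, hop=160, max_lag=40):
    """Illustrative time-domain feature map (hypothetical helper).

    Splits `signal` into overlapping frames and, for each frame, computes
    the energy-normalized autocorrelation at lags 1..max_lag.
    Returns an array of shape (num_frames, max_lag).
    All parameter defaults are assumptions, not values from the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    if not 0 < max_lag < frame_len:
        raise ValueError("max_lag must be in 1..frame_len-1")

    num_frames = 1 + (signal.size - frame_len) // hop
    feature_map = np.zeros((num_frames, max_lag))

    for i in range(num_frames):
        frame = signal[i * hop : i * hop + frame_len]
        frame = frame - frame.mean()      # remove DC offset (assumption)
        energy = np.dot(frame, frame)
        if energy == 0.0:
            continue                      # silent frame: row stays zero
        for k in range(1, max_lag + 1):
            # similarity between the frame and itself shifted by k samples
            feature_map[i, k - 1] = np.dot(frame[:-k], frame[k:]) / energy

    return feature_map
```

Assuming a one-second clip sampled at 16 kHz (16000 samples, an assumption not stated in the abstract), the defaults above would yield a 98 x 40 map, which could then be fed to a classifier in place of an MFCC spectrogram-style input. Only multiplications and additions in the time domain are involved, with no FFT or filter bank.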

Original language: English
Title: Proceedings - Euromicro Conference on Digital System Design, DSD 2019
Editors: Nikos Konofaos, Paris Kitsos
Place of publication: Piscataway
Publisher: Institute of Electrical and Electronics Engineers
Pages: 313-319
Number of pages: 7
ISBN (electronic): 978-1-7281-2862-7
Status: Published - Aug 2019
Event: Euromicro Conference on Digital System Design - Kallithea, Chalkidiki, Greece
Duration: 28 Aug 2019 to 30 Aug 2019
Conference number: 22
http://dsd-seaa2019.csd.auth.gr/

Conference

Conference: Euromicro Conference on Digital System Design
Abbreviated title: DSD 2019
Country: Greece
City: Chalkidiki
Period: 28/08/19 to 30/08/19
