Active Learning of Non-Semantic Speech Tasks with Pretrained Models

Harlin Lee, Aaqib Saeed, Andrea L. Bertozzi

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

    1 Citation (Scopus)

    Abstract

    Pretraining neural networks with massive unlabeled datasets has become popular as it equips the deep models with a better prior to solve downstream tasks. However, this approach generally assumes that the downstream tasks have access to annotated data of sufficient size. In this work, we propose ALOE, a novel system for improving the data- and label-efficiency of non-semantic speech tasks with active learning (AL). ALOE uses pretrained models in conjunction with active learning to label data incrementally and learn classifiers for downstream tasks, thereby mitigating the need to acquire labeled data beforehand. We demonstrate the effectiveness of ALOE on a wide range of tasks, uncertainty-based acquisition functions, and model architectures. Training a linear classifier on top of a frozen encoder with ALOE is shown to achieve performance similar to several baselines that utilize the entire labeled dataset.
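
    The abstract describes ALOE's recipe: embed speech with a frozen pretrained encoder, train a linear classifier on the embeddings, and let an uncertainty-based acquisition function choose which examples to label next. Below is a minimal sketch of that pattern, assuming `X_pool` holds embeddings precomputed by a frozen encoder; the function names, the choice of softmax entropy as the acquisition function, and all hyperparameters are illustrative assumptions, not the paper's actual implementation.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def entropy_acquisition(probs):
        """Uncertainty score per example: entropy of the predicted class distribution."""
        return -np.sum(probs * np.log(probs + 1e-12), axis=1)

    def active_learning_loop(X_pool, y_pool, n_rounds=10, batch_size=32, seed_size=32):
        """Uncertainty-based active learning over frozen-encoder embeddings.

        X_pool: (N, d) array of embeddings from a frozen pretrained speech encoder.
        y_pool: oracle labels; y_pool[i] is treated as "queried" only once i is selected.
        """
        rng = np.random.default_rng(0)
        # Seed with a small random labeled set (assumed to cover more than one class).
        labeled = set(rng.choice(len(X_pool), size=seed_size, replace=False).tolist())
        clf = LogisticRegression(max_iter=1000)  # linear classifier on frozen features
        for _ in range(n_rounds):
            idx = sorted(labeled)
            clf.fit(X_pool[idx], y_pool[idx])
            unlabeled = np.array([i for i in range(len(X_pool)) if i not in labeled])
            if unlabeled.size == 0:
                break
            # Score the unlabeled pool and query labels for the most uncertain examples.
            scores = entropy_acquisition(clf.predict_proba(X_pool[unlabeled]))
            picked = unlabeled[np.argsort(scores)[-batch_size:]]
            labeled.update(picked.tolist())
        return clf
    ```

    Only the labeled subset is ever used for training, so the label budget after the loop is seed_size + n_rounds * batch_size; swapping `entropy_acquisition` for another uncertainty score (e.g., least-confidence or margin) changes only the scoring line.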

    Original language: English
    Title of host publication: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
    Publisher: IEEE
    Number of pages: 5
    ISBN (Electronic): 978-1-7281-6327-7
    ISBN (Print): 978-1-7281-6328-4
    DOIs
    Publication status: Published - 10 Jun 2023
    Event: 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 - Rhodes Island, Greece
    Duration: 4 Jun 2023 → 10 Jun 2023

    Conference

    Conference: 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
    Abbreviated title: ICASSP 2023
    Country/Territory: Greece
    City: Rhodes Island
    Period: 4/06/23 → 10/06/23

    Funding

    This work was partially supported by NSF grants DMS-1952339 and DMS-2152717. It was also partially supported by NGA award HM0476-21-1-0003 (approved for public release, NGA-U-2022-02425).

    Funders and funder numbers:
    National Science Foundation: DMS-1952339, DMS-2152717
    National Geospatial-Intelligence Agency: HM0476-21-1-0003

      Keywords

      • Training
      • Neural networks
      • Signal processing
      • Data models
      • Acoustics
      • Task analysis
      • Speech processing
      • transfer learning
      • active learning
      • self-supervised learning
      • audio
      • non-semantic speech
