A novel audio-oriented learning strategies for character recognition

Changbin Lu, Guangyu Gao

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Academic; peer-review

Abstract

In this paper, we propose robust audio-oriented learning strategies to address character recognition in movies and TV series. Identifying major characters in movies and TV series has drawn great interest from researchers. Most existing work has explored character recognition and retrieval based on visual appearance, but visual appearance is inconsistent throughout a video. Our approach, which focuses mainly on audio, has four parts: (i) we extract both spectral and temporal audio features, namely Mel-scale Frequency Cepstral Coefficients (MFCC), prosodic features, average pause length, speaking rate, pitch, and short-time energy, and complement them with Gabor features; (ii) we adopt a Multi-Task Joint Sparse Representation and Recognition (MTJSRC) model for learning with all the features except Gabor, and an SVM model for the Gabor features; (iii) regarding these original features as seeds, we extend the training set from talk shows with semi-supervised learning; (iv) we introduce a Conditional Random Field (CRF) model that accounts for constraints in the time sequence to enhance the final labelling. Experimental results demonstrate the effectiveness of our approach.
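As an illustration of one of the temporal features named in the abstract, short-time energy can be sketched as the mean squared amplitude of each frame of the signal. This is a minimal sketch and not the authors' implementation: the function name `short_time_energy`, the parameters `frame_len` and `hop`, the frame sizes, and the toy signal are all assumptions, since the abstract gives no framing details.

```python
import math


def short_time_energy(signal, frame_len, hop):
    """Return the mean squared amplitude of each full frame of `signal`.

    `frame_len` and `hop` are in samples; both are illustrative
    parameters, not values taken from the paper.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    energies = []
    # Slide a window of frame_len samples, advancing hop samples per step.
    # Trailing samples that do not fill a whole frame are ignored.
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


# Toy input (assumed): 400 samples of silence, then 400 samples of a
# 440 Hz sine at amplitude 0.5, sampled at 8 kHz.
sample_rate = 8000
silence = [0.0] * 400
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(400)]

energies = short_time_energy(silence + tone, frame_len=200, hop=100)
for index, energy in enumerate(energies):
    print(index, round(energy, 4))
```

On this toy input the energy is zero over the silent frames and rises once the window reaches the tone, which is the kind of contrast that makes the feature useful for separating speech from pauses.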

Original language: English
Title of host publication: Proceedings - 2016 International Conference on Virtual Reality and Visualization, ICVRV 2016
Editors: Dandan Ding, Dangxiao Wang, Jian Chen, Xun Luo
Place of Publication: Piscataway
Publisher: Institute of Electrical and Electronics Engineers
Pages: 459-464
Number of pages: 6
ISBN (Electronic): 978-1-5090-5188-5
ISBN (Print): 978-1-5090-5189-2
Publication status: Published - 2016
Event: 6th International Conference on Virtual Reality and Visualization (ICVRV 2016) - Hangzhou, Zhejiang, China
Duration: 24 Sept 2016 to 26 Sept 2016
Conference number: 6

Conference

Conference: 6th International Conference on Virtual Reality and Visualization (ICVRV 2016)
Abbreviated title: ICVRV 2016
Country/Territory: China
City: Hangzhou, Zhejiang
Period: 24/09/16 to 26/09/16

Funding

National Natural Science Foundation of China under Grant No. 61401023

Keywords

  • Character recognition
  • Conditional Random Field
  • MFCC
  • Sparse Representation
  • Support Vector Machine
