Abstract
In this paper, we propose a robust audio-oriented learning strategy for character recognition in movies and TV series. Identifying the major characters in movies and TV series has drawn great interest from researchers. Most existing work explores character recognition and retrieval based on visual appearance, yet visual appearance is inconsistent across a whole video. Our approach, which focuses mainly on audio, has the following features: (i) we extract both spectral and temporal audio features, namely Mel-scale Frequency Cepstral Coefficients (MFCC), prosodic features, average pause length, speaking rate, pitch, and short-time energy, and complement them with Gabor features; (ii) we adopt the Multi-Task Joint Sparse Representation and Recognition (MTJSRC) model for learning with all features except Gabor, and an SVM model for the Gabor features; (iii) treating these original features as seeds, we extend the training set with data from talk shows using semi-supervised learning; (iv) we introduce a Conditional Random Field (CRF) model that accounts for temporal-sequence constraints to refine the final labelling. Experimental results demonstrate the effectiveness of our approach.
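Two of the temporal features named in (i), short-time energy and average pause length, can be illustrated with a minimal sketch. The frame length, hop size, sampling rate, and silence threshold are assumed values that do not come from the paper, and the function names are hypothetical; the authors' actual extraction procedure may differ.

```python
def short_time_energy(samples, frame_len=400, hop=160):
    """Sum of squared samples per frame.

    frame_len and hop are assumed defaults (25 ms and 10 ms at 16 kHz).
    """
    samples = [float(x) for x in samples]
    if len(samples) < frame_len:
        # Signal shorter than one frame: return a single energy value.
        return [sum(x * x for x in samples)]
    n_frames = 1 + (len(samples) - frame_len) // hop
    return [
        sum(x * x for x in samples[i * hop : i * hop + frame_len])
        for i in range(n_frames)
    ]


def average_pause_length(energy, threshold, hop=160, sample_rate=16000):
    """Mean duration (seconds) of runs of frames whose energy is below threshold.

    Treating low-energy frames as pauses is a simplifying assumption.
    """
    runs, current = [], 0
    for value in energy:
        if value < threshold:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    if not runs:
        return 0.0
    return (sum(runs) / len(runs)) * hop / sample_rate
```

In a pipeline like the one described, such per-frame values would typically be computed for each speech segment and concatenated with the other features before classification; the threshold would need tuning on the actual audio.
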
Original language | English |
---|---|
Title of host publication | Proceedings - 2016 International Conference on Virtual Reality and Visualization, ICVRV 2016 |
Editors | Dandan Ding, Dangxiao Wang, Jian Chen, Xun Luo |
Place of Publication | Piscataway |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 459-464 |
Number of pages | 6 |
ISBN (Electronic) | 978-1-5090-5188-5 |
ISBN (Print) | 978-1-5090-5189-2 |
Publication status | Published - 2016 |
Event | 6th International Conference on Virtual Reality and Visualization (ICVRV 2016) - Hangzhou, Zhejiang, China; Duration: 24 Sept 2016 → 26 Sept 2016; Conference number: 6 |
Conference
Conference | 6th International Conference on Virtual Reality and Visualization (ICVRV 2016) |
---|---|
Abbreviated title | ICVRV 2016 |
Country/Territory | China |
City | Hangzhou, Zhejiang |
Period | 24/09/16 → 26/09/16 |
Funding
National Natural Science Foundation of China under Grant No. 61401023
Keywords
- Character recognition
- Conditional Random Field
- MFCC
- Sparse Representation
- Support Vector Machine