End-to-end listening agent for audiovisual emotional and naturalistic interactions

Kevin El Haddad, Yara Rizk, Louise Heron, Nadine Hajj, Yong Zhao, Jaebok Kim, Trung Ngô Trọng, M. Lee, Marwan Doumit, Payton Lin, Yelin Kim, Hüseyin Çakmak

Research output: Contribution to journal › Journal article › Academic › Peer review

Abstract

In this work, we established the foundations of a framework with the goal of building an end-to-end naturalistic expressive listening agent. The project was split into modules for recognizing the user’s paralinguistic and nonverbal expressions, predicting the agent’s reactions, synthesizing the agent’s expressions, and recording data on nonverbal conversational expressions. First, a multimodal, multitask deep-learning-based emotion classification system was built, along with a rule-based visual expression detection system. Then, several sequence prediction systems for nonverbal expressions were implemented and compared. In addition, an audiovisual concatenation-based synthesis system was implemented. Finally, a naturalistic, dyadic emotional conversation database was collected. Here we report the work done on each of these modules and our planned future improvements.
Language: English
Pages: 49-61
Number of pages: 14
Journal: Journal of Science and Technology of the Arts
Volume: 10
Issue number: 2
DOIs: 10.7559/citarj.v10i2.424
Status: Published - 8 Nov 2018

Fingerprint

Data recording
Deep learning

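Keywords

Listening Agent, Smile, laughter, head movement, speech emotion recognition, Non-verbal communication, Multimodal synthesis, Nonverbal Expression Detection, Non-verbal expression synthesis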

    Cite this

    El Haddad, K., Rizk, Y., Heron, L., Hajj, N., Zhao, Y., Kim, J., ... Çakmak, H. (2018). End-to-end listening agent for audiovisual emotional and naturalistic interactions. Journal of Science and Technology of the Arts, 10(2), 49-61. DOI: 10.7559/citarj.v10i2.424
    El Haddad, Kevin ; Rizk, Yara ; Heron, Louise ; Hajj, Nadine ; Zhao, Yong ; Kim, Jaebok ; Ngô Trọng, Trung ; Lee, M. ; Doumit, Marwan ; Lin, Payton ; Kim, Yelin ; Çakmak, Hüseyin. / End-to-end listening agent for audiovisual emotional and naturalistic interactions. In: Journal of Science and Technology of the Arts. 2018 ; Vol. 10, No. 2. pp. 49-61
    @article{6781b40f109843d2872d940605af9fe7,
    title = "End-to-end listening agent for audiovisual emotional and naturalistic interactions",
    abstract = "In this work, we established the foundations of a framework with the goal to build an end-to-end naturalistic expressive listening agent. The project was split into modules for recognition of the user’s paralinguistic and nonverbal expressions, prediction of the agent’s reactions, synthesis of the agent’s expressions and data recordings of nonverbal conversation expressions. First, a multimodal multitask deep learning-based emotion classification system was built along with a rule-based visual expression detection system. Then several sequence prediction systems for nonverbal expressions were implemented and compared. Also, an audiovisual concatenation-based synthesis system was implemented. Finally, a naturalistic, dyadic emotional conversation database was collected. We report here the work made for each of these modules and our planned future improvements.",
    keywords = "Listening Agent, Smile, laughter, head movement, speech emotion recognition, Non-verbal communication, Multimodal synthesis, Nonverbal Expression Detection, Non-verbal expression synthesis",
    author = "{El Haddad}, Kevin and Yara Rizk and Louise Heron and Nadine Hajj and Yong Zhao and Jaebok Kim and {Ng{\^o} Trọng}, Trung and M. Lee and Marwan Doumit and Payton Lin and Yelin Kim and H{\"u}seyin {\c{C}}akmak",
    year = "2018",
    month = "11",
    day = "8",
    doi = "10.7559/citarj.v10i2.424",
    language = "English",
    volume = "10",
    pages = "49--61",
    journal = "Journal of Science and Technology of the Arts",
    issn = "1646-9798",
    publisher = "Portuguese Catholic University",
    number = "2",

    }

    El Haddad, K, Rizk, Y, Heron, L, Hajj, N, Zhao, Y, Kim, J, Ngô Trọng, T, Lee, M, Doumit, M, Lin, P, Kim, Y & Çakmak, H 2018, 'End-to-end listening agent for audiovisual emotional and naturalistic interactions', Journal of Science and Technology of the Arts, vol. 10, no. 2, pp. 49-61. DOI: 10.7559/citarj.v10i2.424

    End-to-end listening agent for audiovisual emotional and naturalistic interactions. / El Haddad, Kevin; Rizk, Yara; Heron, Louise; Hajj, Nadine; Zhao, Yong; Kim, Jaebok; Ngô Trọng, Trung; Lee, M.; Doumit, Marwan; Lin, Payton; Kim, Yelin; Çakmak, Hüseyin.

    In: Journal of Science and Technology of the Arts, Vol. 10, No. 2, 08.11.2018, pp. 49-61.

    Research output: Contribution to journal › Journal article › Academic › Peer review

    TY  - JOUR
    T1  - End-to-end listening agent for audiovisual emotional and naturalistic interactions
    AU  - El Haddad,Kevin
    AU  - Rizk,Yara
    AU  - Heron,Louise
    AU  - Hajj,Nadine
    AU  - Zhao,Yong
    AU  - Kim,Jaebok
    AU  - Ngô Trọng,Trung
    AU  - Lee,M.
    AU  - Doumit,Marwan
    AU  - Lin,Payton
    AU  - Kim,Yelin
    AU  - Çakmak,Hüseyin
    PY  - 2018/11/8
    Y1  - 2018/11/8
    N2  - In this work, we established the foundations of a framework with the goal to build an end-to-end naturalistic expressive listening agent. The project was split into modules for recognition of the user’s paralinguistic and nonverbal expressions, prediction of the agent’s reactions, synthesis of the agent’s expressions and data recordings of nonverbal conversation expressions. First, a multimodal multitask deep learning-based emotion classification system was built along with a rule-based visual expression detection system. Then several sequence prediction systems for nonverbal expressions were implemented and compared. Also, an audiovisual concatenation-based synthesis system was implemented. Finally, a naturalistic, dyadic emotional conversation database was collected. We report here the work made for each of these modules and our planned future improvements.
    AB  - In this work, we established the foundations of a framework with the goal to build an end-to-end naturalistic expressive listening agent. The project was split into modules for recognition of the user’s paralinguistic and nonverbal expressions, prediction of the agent’s reactions, synthesis of the agent’s expressions and data recordings of nonverbal conversation expressions. First, a multimodal multitask deep learning-based emotion classification system was built along with a rule-based visual expression detection system. Then several sequence prediction systems for nonverbal expressions were implemented and compared. Also, an audiovisual concatenation-based synthesis system was implemented. Finally, a naturalistic, dyadic emotional conversation database was collected. We report here the work made for each of these modules and our planned future improvements.
    KW  - Listening Agent
    KW  - Smile
    KW  - laughter
    KW  - head movement
    KW  - speech emotion recognition
    KW  - Non-verbal communication
    KW  - Multimodal synthesis
    KW  - Nonverbal Expression Detection
    KW  - Non-verbal expression synthesis
    U2  - 10.7559/citarj.v10i2.424
    DO  - 10.7559/citarj.v10i2.424
    M3  - Article
    VL  - 10
    SP  - 49
    EP  - 61
    JO  - Journal of Science and Technology of the Arts
    T2  - Journal of Science and Technology of the Arts
    JF  - Journal of Science and Technology of the Arts
    SN  - 1646-9798
    IS  - 2
    ER  -

    El Haddad K, Rizk Y, Heron L, Hajj N, Zhao Y, Kim J et al. End-to-end listening agent for audiovisual emotional and naturalistic interactions. Journal of Science and Technology of the Arts. 2018 Nov 8;10(2):49-61. Available from: DOI 10.7559/citarj.v10i2.424