Virtual reality applications with virtual humans, such as virtual reality exposure therapy, health coaches and negotiation simulators, are developed for different contexts and often for users from different countries. How much emphasis a virtual human's emotional expression needs depends on the application: some applications require emotional expression during the speaking phase, some during the listening phase, and some during both. Although studies have investigated how humans perceive a virtual human's emotion in each phase separately, few have compared the two phases directly. This study aims to fill this gap and, in addition, investigates the cultural interpretation of the virtual human's emotion, especially with respect to its valence. The experiment was conducted with both Chinese and non-Chinese participants, who were asked to rate the valence of seven emotional expressions, ranging from negative through neutral to positive, displayed by a Chinese female virtual human during speaking and listening. The results showed a high correlation in valence ratings between the two groups of participants, which indicated that the valence of the emotional expressions was recognized as easily by people whose cultural background differed from that of the virtual human as by people who shared it. In addition, participants tended to perceive the virtual human's expressed valence as more intense in the speaking phase than in the listening phase. The additional vocal emotional expression in the speaking phase is put forward as a likely cause of this difference. © 2013 Springer-Verlag London.