Recent emotion recognition models are quite successful at recognizing instantaneous emotional expressions. However, when applied to continuous interactions, they adapt poorly to person-specific and long-term emotion appraisal. In this paper, we present an unsupervised neural framework that improves emotion recognition by learning to describe the continuous affective behavior of individual persons. Our framework is composed of three self-organizing mechanisms: (1) a recurrent growing layer that clusters general emotion expressions, (2) a set of associative layers, acting as affective memories, that model the specific emotional behavior of individuals, and (3) an online learning layer that provides contextual modeling of continuous expressions. We propose different learning strategies to integrate all three mechanisms and to improve arousal and valence recognition on the OMG-Emotion dataset. We evaluate our model with a series of experiments, ranging from ablation studies assessing the contribution of each neural component to an objective comparison with state-of-the-art solutions. The results show strong performance on continuous emotion recognition in monologue videos. Furthermore, we discuss how the model self-regulates the interplay between generalized and personalized emotion perception, and how this influences the model's reliability when recognizing unseen emotion expressions.