Humans are experts at cooperating with each other to accomplish tasks they cannot achieve alone. Recent studies of joint action have shown that, when performing tasks together, people rely strongly on the neurocognitive mechanisms they also use when acting individually; that is, they predict the consequences of their co-actor's behavior through internal action simulation. Context-sensitive action monitoring and action selection processes, however, are relatively underrated but crucial ingredients of joint action. In the present paper, we try to correct this somewhat simplified view of joint action by reviewing recent studies of joint action simulation, monitoring, and selection, while emphasizing the intricate interrelationships between these processes. We complement our review by outlining the contours of a neurologically plausible computational framework of joint action.