Introduction: The Racine scale is a 5-point seizure behavior scoring paradigm developed in the amygdala-kindled rat. Although this scale has been applied widely in experimental epilepsy research, studies of its reproducibility are rare. The aim of the current study was, therefore, to assess its interobserver and intraobserver variability.

Material and methods: A video database was acquired in the course of amygdala kindling of 67 Wistar rats. Six blinded observers received scoring instructions and then viewed a set of 15 random videos (session #1). Next, each observer scored 379 to 1048 additional videos (session #2) and finally scored the same set of 15 videos again (session #3). Scores included the occurrence of seizures (yes or no), the total seizure time (start of stimulus until the absence of seizure behavior), and the highest Racine stage. Interobserver and intraobserver variability were assessed within and between sessions #1 and #3 using a 2-way mixed intraclass correlation or Cohen's kappa, depending on the variable.

Results: Interobserver agreement in session #1 was 0.664 for seizure occurrence, 0.861 for total seizure time, and 0.797 for the highest Racine stage. In session #3, interobserver agreement declined to 0.492 for seizure occurrence and 0.625 for total seizure time, while agreement for the highest Racine stage was 0.725. Interobserver agreement was insufficient for focal R2 seizures in both sessions (0.287 and 0.182). Intraobserver agreement exceeded 0.80 for seizure occurrence, highest seizure score, and total seizure time in 3 out of 4 observers. Agreement on Racine stage 2 seizure scores was only 0.135 in one observer but 0.650, 0.810, and 0.635 in the other observers.

Discussion and conclusion: Overall, interobserver and intraobserver agreement in scoring with Racine's scale were adequate.
However, because interobserver agreement declined after a period of individually scoring videos, we suggest periodic repetition of the standardized instruction during the course of video evaluation in order to ensure reproducible results.
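As an illustration of the chance-corrected agreement statistic reported above for the categorical scores, the following is a minimal Python sketch of Cohen's kappa for two raters (the 2-way mixed intraclass correlation used for continuous variables such as total seizure time is not shown; the example ratings are hypothetical, not study data):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters
    scoring the same items with categorical labels (e.g. Racine stages)."""
    assert len(rater_a) == len(rater_b) and len(rater_a) > 0
    n = len(rater_a)
    # Observed agreement: fraction of items both raters scored identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: two observers assigning Racine stages to 8 videos.
obs1 = [0, 2, 5, 5, 1, 2, 0, 5]
obs2 = [0, 2, 5, 4, 1, 1, 0, 5]
print(round(cohens_kappa(obs1, obs2), 3))
```

Kappa values near 1 indicate strong agreement beyond chance, values near 0 indicate agreement no better than chance; the thresholds used to call agreement "adequate" or "insufficient" in the study follow the conventional interpretation of such coefficients.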