Routine surveillance endoscopies are currently used to detect dysplasia in patient with Barrett's Esophagus (BE). However, most of these procedures are performed by non-expert endoscopists in community hospitals. Leading to many missed dysplastic lesions, which can progress into advanced esophageal adenocarcinoma if left untreated.1 In recent years, several successful algorithms have been proposed for the detection of cancer in BE using high-quality overview images. This work addresses the first steps towards clinical application on endoscopic surveillance videos. Several challenges are identified that occur when moving from image-based to video-based analysis. (1) It is shown that algorithms trained on high-quality overview images do not naively transfer to endoscopic videos due to e.g. non-informative frames. (2) Video quality is shown to be an important factor in algorithm performance. Specifically, temporal location performance is highly correlated with video quality. (3) When moving to real-time algorithms, the additional compute necessary to address the challenges in videos will become a burden on the computational budget. However, in addition to challenges, videos also bring new opportunities not available in the current image-based methods such as the inclusion of temporal information. This work shows that a multi-frame approach increases performance compared to a naive single-image method when the above challenges are addressed.