Abstract
Gastrointestinal (GI) cancers remain a major global health burden, with esophageal adenocarcinoma (EAC) and colorectal cancer (CRC) posing particular concern due to their low five-year survival rates and often late-stage diagnosis. These types of cancer are commonly preceded by identifiable precursor lesions, i.e. Barrett's esophagus (BE) for EAC and colorectal polyps (CRPs) for CRC, which significantly increase the risk of progression to cancer. Early detection of neoplastic changes within these lesions is critical, since it facilitates timely and often minimally invasive interventions that can greatly improve patient outcomes and survival rates.
Despite the success and advances of deep learning-based Computer-Aided Detection/Diagnosis (CADe/CADx, collectively referred to as CAD) systems, their widespread clinical adoption remains limited. Key barriers remain insufficiently explored and include a lack of robustness to domain shifts caused by variability in endoscopic imaging conditions, as well as the technical and operational demands of real-time integration into clinical workflows.
The research in this thesis investigates multiple aspects of CAD system performance, with a particular emphasis on factors critical to successful clinical adoption. To this end, the research work is divided into four parts. First, the inherent suitability of foundational deep learning-based network architectures is assessed across diverse GI endoscopic applications, focusing on various performance aspects related to endoscopic imaging variability. Second, the impact of domain shifts caused by endoscopy-specific postprocessing enhancement settings on CAD system performance consistency is examined, alongside an exploration of a potential strategy to improve robustness to this issue. Third, a CADe system for BE neoplasia is systematically redesigned, developed, and rigorously evaluated, with specific optimization for handling endoscopic data heterogeneity. Finally, a video-based CADx model is improved for computational efficiency, temporal stability, and general clinical suitability for real-time decision support.
The research begins with a systematic investigation and comparative evaluation of deep learning architectural families, providing insights for informed architecture selection in the development of robust CAD systems. Chapter 3 presents a comprehensive benchmarking study of state-of-the-art architectures from the two most widely used families in medical image analysis, i.e. Convolutional Neural Networks (CNNs) and Transformers, across three representative GI endoscopic tasks. The evaluation considers peak performance, robustness to image quality variations, and generalization to unseen data domains, while also assessing the impact of training data volume. The results show that Transformers perform comparably to CNNs on all evaluated criteria, establishing them as a viable alternative to CNNs for GI endoscopic CAD applications. Building on this, Chapter 5 expands the architectural evaluation in the context of a CADe system for BE neoplasia detection. This evaluation includes CNNs, Transformers, hybrid CNN-Transformer architectures, and State-Space Models (SSMs). The results demonstrate that hybrid architectures, leveraging the complementary strengths of CNNs and Transformers, outperform the other architectures in both performance and robustness, by on average 1.3% and 1.0%, respectively, on dedicated validation sets.
The second part of the research aims to better understand how domain shifts caused by varying imaging conditions affect architectural performance. To this end, Chapter 4 presents a systematic evaluation of the impact of postprocessing enhancement settings, as offered by modern endoscopy equipment, on the performance consistency of two CAD systems: CADe for BE neoplasia and CADx for CRPs. The results demonstrate that variations in the enhancement settings can lead to substantial performance fluctuations, up to 25% for CADe sensitivity and 30% for CADx specificity. To address this issue, the chapter proposes a mitigation strategy using image enhancement-based data augmentation. This strategy increases the diversity of enhancement settings in training and validation data, which stabilizes performance under varying enhancement settings and outperforms standard augmentation. For the CADe system, the sensitivity and specificity fluctuations decrease from 9% and 7% to 2% and 1%, respectively, while the CADx system shows comparable reductions, from 7% and 18% to 2% and 8%.
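The idea of enhancement-based augmentation can be illustrated with a minimal sketch. The abstract does not specify which enhancement operations the thesis uses, so the code below assumes two generic ones (unsharp-mask sharpening and contrast scaling) on a small grayscale image stored as nested lists with values in [0, 1]. The function names `box_blur`, `enhance`, and `random_enhancement`, as well as the parameter ranges, are hypothetical and chosen only for illustration.

```python
import random


def box_blur(image):
    """3x3 mean filter with edge replication on a 2D list of floats."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp row index
                    xx = min(max(x + dx, 0), w - 1)  # clamp column index
                    total += image[yy][xx]
            out[y][x] = total / 9.0
    return out


def enhance(image, sharpness, contrast):
    """Apply unsharp masking, then contrast scaling around mid-gray.

    sharpness=0.0 and contrast=1.0 leave the image unchanged.
    Output values are clipped to [0, 1].
    """
    blurred = box_blur(image)
    out = []
    for row, blur_row in zip(image, blurred):
        new_row = []
        for v, b in zip(row, blur_row):
            sharpened = v + sharpness * (v - b)
            scaled = 0.5 + contrast * (sharpened - 0.5)
            new_row.append(min(max(scaled, 0.0), 1.0))
        out.append(new_row)
    return out


def random_enhancement(image, rng, max_sharpness=2.0, contrast_range=(0.8, 1.2)):
    """Sample enhancement settings at random (assumed ranges) and apply them."""
    sharpness = rng.uniform(0.0, max_sharpness)
    contrast = rng.uniform(*contrast_range)
    return enhance(image, sharpness, contrast)


if __name__ == "__main__":
    rng = random.Random(0)
    frame = [[0.2, 0.4, 0.6], [0.4, 0.6, 0.8], [0.6, 0.8, 1.0]]
    augmented = random_enhancement(frame, rng)
    print(augmented)
```

Calling `random_enhancement` once per training sample exposes a model to a different simulated enhancement setting each time, which is the general principle behind increasing enhancement diversity in the training data.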
The third part of the research, presented in Chapters 5 and 6, details the structured redesign, development, and rigorous evaluation of a next-generation CADe system for BE neoplasia, compared to the state-of-the-art at the start of this thesis research. This system is specifically optimized for robustness to endoscopic data heterogeneity and is informed by the insights gained in preceding chapters. Chapter 5 systematically refines key design considerations, including architectural choices, training strategies, and inference approaches. The analysis identifies self-supervised domain-specific pretraining and the adoption of a hybrid CNN-Transformer architecture as dominant contributing factors to improved performance and robustness. Incorporating these factors in combination with a wide diversity of training data results in incremental performance gains of up to 7.8% on dedicated validation sets. The final optimized model demonstrates significant performance and robustness gains over state-of-the-art systems across multiple independent test sets. These sets mimic real-world imaging conditions, where the optimized model shows improvements of up to 12.8% for classification and localization tasks. Building on these advancements and exploiting an expanded and more diverse dataset, Chapter 6 presents the development of the fully realized next-generation CADe system. This system is subjected to a comprehensive ex-vivo benchmarking evaluation under imaging conditions that include routine clinical variability. Compared to its embedded predecessor designed for a leading manufacturer of endoscopy equipment, the next-generation model demonstrates substantially improved robustness and performance, achieving significant gains of up to 16.3% in classification and 19.3% in localization on various extensive and novel test sets.
The fourth and last part of this thesis, presented in Chapter 7, explores the development and validation of a lightweight narrow-band imaging-based CADx system for video-based characterization of BE neoplasia. This system is specifically designed to serve as a complement to a white-light endoscopy-based CADe system used for initial lesion detection, as developed in Chapter 6. First, foundational methods for real-time video classification are established, emphasizing the need for robust and efficient architectures. Second, a lightweight int8-quantized neural network with a modest size of 4.8 MB is introduced, incorporating a temporal stability mechanism to improve prediction consistency across video frames. This design enables accurate inference with minimal computational overhead, thereby facilitating deployment on edge devices or potential integration as an embedded system within existing endoscopy systems. Third, a clinical performance benchmarking study shows that the CADx system significantly outperforms 44 general endoscopists in sensitivity. In addition, CADx assistance improves the sensitivity and specificity of general endoscopists by 12% and 8%, respectively, while reducing diagnostic uncertainty by 41% compared to unassisted assessment, bringing their performance to the level of Barrett's experts.
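To make the notion of temporal stability concrete, the sketch below shows one common way to reduce frame-to-frame flicker in video predictions: exponential smoothing of per-frame probabilities followed by hysteresis thresholding. The abstract does not describe the mechanism actually used in the thesis, so this is an illustrative stand-in only. The function names and the values of `alpha`, `on`, and `off` are assumptions.

```python
def smooth_predictions(frame_probs, alpha=0.3):
    """Exponential moving average over per-frame probabilities.

    alpha is the weight of the newest frame; the first frame is passed through.
    """
    smoothed = []
    state = None
    for p in frame_probs:
        state = p if state is None else alpha * p + (1.0 - alpha) * state
        smoothed.append(state)
    return smoothed


def stable_labels(frame_probs, alpha=0.3, on=0.6, off=0.4):
    """Turn smoothed probabilities into labels using hysteresis.

    The label switches to positive only above `on` and back to negative
    only below `off`, so brief dips or spikes do not flip the output.
    """
    labels = []
    active = False
    for s in smooth_predictions(frame_probs, alpha):
        if active and s < off:
            active = False
        elif not active and s > on:
            active = True
        labels.append(active)
    return labels


if __name__ == "__main__":
    # One noisy frame (0.1) in an otherwise confident positive sequence.
    probs = [0.9, 0.9, 0.1, 0.9, 0.9]
    print(stable_labels(probs, alpha=0.5))
```

With a plain 0.5 threshold on the raw probabilities, the third frame in the example would flip the label; with smoothing and hysteresis, the label stays positive throughout. The trade-off is a short delay before a genuine change in the scene is reflected in the output.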
In conclusion, the work of this thesis makes substantial contributions to the design, development, and evaluation of robust and effective CAD systems for early cancer detection/characterization in GI endoscopic imaging, with a focus on enabling seamless and successful clinical integration. The research in this thesis has systematically addressed challenges posed by domain shifts due to endoscopy-specific imaging variability and has studied their impact on deep learning-based network architectures. This work establishes a foundational transferable design framework for developing robust and effective CAD systems, based on CNNs or Transformers and their combination. In particular, the hybrid combination of CNNs and Transformers forms a promising novel category for future research, given its attractive performance. In addition, rigorous evaluation strategies and benchmarking studies on CAD-assisted endoscopy confirm the effectiveness of the developed framework, which leads to CAD systems that improve diagnostic accuracy, reliability, and efficiency. Collectively, these contributions advance the field toward the safe, scalable, and routine clinical adoption of GI endoscopic CAD systems.
| Original language | English |
|---|---|
| Qualification | Doctor of Philosophy |
| Awarding Institution | |
| Supervisors/Advisors | |
| Award date | 11 Nov 2025 |
| Place of Publication | Eindhoven |
| Publisher | |
| Print ISBNs | 978-90-386-6509-2 |
| Publication status | Published - 11 Nov 2025 |