Interpretable semisupervised classifier for predicting cancer stages

Isel Grau, Dipankar Sengupta, Ann Nowe

Research output: Chapter in Book/Report/Conference proceeding > Chapter > Academic > peer-review

2 Citations (Scopus)


Machine learning techniques in medicine have been at the forefront of addressing challenges such as diagnosis, prognosis prediction, and precision medicine. In this field, data are sometimes abundant but come from different sources or lack assigned labels. Manually labeling these data to assemble a curated dataset for supervised classification can be costly. Semisupervised classification offers a wide range of methods for leveraging unlabeled data when learning prediction models. However, these classifiers are commonly deep or ensemble learning structures that often result in black boxes. The requirement of interpretable models in medical settings led us to propose the self-labeling gray box classifier, which outperforms other semisupervised classifiers on benchmark datasets while providing interpretability. In this chapter, we illustrate applications of the self-labeling gray box to omics and clinical datasets from The Cancer Genome Atlas. We show that the self-labeling gray box is accurate in predicting the stages of rare cancers by leveraging unlabeled instances from more common cancer types. We discuss insights, the features influencing prediction, and a global representation of the knowledge through decision trees or rule lists, which can aid clinicians and researchers.

Original language: English
Title of host publication: Machine Learning, Big Data, and IoT for Medical Informatics
Subtitle of host publication: Intelligent Data-Centric Systems
Number of pages: 19
ISBN (Electronic): 9780128217771
ISBN (Print): 978-0-12-821777-1
Publication status: Published - Jun 2021
Externally published: Yes


  • Cancer stage prediction
  • Explainable artificial intelligence
  • Gray box model
  • Self-labeling
  • Semisupervised classifier

