Tens of images can suffice to train neural networks for malignant leukocyte detection

Jens P.E. Schouten, Christian Matek, Luuk F.P. Jacobs, Michèle C. Buck, Dragan Bošnački (Corresponding author), Carsten Marr (Corresponding author)

Research output: Contribution to journal › Article › Academic › peer-review

24 Citations (Scopus)
34 Downloads (Pure)

Abstract

Convolutional neural networks (CNNs) excel as powerful tools for biomedical image classification. It is commonly assumed that training CNNs requires large amounts of annotated data. This is a bottleneck in many medical applications where annotation relies on expert knowledge. Here, we analyze the binary classification performance of a CNN on two independent cytomorphology datasets as a function of training set size. Specifically, we train a sequential model to discriminate non-malignant leukocytes from blast cells, whose appearance in the peripheral blood is a hallmark of leukemia. We systematically vary training set size, finding that tens of training images suffice for a binary classification with an ROC-AUC over 90%. Saliency maps and layer-wise relevance propagation visualizations suggest that the network learns to increasingly focus on nuclear structures of leukocytes as the number of training images is increased. A low-dimensional t-SNE representation reveals that while the two classes are already separated for a few training images, the distinction between the classes becomes clearer when more training images are used. To evaluate the performance in a multi-class problem, we annotated single-cell images from an acute lymphoblastic leukemia dataset into six different hematopoietic classes. Multi-class prediction suggests that here, too, a few single-cell images suffice if differences between morphological classes are large enough. The incorporation of deep learning algorithms into clinical practice has the potential to reduce variability and cost, democratize usage of expertise, and allow for early detection of disease onset and relapse. Our approach evaluates the performance of a deep-learning-based cytology classifier with respect to size and complexity of the training data and the classification task.
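
The evaluation idea in the abstract, measuring ROC-AUC on a fixed test set while the training set grows, can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not the authors' method: the data are synthetic 2-D points rather than leukocyte images, a nearest-centroid classifier stands in for the CNN, and every name (roc_auc, make_dataset, fit_nearest_centroid, learning_curve) is hypothetical.

```python
import random


def roc_auc(labels, scores):
    """ROC-AUC as the chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_dataset(rng, n_per_class):
    """Synthetic stand-in for image features: two 2-D Gaussian classes (assumption)."""
    data = []
    for label in (0, 1):
        shift = 1.5 if label == 1 else 0.0
        for _ in range(n_per_class):
            data.append(([rng.gauss(shift, 1.0), rng.gauss(shift, 1.0)], label))
    return data


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return [sum(c) / len(points) for c in zip(*points)]


def fit_nearest_centroid(train):
    """Stand-in classifier (not a CNN): score = how much closer a point is to class 1."""
    c0 = centroid([x for x, y in train if y == 0])
    c1 = centroid([x for x, y in train if y == 1])

    def score(x):
        d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
        d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
        return d0 - d1

    return score


def learning_curve(sizes, n_test_per_class=200, seed=0):
    """Train on n samples per class for each n in sizes; evaluate on one fixed test set."""
    rng = random.Random(seed)
    test = make_dataset(rng, n_test_per_class)
    test_x = [x for x, _ in test]
    test_y = [y for _, y in test]
    results = {}
    for n in sizes:
        score = fit_nearest_centroid(make_dataset(rng, n))
        results[n] = roc_auc(test_y, [score(x) for x in test_x])
    return results


if __name__ == "__main__":
    for n, auc in learning_curve([2, 5, 10, 50, 200]).items():
        print(f"{n:>4} training samples per class: ROC-AUC = {auc:.3f}")
```

Keeping the test set fixed across training sizes means that differences in ROC-AUC reflect the amount of training data rather than a changing evaluation sample. A real study would also repeat each size over several random draws and report the spread.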

Original language: English
Article number: 7995
Number of pages: 8
Journal: Scientific Reports
Volume: 11
Issue number: 1
Publication status: Published - 12 Apr 2021

Funding

We thank Natal van Riel and Katharina Goetze for supporting this study; Nikos Chlis for feedback on our CNN approach and Max Schmidt for feedback on the presentation; Labati et al. for providing the single-cell data. This work was supported by the German Science Foundation DFG within the Collaborative Research Center SFB 1243 “Cancer Evolution” with a research grant for MB. ChM gratefully acknowledges support from Deutsche Jose Carreras-Leukämie Stiftung. CM has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant agreement No. 866411).

Funders (funder number, where given)
German Research Foundation: SFB 1243
European Union's Horizon 2020 - Research and Innovation Framework Programme: 866411
H2020 European Research Council
Deutsche Forschungsgemeinschaft

    Keywords

    • Databases as Topic
    • Humans
    • Image Processing, Computer-Assisted
    • Leukocytes/pathology
    • Lymphocytes/pathology
    • Neural Networks, Computer
