TY - JOUR
T1 - Why rankings of biomedical image analysis competitions should be interpreted with care
AU - Maier-Hein, Lena
AU - Eisenmann, Matthias
AU - Reinke, Annika
AU - Onogur, Sinan
AU - Stankovic, Marko
AU - Scholz, Patrick
AU - Arbel, Tal
AU - Bogunovic, Hrvoje
AU - Bradley, Andrew P.
AU - Carass, Aaron
AU - Feldmann, Carolin
AU - Frangi, Alejandro F.
AU - Full, Peter M.
AU - van Ginneken, Bram
AU - Hanbury, Allan
AU - Honauer, Katrin
AU - Kozubek, Michal
AU - März, Keno
AU - Maier, Oskar
AU - Maier-Hein, Klaus
AU - Menze, Bjoern H.
AU - Müller, Henning
AU - Neher, Peter F.
AU - Niessen, Wiro
AU - Rajpoot, Nasir
AU - Sharp, Gregory C.
AU - Sirinukunwattana, Korsuk
AU - Speidel, Stefanie
AU - Stock, Christian
AU - Stoyanov, Danail
AU - Taha, Abdel Aziz
AU - van der Sommen, Fons
AU - Wang, Ching-Wei
AU - Weber, Marc-André
AU - Zheng, Guoyan
AU - Jannin, Pierre
AU - Kopp-Schneider, Annette
PY - 2018/12/1
Y1 - 2018/12/1
AB - International challenges have become the standard for validation of biomedical image analysis methods. Given their scientific impact, it is surprising that a critical analysis of common practices related to the organization of challenges has not yet been performed. In this paper, we present a comprehensive analysis of biomedical image analysis challenges conducted up to now. We demonstrate the importance of challenges and show that the lack of quality control has critical consequences. First, reproducibility and interpretation of the results are often hampered as only a fraction of relevant information is typically provided. Second, the rank of an algorithm is generally not robust to a number of variables such as the test data used for validation, the ranking scheme applied and the observers that make the reference annotations. To overcome these problems, we recommend best practice guidelines and define open research questions to be addressed in the future.
KW - Biomedical Research/methods
KW - Biomedical Technology/classification
KW - Diagnostic Imaging/classification
KW - Humans
KW - Image Processing, Computer-Assisted/methods
KW - Reproducibility of Results
KW - Surveys and Questionnaires
KW - Technology Assessment, Biomedical/methods
UR - http://www.scopus.com/inward/record.url?scp=85058062868&partnerID=8YFLogxK
U2 - 10.1038/s41467-018-07619-7
DO - 10.1038/s41467-018-07619-7
M3 - Article
C2 - 30523263
SN - 2041-1723
VL - 9
JO - Nature Communications
JF - Nature Communications
IS - 1
M1 - 5217
ER -