TY - JOUR
T1 - Appropriate trust in artificial intelligence for the optical diagnosis of colorectal polyps
T2 - The role of human/artificial intelligence interaction
AU - van der Zander, Quirine E.W.
AU - Roumans, Rachel
AU - Kusters, Carolus H.J.
AU - Dehghani, Nikoo
AU - Masclee, Ad A.M.
AU - de With, Peter H.N.
AU - van der Sommen, Fons
AU - Snijders, Chris C.P.
AU - Schoon, Erik J.
PY - 2024/12
Y1 - 2024/12
AB - Background and Aims: Computer-aided diagnosis (CADx) for the optical diagnosis of colorectal polyps has been thoroughly investigated. However, studies on human–artificial intelligence interaction are lacking. Our aim was to investigate endoscopists’ trust in CADx by evaluating whether communicating a calibrated algorithm confidence score improved trust. Methods: Endoscopists optically diagnosed 60 colorectal polyps. Endoscopists first diagnosed each polyp without CADx assistance (initial diagnosis). Immediately afterward, the same polyp was shown again with a CADx prediction: either a prediction alone (benign or premalignant) or a prediction accompanied by a calibrated confidence score (0-100). A confidence score of 0 indicated a benign prediction, and a score of 100 indicated a (pre)malignant prediction. For half of the polyps, CADx was mandatory, and for the other half, CADx was optional. After reviewing the CADx prediction, endoscopists made a final diagnosis. Histopathology was used as the gold standard. Endoscopists’ trust in CADx was measured as CADx prediction utilization: the willingness to follow CADx predictions when the endoscopists initially disagreed with the CADx prediction. Results: Twenty-three endoscopists participated. Presenting CADx predictions increased the endoscopists’ diagnostic accuracy (69.3% initial vs 76.6% final diagnosis, P < .001). The CADx prediction was used in 36.5% of disagreements (n = 183 of 501). Adding a confidence score led to lower CADx prediction utilization, except when the confidence score surpassed 60. Mandatory CADx decreased CADx prediction utilization compared with optional CADx. Appropriate trust (using correct or disregarding incorrect CADx predictions) was 48.7% (n = 244 of 501). Conclusions: Appropriate trust was common, and CADx prediction utilization was highest for optional CADx without confidence scores. These results underscore the importance of a better understanding of human–artificial intelligence interaction.
KW - Humans
KW - Artificial Intelligence
KW - Colonic Polyps/diagnosis
KW - Diagnosis, Computer-Assisted/methods
KW - Colonoscopy/methods
KW - Trust
KW - Male
KW - Female
KW - Algorithms
KW - Colorectal Neoplasms/diagnosis
KW - Precancerous Conditions/diagnosis
KW - Middle Aged
UR - http://www.scopus.com/inward/record.url?scp=85202025950&partnerID=8YFLogxK
U2 - 10.1016/j.gie.2024.06.029
DO - 10.1016/j.gie.2024.06.029
M3 - Article
C2 - 38942330
AN - SCOPUS:85202025950
SN - 0016-5107
VL - 100
SP - 1070-1078.e10
JO - Gastrointestinal Endoscopy
JF - Gastrointestinal Endoscopy
IS - 6
ER -