TY - JOUR
T1 - Man against machine reloaded
T2 - performance of a market-approved convolutional neural network in classifying a broad spectrum of skin lesions in comparison with 96 dermatologists working under less artificial conditions
AU - Haenssle, H. A.
AU - Fink, C.
AU - Toberer, F.
AU - Winkler, J.
AU - Stolz, W.
AU - Deinlein, T.
AU - Hofmann-Wellenhof, R.
AU - Lallas, A.
AU - Emmert, S.
AU - Buhl, T.
AU - Zutt, M.
AU - Blum, A.
AU - Abassi, M. S.
AU - Thomas, L.
AU - Tromme, I.
AU - Tschandl, P.
AU - Enk, A.
AU - Rosenberger, A.
AU - Reader Study Level I and Level II Groups
A2 - Alt, Christina
A2 - Bachelerie, Marie
A2 - Bajaj, Sonali
A2 - Balcere, Alise
A2 - Baricault, Sophie
A2 - Barthaux, Clément
A2 - Beckenbauer, Yvonne
A2 - Bertlich, Ines
A2 - Blum, Andreas
A2 - Bouthenet, Marie France
A2 - Brassat, Sophie
A2 - Buck, Philipp Marcel
A2 - Buder-Bakhaya, Kristina
A2 - Cappelletti, Maria Letizia
A2 - Chabbert, Cécile
A2 - De Labarthe, Julie
A2 - DeCoster, Eveline
A2 - Deinlein, Teresa
A2 - Dobler, Michèle
A2 - Dumon, Daphnée
A2 - Emmert, Steffen
A2 - Gachon-Buffet, Julie
A2 - Gusarov, Mikhail
A2 - Hartmann, Franziska
A2 - Hartmann, Julia
A2 - Herrmann, Anke
A2 - Hoorens, Isabelle
A2 - Hulstaert, Eva
A2 - Karls, Raimonds
A2 - Kolonte, Andreea
A2 - Kromer, Christian
A2 - Lallas, Aimilios
N1 - Copyright © 2019 European Society for Medical Oncology. Published by Elsevier Ltd. All rights reserved.
PY - 2020/1
Y1 - 2020/1
N2 - Background: Convolutional neural networks (CNNs) efficiently differentiate skin lesions by image analysis. Studies comparing a market-approved CNN in a broad range of diagnoses to dermatologists working under less artificial conditions are lacking. Materials and methods: One hundred cases of pigmented/non-pigmented skin cancers and benign lesions were used for a two-level reader study in 96 dermatologists (level I: dermoscopy only; level II: clinical close-up images, dermoscopy, and textual information). Additionally, dermoscopic images were classified by a CNN approved for the European market as a medical device (Moleanalyzer Pro, FotoFinder Systems, Bad Birnbach, Germany). Primary endpoints were the sensitivity and specificity of the CNN's dichotomous classification in comparison with the dermatologists’ management decisions. Secondary endpoints included the dermatologists’ diagnostic decisions, their performance according to their level of experience, and the CNN's area under the curve (AUC) of receiver operating characteristics (ROC). Results: The CNN revealed a sensitivity, specificity, and ROC AUC with corresponding 95% confidence intervals (CI) of 95.0% (95% CI 83.5% to 98.6%), 76.7% (95% CI 64.6% to 85.6%), and 0.918 (95% CI 0.866–0.970), respectively. In level I, the dermatologists’ management decisions showed a mean sensitivity and specificity of 89.0% (95% CI 87.4% to 90.6%) and 80.7% (95% CI 78.8% to 82.6%). With level II information, the sensitivity significantly improved to 94.1% (95% CI 93.1% to 95.1%; P < 0.001), while the specificity remained unchanged at 80.4% (95% CI 78.4% to 82.4%; P = 0.97). When fixing the CNN's specificity at the mean specificity of the dermatologists’ management decision in level II (80.4%), the CNN's sensitivity was almost equal to that of human raters, at 95% (95% CI 83.5% to 98.6%) versus 94.1% (95% CI 93.1% to 95.1%); P = 0.1. In contrast, dermatologists were outperformed by the CNN in their level I management decisions and level I and II diagnostic decisions. More experienced dermatologists frequently surpassed the CNN's performance. Conclusions: Under less artificial conditions and in a broader spectrum of diagnoses, the CNN and most dermatologists performed on the same level. Dermatologists are trained to integrate information from a range of sources rendering comparative studies that are solely based on one single case image inadequate.
KW - deep learning
KW - dermoscopy
KW - melanoma
KW - Moleanalyzer Pro
KW - neural network
KW - skin cancer
UR - http://www.scopus.com/inward/record.url?scp=85077722960&partnerID=8YFLogxK
U2 - 10.1016/j.annonc.2019.10.013
DO - 10.1016/j.annonc.2019.10.013
M3 - Article
C2 - 31912788
AN - SCOPUS:85077722960
SN - 0923-7534
VL - 31
SP - 137
EP - 143
JO - Annals of Oncology
JF - Annals of Oncology
IS - 1
ER -