TY - JOUR
T1 - Evaluating the AI Tool “Elicit” as a Semi-Automated Second Reviewer for Data Extraction in Systematic Reviews
T2 - A Proof-of-Concept
AU - Hilkenmeier, Frederic
AU - Pelzer, Marie
AU - Stierle, Christian Martin Gerhard
AU - Fink-Lamotte, Jakob
N1 - Publisher Copyright:
© The Author(s) 2025. This article is distributed under the terms of the Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage).
PY - 2025/12/3
Y1 - 2025/12/3
N2 - Systematic reviews are essential for evidence synthesis but often require extensive time and resources, especially during data extraction. This proof-of-concept study evaluates the performance of Elicit, an AI tool specifically developed to support systematic reviews, in the context of a systematic review on psychological factors in dermatological conditions. We compared Elicit’s automated data extraction with manually extracted data across 43 studies and 602 data points. Both were assessed against a consensus-based ground truth. Elicit achieved an overall accuracy of 81.4%, compared to 86.7% for human reviewers—a difference that was not statistically significant. In cases where Elicit and the human reviewer extracted the same information, this information was correct in 100% of instances, suggesting that agreement between human and machine may serve as a reliable proxy for validity. Based on these results, we propose a semi-automated workflow in which Elicit functions as a second reviewer, reducing workload while maintaining high data quality. Our results demonstrate that domain-specific AI tools can effectively augment data extraction in systematic reviews, especially in settings with limited time or personnel.
KW - systematic reviews
KW - large language models
KW - machine-assisted review
KW - Elicit
KW - data extraction
KW - data collection
KW - machine learning
KW - evidence synthesis
KW - semi-automated workflows
UR - https://www.scopus.com/pages/publications/105023860472
U2 - 10.1177/08944393251404052
DO - 10.1177/08944393251404052
M3 - Article
SN - 0894-4393
JO - Social Science Computer Review
JF - Social Science Computer Review
ER -