Abstract

Background
Large language models (LLMs) such as ChatGPT have shown great potential for aiding medical research. The research process of evidence-based medicine, especially meta-analysis, requires a heavy workload for filtering records. However, few studies have tried to use LLMs to help screen records in meta-analysis.
Objective
In this research, we aimed to explore the possibility of incorporating multiple LLMs to facilitate the screening step of meta-analysis based on the title and abstract of records.

Methods
Various LLMs were evaluated, including GPT-3.5, GPT-4, Deepseek-R1-Distill, Qwen-2.5, Phi-4, Llama-3.1, Gemma-2, and Claude-2. To assess our strategy, we selected three meta-analyses from the literature, together with a glioma meta-analysis embedded in this study, as additional validation.
For the automatic selection of records from curated meta-analyses, a four-step strategy called LARS-GPT was developed, consisting of (1) criteria selection and single-prompt (prompt with one criterion) creation, (2) best combination identification, (3) combined-prompt (prompt with one or more criteria) creation, and (4) request sending and answer summary. Recall, workload reduction, precision, and F1 score were calculated to assess the performance of LARS-GPT; an illustrative sketch of the screening call in step (4) is shown below.
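As a minimal sketch of step (4), assuming the official openai Python client for the GPT models named above: the code sends one single-criterion prompt per record and reduces the reply to an include/exclude flag. The prompt wording, the screen_record and keep_record helpers, and the all-criteria-must-pass combination rule are illustrative assumptions rather than the exact LARS-GPT prompts.

```python
# Sketch only: prompt text and combination rule are assumptions, not the
# published LARS-GPT prompts.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT_TEMPLATE = (
    "You are screening records for a meta-analysis.\n"
    "Criterion: {criterion}\n"
    "Title: {title}\n"
    "Abstract: {abstract}\n"
    "Does this record satisfy the criterion? Answer 'yes' or 'no'."
)

def screen_record(criterion: str, title: str, abstract: str,
                  model: str = "gpt-4") -> bool:
    """Send one prompt for one record and parse the yes/no answer."""
    reply = client.chat.completions.create(
        model=model,
        temperature=0,  # deterministic screening decisions
        messages=[{"role": "user", "content": PROMPT_TEMPLATE.format(
            criterion=criterion, title=title, abstract=abstract)}],
    )
    return reply.choices[0].message.content.strip().lower().startswith("yes")

def keep_record(criteria: list[str], title: str, abstract: str) -> bool:
    """Assumption: retain a record only if every criterion in the chosen
    combination is answered 'yes'; the paper's combination rule may differ."""
    return all(screen_record(c, title, abstract) for c in criteria)
```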
Results
Performance varied between single-prompts, with a mean recall of 0.800. Based on these single-prompts, we were able to find combinations performing better than the pre-set threshold. Finally, with the best combination of criteria identified, LARS-GPT achieved an average workload reduction of 40.1% while maintaining a recall greater than 0.9.
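The reported metrics follow their standard definitions; the sketch below computes them from binary LLM decisions compared against reviewer gold labels. Treating workload reduction as the fraction of records the LLM excludes (records a human no longer needs to read) is our assumption and may differ from the paper's exact definition.

```python
def evaluate(decisions: list[bool], gold: list[bool]) -> dict[str, float]:
    """Recall, precision, F1, and workload reduction for screening
    decisions (True = include) against reviewer gold labels."""
    tp = sum(d and g for d, g in zip(decisions, gold))      # correctly included
    fp = sum(d and not g for d, g in zip(decisions, gold))  # wrongly included
    fn = sum(g and not d for d, g in zip(decisions, gold))  # wrongly excluded
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    workload_reduction = 1 - sum(decisions) / len(decisions)  # assumed definition
    return {"recall": recall, "precision": precision,
            "f1": f1, "workload_reduction": workload_reduction}
```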
Conclusions
We show that automatic selection of literature for meta-analysis is possible with LLMs.
We provide this approach as a pipeline, LARS-GPT, which achieves a substantial workload reduction while maintaining a pre-set recall.