Leveraging Retrieval-Augmented Generation to Accelerate Discoveries on Mealworm Larvae and Plastic Degradation

Figshare | Updated 2025-12-09 | Indexed 2026-04-28

Download link:

https://figshare.com/articles/dataset/Leveraging_Retrieval-Augmented_Generation_to_Accelerate_Discoveries_on_Mealworm_Larvae_and_Plastic_Degradation/30834494

Description:

Large language models (LLMs) are transforming broad research areas, yet concerns about their trustworthiness remain. This study explored the use of Retrieval-Augmented Generation (RAG) to improve LLMs’ knowledge extraction in the field of mealworm-mediated plastic degradation. We integrated 75 publications dated up to June 2024 into the knowledge base and evaluated model performance using a curated data set of 100 queries. GraphRAG, LightRAG, and a traditional RAG were examined with five LLMs (GPT-4o, GPT-5, DeepSeek-V3.1, Qwen-plus, and Llama-3.3). Our results reveal that, of the three approaches, LightRAG improved the LLMs’ information extraction the most. Specifically, for quantitative information extraction, the best-performing RAG + LLM pipeline achieved over 92% accuracy. Meanwhile, for open-ended queries, LightRAG + Llama answered the questions with the best balance of precision and information coverage. Moreover, empirical results validated the answers that the LightRAG + Llama pipeline gave about the mealworm gut microbiome composition and plastic deconstruction patterns. In designing plastic biodegradation experiments, however, the original LLMs outperformed the RAG-augmented LLMs. The expandable nature of RAG enables timely updates to the knowledge base. This study demonstrates a reliable application of advanced LLMs in an emerging field of environmental science. Our findings identify challenges, such as the handling of conflicting information, and guide future research in scientific artificial intelligence.

Created:

2025-12-09