Invasion Biology WikiProject Scientific Papers: Text Data Mining and LLM-based Information Extraction of Species, Locations, Habitats, and Ecosystems
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/13956882
Resource description:
This dataset contains the abstracts and full texts of publications whose DOIs are listed in the Invasion Biology WikiProject (DOI: 10.5281/zenodo.12518036). The data were retrieved through the ask.orkg.org API.
The resulting CSV file includes the following fields: "ASK ID", "DOI", "Title", "Abstract", and "Full-text".
Of the 49,438 queried DOIs, the ASK database provided:
Total DOIs processed: 12,636
DOIs with neither abstract nor full-text: 36 (an abstract with fewer than 10 tokens was counted as missing)
DOIs with abstracts but no full-text: 9,766
DOIs with both abstract and full-text: 2,816
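The coverage breakdown above can be recomputed from the CSV along these lines. This is a minimal sketch, not the authors' script: it assumes the column names match the field list given earlier, approximates "tokens" with whitespace-separated words (the tokenizer actually used is not stated), and runs on a made-up three-row sample with placeholder DOIs rather than the real file. The "full-text only" category is included for completeness and does not appear in the listing above.

```python
import csv
import io
from collections import Counter

# Threshold taken from the description: abstracts under 10 tokens count as missing.
MIN_ABSTRACT_TOKENS = 10


def classify(row):
    """Assign one CSV record to a coverage category."""
    abstract = (row.get("Abstract") or "").strip()
    full_text = (row.get("Full-text") or "").strip()
    # Assumption: whitespace-separated words stand in for tokens.
    has_abstract = len(abstract.split()) >= MIN_ABSTRACT_TOKENS
    has_full_text = bool(full_text)
    if has_abstract and has_full_text:
        return "abstract and full-text"
    if has_abstract:
        return "abstract only"
    if has_full_text:
        return "full-text only"
    return "neither"


def coverage_counts(csv_file):
    """Count records per coverage category from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    return Counter(classify(row) for row in reader)


# Hypothetical sample data; the DOIs and texts are placeholders.
SAMPLE_CSV = (
    "ASK ID,DOI,Title,Abstract,Full-text\n"
    "1,10.0000/example-1,Paper A,"
    "one two three four five six seven eight nine ten eleven,Full body text\n"
    "2,10.0000/example-2,Paper B,"
    "one two three four five six seven eight nine ten eleven,\n"
    "3,10.0000/example-3,Paper C,Too short,\n"
)

if __name__ == "__main__":
    for category, count in coverage_counts(io.StringIO(SAMPLE_CSV)).items():
        print(f"{category}: {count}")
```

To run this against the downloaded file, pass a file object opened with newline="" and encoding="utf-8" instead of the in-memory sample. Full-text fields can be long, so the csv module's default field size limit may need to be raised with csv.field_size_limit.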
The second part of the dataset contains structured information extracted from the publications using the GPT-4o Large Language Model. This structured data is included in the zipped folder structured-publications.zip.
The accompanying GitHub repository provides access to the code and scripts used at various stages of the information extraction (IE) process.
Theme of the Study: "Mining for Species, Locations, Habitats, and Ecosystems from Scientific Papers in Invasion Biology: A Large-Scale Exploratory Study with Large Language Models."
Created:
2024-12-19



