Invasion Biology WikiProject Scientific Papers: Text Data Mining and LLM-based Information Extraction of Species, Locations, Habitats, and Ecosystems

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/13956882
Resource description:
This dataset contains the abstracts and full texts of publications whose DOIs are listed in the Invasion Biology WikiProject (DOI: 10.5281/zenodo.12518036). The data were retrieved using the ask.orkg.org API. The resulting CSV file includes the following fields: "ASK ID", "DOI", "Title", "Abstract", and "Full-text".

Of the 49,438 queried DOIs, the ASK database provided:
Total DOIs processed: 12,636
DOIs with neither abstract nor full-text: 36 (abstract token count was less than 10)
DOIs with abstracts but no full-text: 9,766
DOIs with both abstract and full-text: 2,816

The second part of the dataset contains structured information extracted from the publications using the GPT-4o Large Language Model. This structured data is included in the zipped folder structured-publications.zip. The accompanying GitHub repository provides access to the code and scripts used at various stages of the information extraction (IE) process.

Theme of the study: "Mining for Species, Locations, Habitats, and Ecosystems from Scientific Papers in Invasion Biology: A Large-Scale Exploratory Study with Large Language Models."
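The coverage categories in the description can be recomputed from the CSV with a short script. The following is a minimal sketch, not the authors' code: it assumes the column names match the listed fields exactly, that "token count" can be approximated by whitespace splitting, and that any row with an abstract under 10 tokens falls into the "neither" category. The sample rows and DOIs are invented for illustration.

```python
import csv
import io
from collections import Counter

# Threshold taken from the description ("abstract token count was less than 10").
MIN_ABSTRACT_TOKENS = 10


def classify(row):
    """Return 'both', 'abstract_only', or 'neither' for one CSV row.

    Assumption: tokens are approximated by whitespace-separated words,
    and a row with a too-short abstract counts as 'neither'.
    """
    abstract = (row.get("Abstract") or "").strip()
    full_text = (row.get("Full-text") or "").strip()
    has_abstract = len(abstract.split()) >= MIN_ABSTRACT_TOKENS
    if not has_abstract:
        return "neither"
    return "both" if full_text else "abstract_only"


def count_coverage(rows):
    """Tally coverage categories over an iterable of row dicts."""
    return Counter(classify(row) for row in rows)


# Invented sample data using the field names listed in the description.
LONG_ABSTRACT = (
    "Invasive species alter native habitats across many "
    "freshwater and coastal ecosystems worldwide"
)
sample = io.StringIO(
    "ASK ID,DOI,Title,Abstract,Full-text\n"
    f"1,10.0000/example-a,Paper A,{LONG_ABSTRACT},Some full text\n"
    f"2,10.0000/example-b,Paper B,{LONG_ABSTRACT},\n"
    "3,10.0000/example-c,Paper C,Too short,\n"
)

counts = count_coverage(csv.DictReader(sample))
print(dict(counts))
```

To run this against the real file, replace the in-memory sample with `open(path, newline="", encoding="utf-8")`, where `path` points to the downloaded CSV (its filename is not given in the description). Because full-text fields can be long, `csv.field_size_limit` may need to be raised before reading.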
Created:
2024-12-19