Output datasets from ML–assisted bibliometric workflow in African phytochemical metabolomics research

Indexed by NIAID Data Ecosystem on 2026-05-10

Download link:

https://figshare.com/articles/dataset/Output_datasets_from_ML_assisted_bibliometric_workflow_in_African_phytochemical_metabolomics_research/30396481

Description:

This collection contains supplementary datasets generated during the machine learning–assisted bibliometric workflow for metabolomics and phytochemical research. The datasets represent sequential outputs derived from the integration and harmonisation of bibliographic metadata from Scopus, Web of Science (WoS), and Dimensions, processed in R and Python environments. Each dataset corresponds to a distinct workflow stage:

Dataset 1A (merged_dataset2.xlsx): Consolidated metadata produced in R from the merged raw bibliographic exports of Scopus, WoS, and Dimensions.

Dataset 1B (sampled_data.xlsx): A stratified random sample generated in Python for pretraining and manual annotation.

Dataset 1C (sample_data_pretrained.xlsx): Annotated sample dataset manually screened according to inclusion and exclusion criteria.

Dataset 1D (highlighted_full_data_with_predictions.xlsx): The complete harmonised dataset automatically classified using the trained XGBoost model.

Dataset 1E (absolute_metabolomics_data.xlsx): Final curated dataset of relevant records extracted from the ML-filtered corpus.

Importantly, the files presented here were renamed from their original Google Drive file paths (referenced in the Python Google Colab scripts) to ensure sequential, descriptive, and logically ordered naming. This adjustment enhances clarity, reproducibility, and cross-reference consistency across all linked repositories.
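The stratified sampling step behind Dataset 1B can be illustrated with a minimal sketch. This is not the authors' script: the description does not say which variable the sample was stratified on, so the `source` field, the 10% fraction, the `stratified_sample` helper, and the toy records below are all assumptions made for illustration, and only the Python standard library is used.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction of records from every stratum defined by `key`."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    strata = defaultdict(list)
    for record in records:
        strata[record[key]].append(record)

    sample = []
    for label in sorted(strata):  # sorted for a deterministic stratum order
        group = strata[label]
        # Keep at least one record per stratum so no stratum disappears.
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample


# Toy records standing in for rows of the merged dataset (invented values;
# the "source" stratification field is an assumption, not from the listing).
records = (
    [{"source": "Scopus", "title": f"S{i}"} for i in range(50)]
    + [{"source": "WoS", "title": f"W{i}"} for i in range(30)]
    + [{"source": "Dimensions", "title": f"D{i}"} for i in range(20)]
)

sample = stratified_sample(records, key="source", fraction=0.1)

counts = defaultdict(int)
for record in sample:
    counts[record["source"]] += 1
print(dict(counts))
```

Sampling within each stratum keeps the proportions of the full dataset in the annotated subset, which helps when one group is much smaller than the others.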

Created:

2025-10-19