
Data Sheet 1_TRACE: applying AI language models to extract ancestry information from curated biomedical literature.pdf

Indexed by NIAID Data Ecosystem on 2026-05-10
Download link:
https://figshare.com/articles/dataset/Data_Sheet_1_TRACE_applying_AI_language_models_to_extract_ancestry_information_from_curated_biomedical_literature_pdf/30163501
Resource description:
Introduction: Ancestry reporting is essential to ensure transparency and proper representation in biomedical studies. However, manually extracting this information from study texts is time-consuming and inefficient. In this paper, we present TRACE (Tool for Researching Ancestry and Cell Extraction), which combines GPT-4 with web crawling to automate ancestry identification by detecting cell lines or cultures in texts and tracing their ancestry.

Methods: TRACE extracts cell lines and primary cultures from research articles and follows web sources to determine their ancestry. We compared TRACE's outputs with a manually generated database to assess its performance in identifying and verifying ancestry information.

Results: The results reveal an overrepresentation of European/White samples and significant underreporting of ancestry. TRACE enables large-scale, systematic ancestry analysis, providing a valuable resource for researchers and agencies assessing biases in sample selection.

Conclusions: As an open-source tool, TRACE facilitates broader efforts to evaluate and improve ancestry representation in biomedical research.
Created:
2025-09-19