AI-Driven Knowledge-Data Graphs for Tracking Global Arbovirus Research
Source: DataCite Commons. Updated: 2025-06-01. Indexed: 2026-04-25.
Download link:
https://figshare.com/articles/dataset/AI-Driven_Knowledge-Data_Graphs_for_Tracking_Global_Arbovirus_Research/28088573/3
Resource description:
Abstract
Arboviruses represent a critical area of infectious disease research, necessitating systematic and comprehensive analysis to extract actionable insights. This study introduces AI-driven knowledge-data graphs (KDGs) that provide novel perspectives on the global arbovirus research landscape. We fine-tuned a generic large language model to extract the hierarchical topic structure of arbovirus research as a whole, which forms the basis of the KDG for AI-driven meta-analysis of the research area. We then pre-trained a large language model, AboBERT, to establish the knowledge-data association in a low-resource, high-efficiency, and high-accuracy fashion, enabling the integration of structural and statistical analysis. We further extract key entities such as researchers, institutes, journals, geographical entities, time, virus species, diseases, vector species, and symptoms to form the KDG at deeper levels of the topic hierarchy. By constructing a KDG, we facilitate dynamic, comprehensive, and in-depth analysis of global arbovirus research. This method goes beyond traditional, labor-intensive manual searches of scientific literature databases and the use of pre-defined taxonomies like MeSH, so that it reflects the latest research trends. The integration of AI-driven KDGs in arbovirus research demonstrates the potential for AI to revolutionize traditional research methodologies, offering a more nuanced and up-to-date understanding of this critical field.

File Descriptions
<b>arbo_pubs.medline.xlsx</b>: 72,887 records of arbovirus-related literature downloaded from PubMed. Records are saved in MEDLINE format, with the following column definitions:
<b>PMID</b>: PubMed Unique Identifier. A unique number assigned to each PubMed record.
<b>OWN</b>: Owner. The organization or database that contributed the record.
<b>STAT</b>: Status. The current status of the record (e.g., MEDLINE, PubMed-not-MEDLINE).
<b>DCOM</b>: Date Completed. The date the record was completed or finalized.
<b>LR</b>: Last Revision Date. The date the record was last revised or updated.
<b>IS</b>: ISSN. The International Standard Serial Number of the journal.
<b>VI</b>: Volume. The volume number of the journal issue.
<b>DP</b>: Publication Date. The date the article was published.
<b>TI</b>: Title. The title of the article.
<b>PG</b>: Pagination. The page numbers of the article in the journal.
<b>FAU</b>: Full Author Name. The full names of the authors.
<b>AU</b>: Author. The abbreviated names of the authors.
<b>LA</b>: Language. The language in which the article is written.
<b>PT</b>: Publication Type. The type of publication (e.g., Journal Article, Review).
<b>PL</b>: Place of Publication. The country or location where the journal is published.
<b>TA</b>: Journal Title Abbreviation. The abbreviated title of the journal.
<b>JT</b>: Journal Title. The full title of the journal.
<b>JID</b>: NLM Unique ID. A unique identifier assigned to the journal by the National Library of Medicine (NLM).
<b>SB</b>: Subset. The subset to which the record belongs (e.g., AIM, IM).
<b>MH</b>: MeSH Terms. Medical Subject Headings (MeSH) terms associated with the article.
<b>RF</b>: Number of References. The number of references cited in the article.
<b>EDAT</b>: Entrez Date. The date the record was added to PubMed.
<b>MHDA</b>: MeSH Date. The date MeSH terms were added to the record.
<b>CRDT</b>: Create Date. The date the record was created.
<b>PHST</b>: Publication History Status Date. Dates related to the publication history (e.g., received, accepted).
<b>AID</b>: Article Identifier. A unique identifier for the article (e.g., DOI, PII).
<b>PST</b>: Publication Status. The status of the publication (e.g., ppublish, epublish).
<b>SO</b>: Source. The full citation of the article.
<b>IP</b>: Issue. The issue number of the journal.
<b>TT</b>: Transliterated Title. The title transliterated into another script (e.g., Cyrillic to Latin).
<b>RN</b>: EC/RN Number. Enzyme Commission (EC) numbers or CAS registry numbers for chemicals.
<b>AB</b>: Abstract. The abstract of the article.
<b>PMC</b>: PubMed Central Identifier. The identifier for the article in PubMed Central.
<b>AD</b>: Affiliation. The institutional affiliation of the authors.
<b>GR</b>: Grant Number. Grant numbers associated with the research.
<b>OID</b>: Other ID. Other identifiers associated with the article.
<b>OAB</b>: Other Abstract. Additional abstracts in other languages.
<b>OABL</b>: Other Abstract Label. Labels for other abstracts.
<b>OTO</b>: Other Term Owner. The source of other terms.
<b>OT</b>: Other Term. Other terms or keywords associated with the article.
<b>GN</b>: General Note. Supplemental or descriptive information related to the record.
<b>PS</b>: Personal Name as Subject. Names of individuals who are the subject of the article.
<b>FPS</b>: Full Personal Name as Subject. Full names of individuals who are the subject of the article.
<b>LID</b>: Location ID. The electronic location identifier of the article (e.g., DOI).
<b>CN</b>: Corporate Author. The name of a corporate or collective author.
<b>CI</b>: Copyright Information. Copyright details for the article.
<b>DEP</b>: Date of Electronic Publication. The date the article was published electronically.
<b>EIN</b>: Erratum In. Reference to a published erratum for the article.
<b>CIN</b>: Comment In. Reference to a comment or commentary on the article.
<b>CON</b>: Comment On. Reference to the article that this article comments on.

<b>train_data.human_labeled.xlsx</b>: Manually labelled training dataset, with the following column definitions:
<b>PMID</b>: PubMed Unique Identifier. A unique number assigned to each PubMed record.
<b>TI</b>: Title. The title of the article.
<b>AB</b>: Abstract. The abstract of the article.
<b>label</b>: Manually labelled tags. <b>__label__1</b>: Discovery and Classification, <b>__label__2</b>: Surveillance/Epidemiology, <b>__label__3</b>: Detection/Diagnose Tool, <b>__label__4</b>: Mechanisms of Transmission and Pathogenesis, <b>__label__5</b>: Biosafety and Public Health Strategies, <b>__label__6</b>: Vector Competence and Vector Control, <b>__label__7</b>: Prevention and Treatment, <b>__label__8</b>: Genetic and Evolution.

<b>all_prompts.md</b>: All prompts included in this article.
<b>all_subtopics.csv</b>: All subtopics into which the eight topics were further divided using DeepSeek.
<b>subtopic.deepseek.res.jsonl</b>: The original results of using DeepSeek to segment subtopics.
<b>virus_info_ext.deepseek.res.jsonl</b>: The original results of using DeepSeek to extract virus information.
<b>vector_info_ext.deepseek.res.jsonl</b>: The original results of using DeepSeek to extract vector information.
<b>symp_info_ext.deepseek.res.jsonl</b>: The original results of using DeepSeek to extract symptom information.
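The file descriptions above suggest two routine steps when working with the dataset: turning the `__label__N` tags in the `label` column into topic names, and reading the `.jsonl` result files one record per line. The following is a minimal sketch of both, not the authors' code. The helper names `decode_labels` and `read_jsonl` are hypothetical, the assumption that a cell may hold more than one tag is unverified, and nothing is assumed about the fields inside the JSONL records, since the description does not document them.

```python
import json
import re

# Topic names for the eight labels, taken from the dataset description.
LABEL_TOPICS = {
    1: "Discovery and Classification",
    2: "Surveillance/Epidemiology",
    3: "Detection/Diagnose Tool",
    4: "Mechanisms of Transmission and Pathogenesis",
    5: "Biosafety and Public Health Strategies",
    6: "Vector Competence and Vector Control",
    7: "Prevention and Treatment",
    8: "Genetic and Evolution",
}

_LABEL_RE = re.compile(r"__label__(\d+)")


def decode_labels(cell):
    """Return topic names for every __label__N tag found in a cell.

    Assumption: a cell holds one or more tags as text; unknown
    label numbers are ignored rather than raising an error.
    """
    numbers = (int(n) for n in _LABEL_RE.findall(str(cell)))
    return [LABEL_TOPICS[n] for n in numbers if n in LABEL_TOPICS]


def read_jsonl(lines):
    """Parse an iterable of JSON Lines strings, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


if __name__ == "__main__":
    print(decode_labels("__label__2"))
    # Placeholder record; the real field names are not documented here.
    print(read_jsonl(['{"example_key": "example_value"}', ""]))
```

With the files downloaded, `read_jsonl` can be given an open file handle directly, for example `read_jsonl(open("subtopic.deepseek.res.jsonl", encoding="utf-8"))`. The spreadsheet could be loaded with `pandas.read_excel` before applying `decode_labels` to its `label` column, assuming pandas and an Excel reader such as openpyxl are installed.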
Provider:
figshare
Created:
2025-05-23



