Microchromosomes and their association with human diseases
Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://zenodo.org/record/5880553
Resource description:
Supply table-S1: Commonly used patient-derived cell lines, the average number of microchromosomes in each, and the citation to the original article.
Supply table-S2: List of PubMed abstracts related to the incidence of microchromosomes in humans, with disease annotations identified by an artificial intelligence-based technique.
Machine Learning_NER_code_output_20220120.zip: File containing the PubMed abstracts, machine learning analysis output and disease interpretation output.
Cell lines karyotype.zip: File containing the raw karyotype data of all head & neck cancer cell lines.
Brief description of methodology:
To investigate the incidence of microchromosomes in the human genome, we mined the PubMed literature using the query “((microchromosome) OR ("marker chromosome") OR ("small chromosome"))” with the “human” filter applied. A total of 1,365 abstracts were retrieved from PubMed as of 08-Jan-2022. We analyzed these abstracts using Named Entity Recognition (NER), a Machine Learning (ML) technique, as implemented in spaCy (3.0) and scispaCy (0.4.0) with Python (3.7) on a Windows 11 system. The scispaCy NER model “en_ner_bc5cdr_md”, which is pretrained on the BC5CDR corpus, was used for disease entity recognition (https://allenai.github.io/scispacy/). The model recognized approximately 2,000 disease entities in the abstract text of the 1,365 articles. These disease entities were extracted and then grouped into the most common broad disease classes, as shown in the Excel file Supply_Table-S1.xlsx. The Python code, PubMed input, and output files are available in "Machine Learning_NER_code_output_20220120.zip". Overall, inherited or somatically acquired microchromosomes in humans are frequently reported in association with diseases and disorders such as cancer, trisomy, Turner’s syndrome, epilepsy, infertility, and autism.
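The extraction and grouping steps described above could look roughly like the following Python sketch. It is an illustration only, not the authors' code (which is in the zip archive). It assumes spaCy and the scispaCy model package "en_ner_bc5cdr_md" are installed. The function names and the keyword map `BROAD_CLASSES` are hypothetical, and the substring matching is a stand-in for whatever grouping rule the authors actually applied.

```python
from collections import Counter

# Hypothetical keyword map: broad disease class -> lowercase substrings.
# This is an assumption for illustration, not the grouping used in the dataset.
BROAD_CLASSES = {
    "Cancer": ("cancer", "carcinoma", "tumor", "tumour", "leukemia", "lymphoma"),
    "Trisomy": ("trisomy",),
    "Turner's syndrome": ("turner",),
    "Epilepsy": ("epilep", "seizure"),
    "Infertility": ("infertil", "azoospermia"),
    "Autism": ("autis",),
}


def extract_disease_mentions(abstracts, model_name="en_ner_bc5cdr_md"):
    """Return one list of DISEASE entity strings per abstract."""
    # Imported lazily so the grouping helper below works without spaCy installed.
    import spacy

    nlp = spacy.load(model_name)  # the model package must be installed separately
    mentions = []
    for doc in nlp.pipe(abstracts):
        # The BC5CDR model tags entities as either DISEASE or CHEMICAL.
        mentions.append([ent.text for ent in doc.ents if ent.label_ == "DISEASE"])
    return mentions


def group_mentions(mentions_per_abstract):
    """Count, for each broad class, the abstracts with at least one matching mention."""
    counts = Counter()
    for mentions in mentions_per_abstract:
        matched = set()
        for mention in mentions:
            text = mention.lower()
            for broad_class, keywords in BROAD_CLASSES.items():
                if any(keyword in text for keyword in keywords):
                    matched.add(broad_class)
        counts.update(matched)  # each abstract contributes at most once per class
    return counts
```

With the model installed, `group_mentions(extract_disease_mentions(abstracts))` would give a per-class abstract count. Counting each abstract once per class is a design choice made here so that a repeated mention within one abstract does not inflate the totals.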
Created:
2022-02-24



