Microchromosomes and their association with human diseases

NIAID Data Ecosystem, indexed 2026-03-13
Download link:
https://zenodo.org/record/5880553
Resource description:
Supply Table-S1: Commonly used patient-derived cell lines, the average number of microchromosomes in each, and a citation to the original article.
Supply Table-S2: List of PubMed abstracts, with annotation of the diseases identified by an artificial intelligence-based technique as related to the incidence of microchromosomes in humans.
Machine Learning_NER_code_output_20220120.zip: The PubMed abstracts, the machine learning analysis output, and the disease interpretation output.
Cell lines karyotype.zip: The raw karyotype data of all head and neck cancer cell lines.

Brief description of methodology: To investigate the incidence of microchromosomes in the human genome, we mined the PubMed literature for studies matching the keywords “((microchromosome) OR ("marker chromosome") OR ("small chromosome"))” with the filter “human” applied. A total of 1,365 abstracts were obtained from PubMed as of 08-Jan-2022. We analyzed the abstracts using the Named Entity Recognition (NER) technique of Machine Learning (ML) as implemented in spaCy (3.0), scispaCy (0.4.0), and Python (3.7), running on a Windows 11 system. The scispaCy NER model “en_ner_bc5cdr_md”, which is pretrained on the BC5CDR corpus, was used for disease entity recognition (https://allenai.github.io/scispacy/). The model recognized approximately 2,000 disease entities in the abstract text of the 1,365 articles. These disease entities were extracted and then grouped into the most common broad disease classes, as shown in the Excel file Supply_Table-S1.xlsx. The Python code, PubMed input, and output files are available in "Machine Learning_NER_code_output_20220120.zip".

Overall, inherited or somatically acquired microchromosomes in human individuals are frequently reported together with diseases and disorders such as Cancer, Trisomy, Turner’s syndrome, Epilepsy, Infertility, and Autism.
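The extraction and grouping steps described in the methodology can be sketched roughly as follows. This is an illustration only, not the code shipped in the zip archive: the function names, the BROAD_CLASSES keyword map, and the "Other" bucket are assumptions made for the example, since the listing does not state the actual grouping rules. The sketch relies on the "en_ner_bc5cdr_md" model tagging entities with the label DISEASE (alongside CHEMICAL), and it imports spaCy lazily so the grouping logic can be read and tried without the model installed.

```python
from collections import Counter

# Hypothetical keyword map for broad disease classes. The class names come
# from the description; the keywords are assumptions for illustration.
BROAD_CLASSES = {
    "Cancer": ("cancer", "carcinoma", "tumor", "tumour", "leukemia", "lymphoma"),
    "Trisomy": ("trisomy",),
    "Turner's syndrome": ("turner",),
    "Epilepsy": ("epilepsy", "seizure"),
    "Infertility": ("infertility",),
    "Autism": ("autism", "autistic"),
}


def load_model(name="en_ner_bc5cdr_md"):
    """Load the scispaCy NER model (requires spaCy and the model package)."""
    import spacy  # imported lazily so the rest of the sketch has no dependencies

    return spacy.load(name)


def extract_diseases(nlp, abstracts):
    """Return lower-cased DISEASE entity mentions found in the abstracts."""
    mentions = []
    for doc in nlp.pipe(abstracts):
        mentions.extend(
            ent.text.lower() for ent in doc.ents if ent.label_ == "DISEASE"
        )
    return mentions


def group_by_class(mentions):
    """Count mentions per broad class using simple substring matching."""
    counts = Counter()
    for mention in mentions:
        for broad_class, keywords in BROAD_CLASSES.items():
            if any(keyword in mention for keyword in keywords):
                counts[broad_class] += 1
                break
        else:
            # No keyword matched: keep the mention in a catch-all bucket.
            counts["Other"] += 1
    return counts
```

With the model installed, a call such as group_by_class(extract_diseases(load_model(), abstracts)) would return per-class counts for a list of abstract strings. Substring matching is a deliberately crude choice here; a curated ontology mapping would be more robust.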
Created:
2022-02-24