SunnyLin/clinvar-summary
Source: Hugging Face. Updated: 2025-10-23. Indexed: 2025-11-15.
Download link:
https://hf-mirror.com/datasets/SunnyLin/clinvar-summary
Summary:
The ClinVar-Summary dataset is a compressed version of the ClinVar database's variant_summary.txt file. It contains key information about human genetic variants, such as Variant ID, Gene Symbol, and Clinical Significance. The dataset is suitable for bioinformatics research, data analysis, and machine learning, but it should not be used for direct diagnosis or medical decision-making. Its structure corresponds to the fields of the original ClinVar file. The dataset card documents the motivation, source, and processing steps behind the dataset's creation, and its section on bias, risks, and limitations warns of potential issues such as geographic bias and class imbalance. The dataset is provided as a single split named train.
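As a rough illustration of the single train split and the class-imbalance caveat mentioned above, the following sketch loads the dataset and measures how labels are distributed. It assumes the Hugging Face `datasets` library is installed and that the column names match ClinVar's variant_summary.txt headers (`GeneSymbol`, `ClinicalSignificance`); neither is confirmed by this listing. The helper names and the sample rows are hypothetical and are not real ClinVar records.

```python
from collections import Counter


def load_clinvar_summary():
    """Load the single 'train' split (assumes `datasets` is installed and network access)."""
    from datasets import load_dataset  # imported lazily so the rest runs offline
    return load_dataset("SunnyLin/clinvar-summary", split="train")


def class_distribution(rows, column="ClinicalSignificance"):
    """Return (label, share of rows) pairs, most common label first.

    `column` is an assumed field name taken from ClinVar's variant_summary.txt.
    """
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    return [(label, n / total) for label, n in counts.most_common()]


# Hand-made placeholder rows for illustration only; not real ClinVar data.
sample = [
    {"GeneSymbol": "GENE_A", "ClinicalSignificance": "Uncertain significance"},
    {"GeneSymbol": "GENE_A", "ClinicalSignificance": "Uncertain significance"},
    {"GeneSymbol": "GENE_B", "ClinicalSignificance": "Benign"},
    {"GeneSymbol": "GENE_C", "ClinicalSignificance": "Pathogenic"},
]

if __name__ == "__main__":
    for label, share in class_distribution(sample):
        print(f"{label}: {share:.0%}")
```

To run the same check on the real data, pass the result of `load_clinvar_summary()` to `class_distribution`; iterating over the loaded split yields one dictionary per row. A heavily skewed distribution is a cue to stratify or reweight before training a classifier.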
Provider:
SunnyLin



