
Gene_Network_Sequence_Variant_Datasets_mTOR_TGF_Beta

Source: Figshare. Updated: 2022-03-05. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Gene_Network_Variant_Dataset_mTOR_TGF_Beta_zip/19312214
Description:
The mTOR and TGF-Beta pathway genes were selected from the KEGG database (https://www.genome.jp/kegg/). Genomic sequences of the pathway genes were fetched from the GRCh37 human genome assembly using the genomic coordinates recorded in the NCBI database (https://www.ncbi.nlm.nih.gov/projects/genome/guide/human/index.shtml). For each pathway, a reference sequence composed of the individual gene sequences was used as a template to generate 400 control and 400 patient sequences. In the first step, we created two lists of integers for both groups, representing the positions of polymorphic and pathogenic variants (the ‘polymorphic positions list’ and the ‘pathogenic positions list’). Each integer in these lists was chosen at random from within consecutive intervals, and no position appears in both lists. The interval length was set to 100 for polymorphic variants and 200 for pathogenic variants (any integer within the ranges 1-100, 100-200, 200-300, and so on for the ‘polymorphic positions list’, and any integer within the ranges 1-200, 200-400, 400-600, and so on for the ‘pathogenic positions list’). In the second step, the reference base at each position in the ‘polymorphic positions list’ was replaced by the variant base in 40% of both the control and the patient sequences. The alterations at these positions were treated as non-pathogenic and/or common variants with a minor allele frequency of 0.40 in both groups. In the next step, the reference base at each position in the ‘pathogenic positions list’ was replaced by the variant base in 25% of the control sequences and 30% of the patient sequences. The alterations at these positions were treated as disease-associated/pathogenic variants with an allele frequency of 0.25 in the control group and 0.30 in the patient group. In all of these steps, we set the minor allele frequency (MAF) higher, in contrast to single-gene disorders, which involve rare variants (with MAF 0.01). All variant sequences are in the haploid state.
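The procedure described above can be illustrated with a minimal Python sketch. It is not the authors' code. Several details are assumptions made only for illustration: one position is drawn per interval, the same position lists and variant bases are shared by controls and patients, carrier sequences are sampled independently at each position, the variant base is any base other than the reference base, and positions are 0-based. The function names (`pick_positions`, `substitute`, `apply_variants`, `simulate`) and the random toy reference are hypothetical.

```python
import random

def pick_positions(seq_len, interval, rng, exclude=frozenset()):
    """Draw one random 0-based position from each consecutive window of `interval` bases."""
    positions = []
    for start in range(0, seq_len, interval):
        stop = min(start + interval, seq_len)
        window = [p for p in range(start, stop) if p not in exclude]
        if window:
            positions.append(rng.choice(window))
    return positions

def substitute(base, rng):
    """Return a base that differs from the reference base (assumed substitution rule)."""
    return rng.choice([b for b in "ACGT" if b != base])

def apply_variants(sequences, variant_bases, fraction, rng):
    """At every variant position, write the variant base into `fraction` of the sequences."""
    n_carriers = round(fraction * len(sequences))
    for pos, alt in variant_bases.items():
        for idx in rng.sample(range(len(sequences)), n_carriers):
            sequences[idx][pos] = alt

def simulate(reference, n_per_group=400, seed=0):
    """Generate haploid control and patient sequences from one reference string."""
    rng = random.Random(seed)
    # Step 1: position lists (interval 100 and 200), mutually exclusive.
    poly = pick_positions(len(reference), 100, rng)
    patho = pick_positions(len(reference), 200, rng, exclude=frozenset(poly))
    poly_alt = {p: substitute(reference[p], rng) for p in poly}
    patho_alt = {p: substitute(reference[p], rng) for p in patho}
    controls = [list(reference) for _ in range(n_per_group)]
    patients = [list(reference) for _ in range(n_per_group)]
    # Step 2: polymorphic variants in 40% of both groups.
    apply_variants(controls, poly_alt, 0.40, rng)
    apply_variants(patients, poly_alt, 0.40, rng)
    # Step 3: pathogenic variants in 25% of controls and 30% of patients.
    apply_variants(controls, patho_alt, 0.25, rng)
    apply_variants(patients, patho_alt, 0.30, rng)
    return poly, patho, ["".join(s) for s in controls], ["".join(s) for s in patients]

# Toy usage: a random 1000-base reference stands in for a real pathway sequence.
reference = "".join(random.Random(1).choice("ACGT") for _ in range(1000))
poly, patho, controls, patients = simulate(reference)

def carriers(group, pos):
    """Count sequences in `group` whose base at `pos` differs from the reference."""
    return sum(seq[pos] != reference[pos] for seq in group)
```

With 400 sequences per group, `carriers(controls, p)` and `carriers(patients, p)` give the per-position carrier counts, which can be divided by 400 to check the allele frequencies stated in the description.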

Created:
2022-03-05