G3/2017/044198
Source: Mendeley Data. Updated 2024-06-25; indexed 2024-06-28.
Download link:
https://figshare.com/articles/dataset/G3_2017_044198/5002562
Description:
Data for publication G3/2017/044198 on red deer linkage mapping. The analysis pipeline is archived at https://github.com/susjoh/DeerMapv4. The data provided can be input into the pipeline at script 2.0_Crimap_Run_Initial_a.R and run sequentially. All IDs have been recoded so that they can be run in the Crimap software, and so should not be merged with other datasets. They comprise only the individuals used for the linkage mapping analysis.
Deer31_QC.RData is the genotype and sex data in R/GenABEL format for all deer after basic quality control.
Pseudoautosomal_SNPs.txt is a list of all SNPs in the pseudoautosomal region on the X chromosome.
Pedigree_16-05-02.recoded.txt is the raw pedigree data.
3_Deer_Sequences.fa is the flanking sequence information for each SNP locus.
Table S1: Rum red deer (Cervus elaphus) linkage map after Build 5. Column headers are as follows:
SNP.Name = SNP name
BTA.Chr = cattle chromosome
BTA.Position = cattle base pair position (BTA UMD v3.0)
CEL.LG = deer linkage group
CEL.Order = marker order on the deer linkage group
cMPosition.SexAveraged = sex-averaged linkage map position (cM)
cMPosition.Female = female linkage map position (cM)
cMPosition.Male = male linkage map position (cM)
Skeleton.SNP = indicates if the SNP is included in the skeleton map (see main text)
PAR = indicates if the SNP is in the pseudoautosomal region
Estimated.Mb.Position = estimated genomic position on the deer genome (see methods)
inf.mei = number of informative meioses
inf.mei.PK = number of informative meioses where grandparental phase was known
tot_f = number of informative meioses in females
tot_m = number of informative meioses in males
pk_f = number of informative meioses in females with phase known
pk_m = number of informative meioses in males with phase known
A1 = major reference allele
A2 = minor reference allele
CallRate = SNP call rate in the original dataset (N_IDS = 2361)
Q.2 = minor allele frequency
PseudoAutosomalSNP = indicates if sex-linked SNPs (CEL34) are in the pseudoautosomal region
Table S2: Data for Figure S5, comparison of map positions between cattle (bp, build vUMD 3.0), deer (cM, Build 5) and sheep (bp, build Oar_v3.1) for the X chromosome. Column headers are as follows:
Window.Start = cM position of the start of the window of most likely position
Window.Stop = cM position of the end of the window
CEL.LG = deer linkage group identifier
SNP.Start = first mapped SNP at the start cM position
SNP.Stop = last mapped SNP at the end cM position
chunk = chunk identifier
SNP.Start.Of.Chromosome = indicates if the most likely position is at the beginning of the chromosome
SNP.End.Of.Chromosome = indicates if the most likely position is at the end of the chromosome
Unmap.SNP.vec = vector of SNPs within the unmapped chunk
Table S3: Probabilities of crossing over within 1 Mb windows in males and females. Column headers are as follows:
CEL.LG = deer linkage group identifier
Window = window order
Start = Mb position of the start of the window
Stop = Mb position of the end of the window
Locus.Count = number of loci within the window
Mean.Inf.Count = mean number of informative loci
cM = sex-averaged recombination rate
cM.Male = male recombination rate
cM.Female = female recombination rate
Window.To.End = window order from the other end of the chromosome
FM.Rate = ratio of female to male recombination rate
adj.cM = sex-averaged recombination rate adjusted for chromosome size
adj.cM.Male = male recombination rate adjusted for chromosome size
adj.cM.Female = female recombination rate adjusted for chromosome size
adj.FM.Rate = ratio of female to male recombination rate adjusted for chromosome size
Table S4: Probabilities of crossing over within 1 Mb windows in males and females. Column headers are as follows:
CEL.LG = deer linkage group identifier
Window = window order
Start = Mb position of the start of the window
Stop = Mb position of the end of the window
Locus.Count = number of loci within the window
Mean.Inf.Count = mean number of informative loci
cM = sex-averaged recombination rate
cM.Male = male recombination rate
cM.Female = female recombination rate
Window.To.End = window order from the other end of the chromosome
FM.Rate = ratio of female to male recombination rate
adj.cM = sex-averaged recombination rate adjusted for chromosome size
adj.cM.Male = male recombination rate adjusted for chromosome size
adj.cM.Female = female recombination rate adjusted for chromosome size
adj.FM.Rate = ratio of female to male recombination rate adjusted for chromosome size
Table S5: BLAST results for SNP flanking sequences in the deer against the sheep (Oar_v3.1) and cattle (Btau_4.6.1) genomes in order to determine lineage of origin. Column headers are as follows:
Locus_Name = SNP ID
Species = reference sequence, Ovis or Bos for sheep and cattle, respectively
bit = bit score of the local alignment
Chr = reference species chromosome
PCmatch = percentage match between query and reference sequences
matches = number of matching bases between query and reference sequences
mismatches = number of mismatching bases between query and reference sequences
gaps = number of gaps between query and reference sequences
SeqStart = query sequence start position
SeqSto = query sequence stop position
ChrStart = reference sequence start position
ChrStop = reference sequence stop position
eval = expect value (E): the number of hits one can expect to see by chance when searching a database of a particular size
Ncount = number of unknown bases in the deer query sequence
Informative.Length = number of known bases in the deer query sequence
PCmatch.Full = percentage of matches at known bases
Count = number of independent hits for the query sequence
BTA3Chr = chromosome on cattle genome version vUMD3.0
BTA3Position = position on cattle genome version vUMD3.0
CEL.LG = deer linkage group
cMPosition.run5 = deer Build 5 centimorgan position
Table S6: Raw data and results for binomial tests for the transmission distortion analysis. Alleles were assigned as A or B as the first and second reference allele in the GenABEL files. Column headers are as follows:
A.Count = number of A alleles transmitted from FID to offspring
SNP.Name = SNP ID
Parent = indicates if FID was the father or mother
Geno.Count = number of informative transmissions
P.val = P-value from the exact binomial test
CEL.order = order of SNPs on the linkage group
CEL.LG = linkage group
Fission = fission/fusion history of the chromosomes
cMPosition.run5 = deer Build 5 centimorgan position
Dummy.Position = estimated genomic position on the deer genome
Bin = 1 Mb window in which the SNP falls
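
The FM.Rate column in Tables S3 and S4 is described as the ratio of female to male recombination rate. A minimal Python sketch of that ratio for a single window follows. The function name `fm_rate`, the choice to return `None` when the male rate is zero, and the example values are assumptions made for illustration; they are not taken from the DeerMapv4 pipeline, which is written in R.

```python
from typing import Optional


def fm_rate(cm_female: float, cm_male: float) -> Optional[float]:
    """Ratio of female to male recombination rate for one window.

    Returning None for a zero male rate is an assumption; the dataset
    description does not say how such windows are encoded.
    """
    if cm_male == 0:
        return None
    return cm_female / cm_male


# Hypothetical window values (not from the dataset).
print(fm_rate(1.5, 0.75))  # female rate is twice the male rate
print(fm_rate(0.4, 0.0))   # ratio is undefined for this window
```

A value above 1 indicates more recombination in females than in males within that window.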
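
The Table S5 columns Ncount, Informative.Length and PCmatch.Full are all derived from the deer query sequence. The Python sketch below shows one way these could be computed for a single flanking sequence. It assumes that unknown bases are encoded as `N` and that PCmatch.Full equals 100 × matches / Informative.Length; neither detail is stated in the description, and the function name and example sequence are hypothetical.

```python
from typing import Dict, Optional


def flank_summary(sequence: str, matches: int) -> Dict[str, Optional[float]]:
    """Summarise one query sequence in the style of Table S5.

    Assumptions: unknown bases are written as 'N', and PCmatch.Full is
    the number of matches divided by the number of known bases.
    """
    seq = sequence.upper()
    n_count = seq.count("N")                 # unknown bases
    informative_length = len(seq) - n_count  # known bases
    pc_match_full = (
        100.0 * matches / informative_length if informative_length else None
    )
    return {
        "Ncount": n_count,
        "Informative.Length": informative_length,
        "PCmatch.Full": pc_match_full,
    }


# Hypothetical 10-base query with two unknown bases and 7 matching bases.
print(flank_summary("ACGTNNACGT", matches=7))
```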
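
Table S6 reports a P-value from an exact binomial test on A.Count out of Geno.Count informative transmissions. The Python sketch below illustrates such a test using only the standard library. It assumes a null transmission probability of 0.5 and a two-sided P-value defined as the sum of all outcome probabilities no larger than the observed one (the convention used by R's `binom.test`); the description does not confirm either choice, and the function name and example counts are hypothetical.

```python
from math import comb


def exact_binomial_p(a_count: int, geno_count: int, p: float = 0.5) -> float:
    """Two-sided exact binomial P-value for a_count successes in geno_count trials.

    Assumption: the P-value sums the probabilities of all outcomes that
    are no more likely than the observed one under the null.
    """
    probs = [
        comb(geno_count, k) * p**k * (1 - p) ** (geno_count - k)
        for k in range(geno_count + 1)
    ]
    observed = probs[a_count]
    # Small relative tolerance so that floating-point ties are counted.
    total = sum(q for q in probs if q <= observed * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical counts: 8 A alleles in 10 informative transmissions.
print(exact_binomial_p(8, 10))
```

For the long transmission counts typical of a large pedigree, a library routine would be preferable to this direct sum, which is intended only to show what the test computes.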
Created:
2023-06-28



