
Dataset and R code for: High functional allelic diversity and copy number in both MHC classes in the common buzzard

Source: Figshare. Updated: 2023-04-27. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Dataset_and_R_code_for_High_functional_allelic_diversity_and_copy_number_in_both_MHC_classes_in_the_common_buzzard/16885255
Resource description:
This dataset includes, in a zipped folder, the genomic PacBio HiFi raw reads (FASTQ), Flye contigs (FASTA), phasebook-generated haplotype-aware contigs (FASTA), and the Hi-C chromosome 29 sequence of the MHC class I and class II region (96,746 bp; FASTA). It also includes sample information (amplicon ID, amplicon and allele depth, and MHC genotype) and data for the 81 samples used for RNA sequencing (ID, sampling date, MHC RNA transcripts, MHC genotypes when available, and a key matching MHC allele names to preliminary IDs). Finally, it includes the R code and data needed to reproduce Figures 1, 2, 3, 4, S2, S4 and S6 to S11. The code was written by Jamie Winternitz, who can be contacted at jcwinternitz-at-gmail-dot-com.

Genomic data in the zipped folder "Genomic Buteo buteo MHC data.zip":
1. PacBio HiFi raw reads, in "raw_HiFi.zip": "hi48toc29b.fq", "hi64toc29b.fq", "hi263toc29b.fq", "hi326toc29b.fq"
2. Flye contigs: "7 Flye contigs from Buteo buteo genome MHC region.fasta"
3. Phasebook haplotype-aware contigs, in "haplotype-aware_contigs_phasebook.zip": "hi48toc29b_contigs_phasebook.fa", "hi64toc29b_contigs_phasebook.fa", "hi263toc29b_contigs_phasebook.fa", "hi326toc29b_contigs_phasebook.fa"
4. Hi-C consensus of the chromosome 29 MHC region from 3 individuals: "HiC_scaffold_33-Ch29 MHC region (96746 bp).fasta"

Sample information for MHC-I and MHC-II genotypes and RNA transcript sampling data: "individual_allele_report_MHCI_genbank_names.csv", "individual_allele_report_MHCII_genbank_names.csv", "Samples with MHC-Iex3 and IIBex2 RNA transcripts and amplicon genotypes.xlsx", "MHC_1ex2&2Aexon2_transcripts_exact_match_transcriptomes_2018_9.xlsx"

R code and data required to reproduce each figure:
1. Figures 1, S2 and S4: R code "MHC allele freq and expression.R". Data required for Figure 1: "Pop_allele_freq_RNAseq_data.csv", "CRC_MHC_allele_per_indiv.csv"
2. Figures 2 and S6: R code "multiple sequence alignment figures.R". Data: "Buteo buteo MHC1 and HLA-A and Haal-UA exon 3 translation alignment character vectors.csv", "PBS_MHC-I_exon3.csv", "Buteo buteo MHC2 and HLA-DRB and Buga-DRB translation alignment character vectors.csv", "PBS_DRB1_exon2.csv", "Buteo buteo MHC2 recomb segments removed and HLA-DRB and Buga-DRB translation alignment character vectors.csv", "PBS_DRB1_exon2_no_recomb_segments.csv", "Buteo buteo MHC-I exon 2 and human and chicken translation alignment character vectors.csv", "PBS_MHC-I_exon2.csv", "Buteo buteo MHCIIA exon 2 with chicken alignment character vectors.csv", "PBS_MHC-IIA_exon2.csv"
3. Figures 3 and S7 to S10: R code "Phylogenetic figures.R". Data: "MHC1 FastTree.newick", "MHC1 IQ-TREE.newick", "MHC1 FastTree 110 sequences.csv", "Common species dataset MHC1 and MHC2.csv", "MHC2 FastTree.newick", "MHC2 IQ-TREE.newick", "MHC2 FastTree 65 sequences.csv", "10speciesBirdTree.nex", "hawk.png", "owl.png", "grouse.png", "vulture.png"
4. Figures 4 and S11: R code "phylogeny for MHC-Iex2 and MHC-IIAex2.R". Data: "MHC-I exon 2 FastTree.newick", "MHC1ex2 IQ-TREE.newick", "Sequences used for Real tree MHC-I exon 2.csv", "MHC-IIA exon 2 FastTree.newick", "MHC2A IQ-TREE.newick", "Sequences used for Real tree MHC-IIA exon 2.csv", "5speciesBirdTree.nex"
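The genomic files listed above are plain-text FASTA (.fasta, .fa) and FASTQ (.fq) files. After unzipping, a quick sanity check could confirm that the Hi-C consensus record is 96,746 bp long, or count the HiFi reads in each .fq file. The Python sketch below is a minimal illustration only. It is not part of the deposited code, which is in R. The function names fasta_lengths and fastq_stats are hypothetical. The sketch also assumes that each FASTQ record spans exactly four lines, with sequence and quality not wrapped.

```python
from typing import Dict, Iterable, Tuple


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {record_id: sequence_length} for FASTA text (any iterable of lines)."""
    lengths: Dict[str, int] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # Record ID = first whitespace-delimited token of the header line.
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0
        elif current is not None:
            # Sequence lines may be wrapped; add up every line of the record.
            lengths[current] += len(line)
    return lengths


def fastq_stats(lines: Iterable[str]) -> Tuple[int, int, float]:
    """Return (read_count, total_bases, mean_read_length) for 4-line FASTQ records."""
    it = iter(lines)
    count = 0
    total = 0
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records or at end of file
        seq = next(it, "").strip()
        plus = next(it, "")
        qual = next(it, "").strip()
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record after {count} reads")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in read {count + 1}")
        count += 1
        total += len(seq)
    mean = total / count if count else 0.0
    return count, total, mean
```

Both functions take any iterable of lines, so an open file handle works directly. For example, passing the handle of "HiC_scaffold_33-Ch29 MHC region (96746 bp).fasta" to fasta_lengths should report a length matching the 96,746 bp in the file name, provided the file holds a single record.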

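The phylogenetic figures rely on Newick tree files, the FastTree and IQ-TREE outputs listed above. Before running the R scripts, it may help to check that the tree tip labels match the sequences listed in the accompanying CSV files. The Python sketch below is a minimal, hypothetical tip-label extractor (newick_tip_labels is not part of the dataset). It handles quoted labels, branch lengths, bracketed comments and internal node labels such as support values. It returns unquoted labels verbatim, so underscores are not converted to spaces. It does not read the NEXUS species trees (.nex), which wrap their trees inside a block.

```python
from typing import List


def newick_tip_labels(newick: str) -> List[str]:
    """Return the tip (leaf) labels of a Newick string, in order of appearance."""
    tips: List[str] = []
    n = len(newick)
    i = 0
    prev = None  # last structural character seen: '(', ',', ')', ';' or None
    while i < n:
        c = newick[i]
        if c in "(),;":
            prev = c
            i += 1
        elif c == "[":
            # Skip bracketed comments such as [&support=0.9].
            end = newick.find("]", i)
            i = n if end == -1 else end + 1
        elif c == ":":
            # Skip a branch length up to the next structural character.
            i += 1
            while i < n and newick[i] not in "(),;[":
                i += 1
        elif c.isspace():
            i += 1
        else:
            if c == "'":
                # Quoted label; a doubled quote '' stands for a literal quote.
                j = i + 1
                buf = []
                while j < n:
                    if newick[j] == "'":
                        if j + 1 < n and newick[j + 1] == "'":
                            buf.append("'")
                            j += 2
                            continue
                        break
                    buf.append(newick[j])
                    j += 1
                label = "".join(buf)
                i = j + 1
            else:
                # Unquoted label runs until the next structural character.
                j = i
                while j < n and newick[j] not in "(),:;[":
                    j += 1
                label = newick[i:j].strip()
                i = j
            # A label right after ')' names an internal node (e.g. a support value).
            if prev != ")":
                tips.append(label)
    return tips
```

With this helper, the number of tips in "MHC1 FastTree.newick" can be compared with the number of rows in "MHC1 FastTree 110 sequences.csv" after reading both files. The sketch makes no claim about how the deposited R code itself parses the trees.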
Created:
2023-04-27