Reconstruction of the Carbohydrate 6-O Sulfotransferase Gene Family Evolution in Vertebrates Reveals Novel Member, CHST16, Lost in Amniotes

Source: Figshare. Updated: 2019-08-13. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Evolution_of_the_carbohydrate_6-O_sulfotransferase_C6OST_family_in_vertebrates_and_report_of_CHST16_a_previously_unrecognized_C6OST_gene_lost_from_amniotes/9596285
Description:
Master_table_C6OST.xlsx: Complete table of all C6OST sequences identified in this study, including sequence names, chromosomal locations, database identifiers and annotation notes. Also includes the complete list of species, species abbreviations and genome assemblies used in the study.

Sequence names consist of species abbreviations followed by chromosome/linkage group designations (if available) and gene symbols. Asterisks indicate incomplete/partial sequences. Paralogs (within-species duplicates) with uncertain phylogenetic relationships are designated as (1of2), (2of2), etc.

We have followed the phylogeny and classification of birds suggested by Prum et al. (2015) Nature 526:569–573, doi: 10.1038/nature15697, and of teleost fishes suggested by Near et al. (2012) PNAS 109:13698–703, doi: 10.1073/pnas.1206625109, and Betancur-R et al. (2017) BMC Evol. Biol. 17:162, doi: 10.1186/s12862-017-0958-3.

For invertebrate species, sequences were also sought by searching reference proteomes with the profile hidden Markov model search tool HMMER (hmmer.org).

Master_C6OST_all.rtf: All identified C6OST sequences in FASTA format, in the same order as in Master_table_C6OST.xlsx. Rich Text Format file marking exon junctions in alternating colors.

Master_C6OST_all.fasta: All identified C6OST sequences in FASTA format, in the same order as in Master_table_C6OST.xlsx. FASTA format file for alignment/sequence viewing applications.

Ident_C6OST_seq.txt: List of identical C6OST sequences in this dataset.

Short_unused_C6OST_seq.txt: List of partial C6OST sequences in this dataset that are shorter than 50% of the final alignments. These were not used in phylogenies.

190121_C6OST_full_align.fasta: Alignment including the full repertoire of C6OST sequences (CHST1, CHST2, CHST3, CHST4, CHST5, CHST6, CHST7, CHST16 and related genes) in a smaller set of species.

190121_C6OST_full_IQ-TREE.tar.gz: Phylogenetic analysis (IQ-TREE output files) for the full repertoire of C6OST sequences in a smaller set of species. This analysis corresponds to Figures 1-5 in the publication. The file 190305_CHST7_align.fasta.treefile includes the phylogenetic tree in Newick format.

The following files correspond to the alignments and phylogenetic analyses for each of the C6OST subfamilies with the full representation of species. These analyses correspond to Supplementary Figures S1-S8 in the publication. Within the IQ-TREE output files, the files ending in .treefile include the phylogenetic trees in Newick format.

190130_CHST1_align.fasta
190130_CHST1_IQ-TREE.tar.gz
190130_CHST2_align.fasta
190130_CHST2_IQ-TREE.tar.gz
190130_CHST3_align.fasta
190130_CHST3_IQ-TREE.tar.gz
190130_CHST4-5-6_align.fasta
190130_CHST4-5-6_IQ-TREE.tar.gz
190130_CHST16_align.fasta
190130_CHST16_IQ-TREE.tar.gz
190305_CHST7_align.fasta
190305_CHST7_IQ-TREE.tar.gz

Conserved_synteny_gene_lists_Ens83.xlsx: Lists of genes from the C6OST gene-bearing chromosome regions in the human, Carolina anole lizard, spotted gar and zebrafish genomes. Lists are arranged by Ensembl protein family predictions (Ensembl version 83) and by the number of times each protein family is represented on C6OST-bearing chromosome regions (column named '#').

Conserved_synteny_data.xlsx: Chromosomal/conserved synteny data. Includes chromosomal locations and database identifiers of all C6OST-neighboring genes identified in the study. New gene symbols/names suggested by this study are highlighted in yellow. This file also includes all identified conserved synteny blocks in the human, chicken, Western clawed frog, spotted gar, zebrafish and medaka genomes.

Channel_catfish_CHST1a_region.xlsx: Genes neighboring CHST1a in the channel catfish genome. Used to identify the orthologous region of the zebrafish genome, which lacks a CHST1a gene.

Anolis_Chr2_conserved_synteny.xlsx: Genes neighboring the "CHST4/5-like" gene on Carolina anole lizard chromosome 2. Used to identify the orthologous regions of the human and spotted gar genomes.

Inshore_hagfish_cons_synteny.xlsx: Genes neighboring the inshore hagfish (Eptatretus burgeri) C6OST genes. Used to infer orthology between jawless vertebrate and jawed vertebrate C6OST genes.
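The 50% length criterion mentioned for Short_unused_C6OST_seq.txt can be illustrated with a minimal Python sketch. It is not the authors' procedure: it assumes that "length" means the number of non-gap residues in an aligned sequence and that the reference length is the number of alignment columns, neither of which the description states. The function names, the gap characters and the demo sequence names are hypothetical.

```python
from typing import Dict, Iterable, List

# Assumption: '-' and '.' are the gap characters used in the alignment.
GAP_CHARS = "-."


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Read FASTA-formatted lines into a {name: sequence} mapping."""
    chunks: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {n: "".join(parts) for n, parts in chunks.items()}


def short_sequences(alignment: Dict[str, str], min_fraction: float = 0.5) -> List[str]:
    """Return names whose non-gap residue count is below min_fraction
    of the alignment width (assumed definition of 'shorter than 50%')."""
    if not alignment:
        return []
    width = max(len(seq) for seq in alignment.values())
    short = []
    for name, seq in alignment.items():
        residues = sum(1 for ch in seq if ch not in GAP_CHARS)
        if residues < min_fraction * width:
            short.append(name)
    return short


# Hypothetical toy alignment; the names do not come from the dataset.
demo = """>demo_seqA
MKVLAAGHTR
>demo_seqB
MKV--AGH--
>demo_seqC*
MK--------
"""

flagged = short_sequences(parse_fasta(demo.splitlines()))
print(flagged)
```

To apply the same check to one of the alignment files, pass an open file handle to `parse_fasta`, for example `parse_fasta(open("190130_CHST1_align.fasta"))`, and compare the result with the published list.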

Created:
2019-08-13