
Evolution of the carbohydrate 6-O sulfotransferase (C6OST) family in vertebrates and report of CHST16, a previously unrecognized C6OST gene lost from amniotes

Mendeley Data. Updated 2024-01-31. Indexed 2024-06-27.
Download link:
https://figshare.com/articles/Evolution_of_the_carbohydrate_6-O_sulfotransferase_C6OST_family_in_vertebrates_and_report_of_CHST16_a_previously_unrecognized_C6OST_gene_lost_from_amniotes/9596285/4
Resource description:
Master_table_C6OST.xlsx: Complete table of all C6OST sequences identified in this study, with sequence names, chromosomal locations, database identifiers and annotation notes. Also includes the complete list of species, species abbreviations and genome assemblies used in the study. Sequence names consist of the species abbreviation followed by the chromosome/linkage group designation (if available) and the gene symbol. Asterisks indicate incomplete/partial sequences. Paralogs (within-species duplicates) with uncertain phylogenetic relationships are designated (1of2), (2of2), etc. We have followed the phylogeny and classification of birds suggested by Prum et al. (2015) Nature 526:569–573, doi: 10.1038/nature15697, and of teleost fishes suggested by Near et al. (2012) PNAS 109:13698–703, doi: 10.1073/pnas.1206625109, and Betancur-R et al. (2017) BMC Evol. Biol. 17:162, doi: 10.1186/s12862-017-0958-3. For invertebrate species, sequences were also sought by running the profile hidden Markov model search tool HMMER (hmmer.org) against reference proteomes.
Master_C6OST_all.rtf: All identified C6OST sequences in FASTA format, in the same order as in Master_table_C6OST.xlsx. Rich Text Format file marking exon junctions in alternating colors.
Master_C6OST_all.fasta: All identified C6OST sequences in FASTA format, in the same order as in Master_table_C6OST.xlsx. FASTA format file for alignment/sequence viewing applications.
Ident_C6OST_seq.txt: List of identical C6OST sequences in this dataset.
Short_unused_C6OST_seq.txt: List of partial C6OST sequences in this dataset that are shorter than 50% of the final alignments. These were not used in the phylogenies.
190121_C6OST_full_align.fasta: Alignment including the full repertoire of C6OST sequences (CHST1, CHST2, CHST3, CHST4, CHST5, CHST6, CHST7, CHST16 and related genes) in a smaller set of species.
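The naming convention and the 50% length cutoff described above can be illustrated with a minimal Python sketch. It rests on several assumptions that the description does not confirm: that the asterisk marking a partial sequence appears in the FASTA header itself, that sequence length is counted as ungapped residues, and that the final alignment length is supplied by the user. The demo records are invented and only imitate the described convention; they are not taken from the dataset.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def residue_count(seq: str) -> int:
    """Count residues, ignoring the gap characters '-' and '.'."""
    return sum(1 for ch in seq if ch not in "-.")


def summarize(records: Iterable[Tuple[str, str]],
              alignment_length: int) -> List[Dict[str, object]]:
    """Flag asterisk-marked sequences and those under 50% of the alignment length."""
    rows = []
    for header, seq in records:
        n = residue_count(seq)
        rows.append({
            "name": header,
            "length": n,
            # Assumption: the asterisk for partial sequences is part of the header.
            "partial": "*" in header,
            # Assumption: the cutoff compares ungapped length to alignment length.
            "too_short": n < 0.5 * alignment_length,
        })
    return rows


# Invented demo records (hypothetical names, not from the dataset).
DEMO = """>Xxx.chr1.CHST1
MKLAAAVVVLLLGGGSSSTT
>Xxx.chr2.CHST2*
MKLAA
"""

demo_rows = summarize(read_fasta(DEMO.splitlines()), alignment_length=20)
for row in demo_rows:
    print(row)
```

To try this on the real data, the demo string could be replaced by an open handle on Master_C6OST_all.fasta, with `alignment_length` taken from the relevant alignment file.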
190121_C6OST_full_IQ-TREE.tar.gz: Phylogenetic analysis (IQ-TREE output files) for the full repertoire of C6OST sequences in a smaller set of species. This analysis corresponds to Figures 1-5 in the publication. The file 190305_CHST7_align.fasta.treefile includes the phylogenetic tree in Newick format.
The following files contain the alignments and phylogenetic analyses for each of the C6OST subfamilies with the full representation of species. These analyses correspond to Supplementary Figures S1-S8 in the publication. Within the IQ-TREE output files, the files ending in .treefile include the phylogenetic trees in Newick format.
190130_CHST1_align.fasta 190130_CHST1_IQ-TREE.tar.gz
190130_CHST2_align.fasta 190130_CHST2_IQ-TREE.tar.gz
190130_CHST3_align.fasta 190130_CHST3_IQ-TREE.tar.gz
190130_CHST4-5-6_align.fasta 190130_CHST4-5-6_IQ-TREE.tar.gz
190130_CHST16_align.fasta 190130_CHST16_IQ-TREE.tar.gz
190305_CHST7_align.fasta 190305_CHST7_IQ-TREE.tar.gz
Conserved_synteny_gene_lists_Ens83.xlsx: Lists of genes from the C6OST gene-bearing chromosome regions in the human, Carolina anole lizard, spotted gar and zebrafish genomes. Lists are arranged by Ensembl protein family predictions (Ensembl version 83) and by the number of times each protein family is represented on C6OST-bearing chromosome regions (column named '#').
Conserved_synteny_data.xlsx: Chromosomal/conserved synteny data. Includes chromosomal locations and database identifiers of all C6OST-neighboring genes identified in the study. New gene symbols/names suggested by this study are highlighted in yellow. This file also includes all identified conserved synteny blocks in the human, chicken, Western clawed frog, spotted gar, zebrafish and medaka genomes.
Channel_catfish_CHST1a_region.xlsx: Genes neighboring CHST1a in the channel catfish genome. Used to identify the orthologous region of the zebrafish genome, which lacks a CHST1a gene.
Anolis_Chr2_conserved_synteny.xlsx: Genes neighboring the "CHST4/5-like" gene on Carolina anole lizard chromosome 2. Used to identify the orthologous regions of the human and spotted gar genomes.
Inshore_hagfish_cons_synteny.xlsx: Genes neighboring the inshore hagfish (Eptatretus burgeri) C6OST genes. Used to infer orthology between jawless vertebrate and jawed vertebrate C6OST genes.
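The IQ-TREE archives listed above are described as containing Newick trees in files ending in .treefile. The following is a minimal Python sketch, using only the standard library, of how such files might be located inside a .tar.gz archive and how leaf names could be pulled out of a Newick string. The internal folder layout of the archives is not described, so the sketch simply scans for the suffix. The leaf-name pattern is a simplification that assumes unquoted labels without whitespace. The demo archive and its tree are built in memory and are invented for illustration.

```python
import io
import re
import tarfile
from typing import Dict, List

# Simplified pattern: a leaf label directly follows "(" or "," in Newick text.
# Internal node labels (such as support values) follow ")" and are skipped.
LEAF_PATTERN = re.compile(r"[(,]\s*([^():,;\s]+)")


def leaf_names(newick: str) -> List[str]:
    """Return leaf labels from a Newick string (unquoted labels only)."""
    return LEAF_PATTERN.findall(newick)


def read_treefiles(archive: tarfile.TarFile) -> Dict[str, str]:
    """Map each member ending in .treefile to its Newick text."""
    trees = {}
    for member in archive.getmembers():
        if member.isfile() and member.name.endswith(".treefile"):
            handle = archive.extractfile(member)
            if handle is not None:
                trees[member.name] = handle.read().decode("utf-8").strip()
    return trees


def build_demo_archive() -> bytes:
    """Create a small in-memory .tar.gz with one invented tree file."""
    payload = b"((Xxx.CHST1:0.1,Yyy.CHST1:0.2)95:0.3,Zzz.CHST16:0.4);\n"
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        # Hypothetical member name; the real archive layout may differ.
        info = tarfile.TarInfo(name="demo/demo_align.fasta.treefile")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


with tarfile.open(fileobj=io.BytesIO(build_demo_archive()), mode="r:gz") as tar:
    demo_trees = read_treefiles(tar)

for name, newick in demo_trees.items():
    print(name, leaf_names(newick))
```

For a downloaded archive, `tarfile.open("190130_CHST1_IQ-TREE.tar.gz", mode="r:gz")` could be passed to `read_treefiles` instead of the in-memory demo. A dedicated tree library would be a more robust choice for anything beyond listing leaf names.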

Created:
2024-01-31