Dataset: Structural, functional and evolutionary characterisation of genes in Lactuca sp. reference genomes in the context of eudicots
Source: 4TU.ResearchData, updated 2024-12-10, indexed 2026-04-23
Download link:
https://data.4tu.nl/datasets/af1b751d-23a2-4954-ac01-4eb6c68d895b/1
Description:
This data set contains easy-to-use overviews of the location, function and homologs of each transcript in the reference genomes of three <em>Lactuca</em> species. For <em>L. sativa</em>, we included both the v8 and v11 genomes of cultivar Salinas, since both are used in lettuce research. For the <em>L. sativa</em> v11 genome specifically, we added the submitted structural annotation to the RefSeq structural annotation where there was no overlap with the latter (the resulting GFF3 file is part of this data set). For <em>L. saligna</em> and <em>L. virosa</em>, we included their respective reference genomes according to NCBI (as of 25 September 2024). For the structural information, we parsed the GFF3 file of each genome annotation; for the functional annotations, we obtained protein sequences and functionally annotated them using InterProScan; for the homologs, we constructed a panproteome using a diverse set of eudicots and grouped the proteins into homology groups using PanTools. All data have been collected in TSV files, which can be used in Excel, R and command-line applications. For technical details, please refer to the included README.
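The description mentions parsing the GFF3 file of each annotation into TSV overviews. As a rough illustration of that kind of step, the sketch below reads GFF3 lines (nine tab-separated columns, with `key=value` attributes separated by `;`) and writes one TSV row per `mRNA` feature. It is not the authors' pipeline: the function names, the output column names, and the tiny in-line example records are all assumptions made for illustration, and the real column layout of the data set is documented in its README.

```python
import csv
import io


def parse_attributes(field):
    """Split a GFF3 attribute column (key=value pairs separated by ';') into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def transcripts_from_gff3(lines, feature_type="mRNA"):
    """Collect one record per feature of the given type (hypothetical helper)."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and GFF3 directives/comments.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        attrs = parse_attributes(cols[8])
        rows.append({
            # Output column names are illustrative, not the data set's own.
            "transcript_id": attrs.get("ID", ""),
            "gene_id": attrs.get("Parent", ""),
            "seqid": cols[0],
            "start": int(cols[3]),
            "end": int(cols[4]),
            "strand": cols[6],
        })
    return rows


def write_tsv(rows, handle):
    """Write the records as tab-separated text with a header line."""
    fields = ["transcript_id", "gene_id", "seqid", "start", "end", "strand"]
    writer = csv.DictWriter(handle, fieldnames=fields, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Made-up records for demonstration only; not taken from the data set.
EXAMPLE_GFF3 = """##gff-version 3
chr1\texample\tgene\t100\t900\t.\t+\t.\tID=gene1
chr1\texample\tmRNA\t100\t900\t.\t+\t.\tID=tx1;Parent=gene1
chr1\texample\texon\t100\t400\t.\t+\t.\tID=exon1;Parent=tx1
chr2\texample\tmRNA\t50\t750\t.\t-\t.\tID=tx2;Parent=gene2
"""

rows = transcripts_from_gff3(EXAMPLE_GFF3.splitlines())
buffer = io.StringIO()
write_tsv(rows, buffer)
tsv_text = buffer.getvalue()
print(tsv_text)
```

To run this on a real annotation, the in-line example could be swapped for an open file handle, for instance `transcripts_from_gff3(open(path))`, where `path` points to a GFF3 file such as the merged v11 annotation shipped with the data set.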
Created:
2024-12-10



