
FileS4_CurationRemovedSeqs.xlsx

NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://figshare.com/articles/dataset/FileS4_CurationRemovedSeqs_xlsx/28179515
Description:
Sequences removed during curation. These sequences are present in the pre-curation sequence and tree data (Supplementary Files S8-S9) but were removed during curation, as described in the Methods section. The “reason removed” columns use the following labels:
“Low coverage compositional outlier”: sequences with a significantly outlying Mahalanobis distance in the GC3S versus ENc distribution and with k-mer coverage <10 or with a more highly covered paralogous sequence in the same clade.
“Identical pair with differential coverage”: sequences at least 95% identical at the nucleotide level, over at least 67% of their length, to a more highly covered sequence from another taxon, provided that the k-mer coverage of the less covered sequence is below a taxon-dependent threshold determined by manual inspection of the data (50 for Bolivina and Nonionella; 20 for Hippocrepinella hirudina and Psammophaga fuegia; 100 for Ammodiscus and sample Mf03 (Milliammina); 10 otherwise).
“Lowly covered paralog”: sequences with a more highly covered paralog in the same clade that covers at least 80% of the alignment length for the clade.
“Sister genus and lab of origin”: sequences from taxa for which we have more than one sample available that are directly sister, in the rebuilt clade tree, to a taxon of a different genus (and not sister to any taxa of the same genus), provided that all the sister samples originate from the same lab as the sequence to be removed.
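The “Identical pair with differential coverage” criterion combines several numeric cut-offs, so a small sketch may make it easier to follow. The Python below is only an illustration under stated assumptions, not the authors' curation code: the function and variable names are hypothetical, taxon labels are assumed to begin with the genus, species, or sample name, and the 67% figure is treated as a single precomputed overlap fraction. The threshold values themselves are copied from the description.

```python
# Illustrative sketch only; names and data layout are assumptions,
# the numeric thresholds come from the dataset description.

DEFAULT_THRESHOLD = 10

# k-mer coverage thresholds keyed by a taxon-label prefix (assumed convention).
TAXON_THRESHOLDS = {
    "Bolivina": 50,
    "Nonionella": 50,
    "Hippocrepinella hirudina": 20,
    "Psammophaga fuegia": 20,
    "Ammodiscus": 100,
    "Mf03": 100,
}


def coverage_threshold(taxon: str) -> int:
    """Return the taxon-dependent k-mer coverage threshold (10 if unlisted)."""
    for prefix, threshold in TAXON_THRESHOLDS.items():
        if taxon.startswith(prefix):
            return threshold
    return DEFAULT_THRESHOLD


def is_identical_pair_removal(
    taxon: str,
    coverage: float,
    other_taxon: str,
    other_coverage: float,
    identity: float,
    overlap_fraction: float,
) -> bool:
    """Check whether the less covered sequence of a pair meets the criterion.

    identity and overlap_fraction are fractions in [0, 1], assumed to be
    computed elsewhere from a pairwise nucleotide alignment.
    """
    return (
        taxon != other_taxon                       # partner is from another taxon
        and identity >= 0.95                       # at least 95% identical
        and overlap_fraction >= 0.67               # over at least 67% of the length
        and other_coverage > coverage              # partner is more highly covered
        and coverage < coverage_threshold(taxon)   # below the taxon threshold
    )
```

For example, under these assumptions a Bolivina sequence with k-mer coverage 30 that is 97% identical over 80% of its length to a Nonionella sequence with coverage 200 would be flagged, while the same sequence with coverage 60 would be kept because it exceeds the Bolivina threshold of 50.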
Created:
2025-01-09