
Reference-based QUantification Of gene Dispensability (QUOD) - test dataset

Source: pub.uni-bielefeld.de · Updated: 2023-07-19 · Indexed: 2025-03-24
Download link:
https://pub.uni-bielefeld.de/record/2946079
Resource description:
# Test set for QUOD (Reference-based QUantification Of gene Dispensability)

## Background:

The dispensability of genes within a phylogenetic lineage (e.g. a species, genus, or higher-level clade) is gaining relevance as genome sequencing projects increasingly move to the pangenome level. Most analyses classify genes either as core genes, which are present in (almost) all investigated genomes, or as dispensable genes, which occur in only one or a few of them. This binary ‘core’ vs. ‘dispensable’ classification is often based on arbitrary presence/absence cutoffs across the analysed genomes. Instead of making such a binary call, QUOD assigns a dispensability score to each gene. This facilitates the identification of candidate dispensable genes, which often underlie lineage-specific adaptation to varying environmental conditions.

## Test set:

The test dataset for QUOD comprises genomic reads of four randomly selected accessions of the *Arabidopsis thaliana* Nordborg set. The reads were retrieved from the Sequence Read Archive (SRA) and mapped against the AthNd1_v2c reference genome sequence [1] using bowtie2 [2]. To reduce file size, only the first Mbp of Chr1 was extracted. All BAM files provided here are already sorted and are intended as input for QUOD, which is available on GitHub: https://github.com/ksielemann/QUOD. QUOD calculates a dispensability score for each gene and can optionally visualize the results as a colored histogram and a box plot.

##### References:

- [1] Pucker B, et al. A chromosome-level sequence assembly reveals the structure of the Arabidopsis thaliana Nd-1 genome and its gene set. PLoS ONE. 2019, 14(5):e0216233.
- [2] Langmead B, Salzberg S. Fast gapped-read alignment with Bowtie 2. Nature Methods. 2012, 9:357-359.
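## Illustration: cutoff-based calls vs. a continuous score

This description does not state how QUOD computes its dispensability score, so the sketch below is only a hedged illustration of the contrast drawn in the Background section, not QUOD's formula (see the GitHub repository for the actual method). It assumes hypothetical per-gene mean coverage values for four accessions, a hypothetical genome-wide mean coverage per accession used for normalisation, and an invented `toy_score` that runs from 0 (fully covered everywhere) to 1 (absent everywhere). None of the gene names or numbers come from this test set.

```python
from statistics import mean

# Hypothetical toy input (NOT data from this test set): mean read coverage of each
# gene in each of four accessions, e.g. as it could be derived from sorted BAM files.
gene_coverage = {
    "gene_A": [30.0, 28.0, 32.0, 30.0],  # fully covered in every accession
    "gene_B": [30.0, 0.0, 32.0, 3.0],    # missing or nearly missing in two accessions
    "gene_C": [0.0, 0.0, 0.0, 15.0],     # covered (at half depth) in one accession only
    "gene_D": [15.0, 14.0, 16.0, 15.0],  # half depth in every accession
}
# Hypothetical genome-wide mean coverage of each accession, used for normalisation.
accession_mean = [30.0, 28.0, 32.0, 30.0]


def normalised(values):
    """Express each gene coverage relative to its accession's genome-wide mean."""
    return [v / m for v, m in zip(values, accession_mean)]


def binary_class(values, cutoff):
    """Core/dispensable call using an arbitrary presence cutoff on normalised coverage."""
    present = sum(1 for x in normalised(values) if x >= cutoff)
    return "core" if present == len(values) else "dispensable"


def toy_score(values):
    """Illustrative continuous score: 0 = fully covered everywhere, 1 = absent everywhere.

    Normalised coverage is capped at 1 so that duplicated genes cannot push the score below 0.
    """
    return 1.0 - mean(min(x, 1.0) for x in normalised(values))


for gene, values in gene_coverage.items():
    # Columns: gene, call with cutoff 0.5, call with cutoff 0.6, continuous toy score
    print(gene, binary_class(values, 0.5), binary_class(values, 0.6), round(toy_score(values), 3))
# gene_A core core 0.0
# gene_B dispensable dispensable 0.475
# gene_C dispensable dispensable 0.875
# gene_D core dispensable 0.5
```

In this toy example, `gene_D` flips from "core" to "dispensable" when the arbitrary cutoff moves from 0.5 to 0.6, while its continuous score stays at 0.5. This sensitivity to an arbitrary threshold is what a per-gene score avoids.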

## 测试集描述 - 基因可废弃性量化参考(QUOD) ## 背景: 在系统发育谱系中,例如物种、属或更高级别的类群,基因的可废弃性正日益受到重视,因为大多数基因组测序项目已转向泛基因组水平。大多数分析将基因分类为核心基因,这些基因存在于(几乎)所有被研究的个体基因组中,以及可废弃基因,这些基因仅存在于单个或少数被研究的基因组中。将基因分类为‘核心’或‘可废弃’的二分法通常基于分析基因组中存在/缺失的任意阈值。QUOD并非将基因分类为核心或可废弃,而是为每个基因分配一个可废弃性得分。因此,QUOD有助于识别候选的可废弃基因,这些基因通常与谱系对环境条件变化的特定适应性有关。 ## 测试集: QUOD的测试数据集包括来自*Arabidopsis thaliana* Nordborg集合中四个随机选择的存取号的基因组读数。这些读数从序列读数档案(SRA)中检索,并使用bowtie2 [2]与AthNd1_v2c参考基因组序列[1]进行比对。为减小文件大小,提取了第1号染色体的前Mbp。所有提供的BAM文件已排序,应作为输入用于QUOD,QUOD可在GitHub上获取:https://github.com/ksielemann/QUOD。 为每个基因计算可废弃性得分。可选地,结果可以可视化成彩色直方图和箱线图。 ##### 参考文献: <sub>[1] Pucker B, et al. A chromosome-level sequence assembly reveals the structure of the Arabidopsis thaliana Nd-1 genome and its gene set. PloS one 14.5 (2019): e0216233.</sub> <sub>[2] Langmead B, Salzberg S. Fast gapped-read alignment with Bowtie 2. Nature Methods. 2012, 9:357-359.</sub>
Provided by:
pub.uni-bielefeld.de