L. anisa cgMLST 3140 targets ref-seqs.fasta
DataCite Commons | Updated: 2020-08-29 | Indexed: 2024-07-27
Download link:
https://figshare.com/articles/L_anisa_cgMLST_3140_targets_ref-seqs_fasta/6325457/1
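Once the file has been downloaded from the link above, its target sequences can be loaded without third-party libraries. The sketch below assumes `ref-seqs.fasta` is a standard multi-FASTA file with one record per cgMLST target. The local filename and the function name `parse_fasta` are hypothetical. According to the dataset title, the file should contain 3140 records, but that count has not been checked here.

```python
from pathlib import Path
from typing import Iterable


def parse_fasta(lines: Iterable[str]) -> list[tuple[str, str]]:
    """Parse multi-FASTA text into (record_id, sequence) pairs, in file order."""
    records: list[tuple[str, str]] = []
    record_id = None
    chunks: list[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if record_id is not None:
                records.append((record_id, "".join(chunks)))
            # the record ID is the first whitespace-delimited token of the header
            parts = line[1:].split()
            record_id = parts[0] if parts else ""
            chunks = []
        else:
            chunks.append(line.upper())  # normalise sequence case
    if record_id is not None:
        records.append((record_id, "".join(chunks)))
    return records


if __name__ == "__main__":
    path = Path("ref-seqs.fasta")  # hypothetical local copy of the figshare file
    if path.exists():
        with path.open() as fh:
            targets = parse_fasta(fh)
        print(f"{len(targets)} target sequences")  # the title states 3140 targets
        lengths = [len(seq) for _, seq in targets]
        if lengths:
            print(f"shortest {min(lengths)} bp, longest {max(lengths)} bp")
```

Records are returned as a list rather than a dict, so duplicate IDs are kept instead of silently overwritten.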
Description:
The cgMLST scheme was constructed from the genomes of three L. anisa strains downloaded from NCBI (RefSeq: NZ_CANP00000000.1, NZ_NBTX00000000.1 and NZ_LNXS00000000.1). The Ridom SeqSphere+ cgMLST Target Definer was run with the following filters:
1. Minimum length filter: removes all genes shorter than 50 bp.
2. Start codon filter: discards all genes that do not begin with a start codon.
3. Stop codon filter: discards all genes that contain no stop codon, contain more than one stop codon, or do not end with a stop codon.
4. Homologous gene filter: discards all genes with fragments that occur in multiple copies within a genome (90% identity and >100 bp overlap).
5. Gene overlap filter: if two genes overlap by more than 4 bp, discards the shorter of the two from the cgMLST scheme.
The remaining genes were then compared pairwise using BLAST version 2.2.12 (word size 11, mismatch penalty −1, match reward 1, gap open cost 5, gap extension cost 2). The final cgMLST scheme consists of all reference genome genes that meet two conditions. First, each gene was present in every query genome with ≥90% sequence identity and 100% overlap. Second, each gene passed the stop codon percentage filter, which was enabled with its default parameters.
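The Target Definer filters described above can be sketched as a chain of simple predicates. This is a minimal illustration and not SeqSphere+'s implementation. The following are assumptions:

- the start and stop codon sets (standard bacterial code);
- 1-based inclusive coordinates;
- the tie-breaking rule in the overlap filter.

The homologous gene filter needs a within-genome BLAST search, so it is represented only by a hypothetical precomputed `multicopy` flag.

```python
from dataclasses import dataclass

# Assumed codon sets (standard bacterial code); the source does not list them.
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass
class Gene:
    name: str
    contig: str
    start: int  # 1-based, inclusive genomic coordinate
    end: int  # 1-based, inclusive genomic coordinate
    seq: str  # coding-strand nucleotide sequence
    multicopy: bool = False  # hypothetical flag from a separate within-genome BLAST


def passes_length(g: Gene, min_len: int = 50) -> bool:
    return len(g.seq) >= min_len  # remove genes shorter than 50 bp


def passes_start(g: Gene) -> bool:
    return g.seq[:3].upper() in START_CODONS


def passes_stop(g: Gene) -> bool:
    s = g.seq.upper()
    if len(s) % 3:
        return False  # assumption: a partial final codon cannot be a terminal stop
    codons = [s[i:i + 3] for i in range(0, len(s), 3)]
    stops = [i for i, c in enumerate(codons) if c in STOP_CODONS]
    return stops == [len(codons) - 1]  # exactly one in-frame stop, and it is last


def overlap_bp(a: Gene, b: Gene) -> int:
    if a.contig != b.contig:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def drop_overlaps(genes: list[Gene], max_overlap: int = 4) -> list[Gene]:
    removed: set[str] = set()
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if overlap_bp(a, b) > max_overlap:
                # discard the shorter gene; on a length tie, drop the one listed second (assumption)
                removed.add(a.name if len(a.seq) < len(b.seq) else b.name)
    return [g for g in genes if g.name not in removed]


def define_targets(genes: list[Gene]) -> list[Gene]:
    kept = [g for g in genes
            if passes_length(g) and passes_start(g) and passes_stop(g) and not g.multicopy]
    return drop_overlaps(kept)
```

Overlaps are evaluated pairwise among genes that survived the earlier filters. A gene already marked for removal can therefore still cause a shorter neighbour to be removed. The source does not say how SeqSphere+ handles such chains.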
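The pairwise BLAST comparison and the core-gene selection described above can be sketched as follows. The source used BLAST 2.2.12, but the flag names below are the BLAST+ `blastn` equivalents of the stated parameters. This is an untested assumption, and the command is only built here, not run. Hits are read from the standard 12-column tabular output. "100% overlap" is interpreted as an alignment spanning the full length of the reference gene. The stop codon percentage filter is not modelled. The names `blastn_args`, `parse_tabular` and `core_genes` are hypothetical.

```python
from typing import Iterable

# Parameters stated in the description; the BLAST+ flag names are an assumption.
BLAST_PARAMS = {"-word_size": "11", "-penalty": "-1", "-reward": "1",
                "-gapopen": "5", "-gapextend": "2"}


def blastn_args(query_fasta: str, subject_fasta: str) -> list[str]:
    """Build (but do not run) a blastn argument list with tabular output."""
    args = ["blastn", "-query", query_fasta, "-subject", subject_fasta, "-outfmt", "6"]
    for flag, value in BLAST_PARAMS.items():
        args += [flag, value]
    return args


def parse_tabular(lines: Iterable[str]):
    """Yield (query_id, percent_identity, qstart, qend) from 12-column tabular hits."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        yield f[0], float(f[2]), int(f[6]), int(f[7])


def core_genes(ref_lengths: dict[str, int],
               hits_by_genome: dict[str, list[str]],
               min_identity: float = 90.0) -> set[str]:
    """Keep reference genes found in every query genome at >= min_identity over full length."""
    core = set(ref_lengths)
    for lines in hits_by_genome.values():
        found = set()
        for qid, pident, qstart, qend in parse_tabular(lines):
            qlen = ref_lengths.get(qid)
            if qlen is None:
                continue  # hit for a gene that is not a reference target
            covered = abs(qend - qstart) + 1  # reference positions spanned by the alignment
            if pident >= min_identity and covered == qlen:
                found.add(qid)
        core &= found  # a target must be present in every query genome
    return core
```

Here the reference genes are the BLAST queries and each query genome is a subject. Hits are grouped per genome so that the intersection enforces presence in all of them.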
Provider:
figshare
Created:
2018-05-23



