
cpn60-Classifier v10.1 (Performance testing)

Mendeley Data. Updated 2024-01-31; indexed 2024-06-27.
Download link:
https://figshare.com/articles/dataset/cpn60-Classifier_v10_1_Performance_testing_/21972278/1
Description:
cpn60-Classifier v10.1 (for additional information and releases, visit HillLab on GitHub)

This is the version of the RDP Classifier trained on 11,001 reference cpn60 sequences that was used for performance testing. Duplicate sequences were removed from the reference database with the rm-dupseq function of the RDP Classifier, because duplicates can inflate results during classification performance testing. (An updated release containing additional sequences has been made available since this original investigation.)

The release contains the training files (a taxonomy table and FASTA-formatted sequences) as well as the trained classifier for use with the RDP Classifier. RDPTools includes the classifier and can be installed with conda: https://anaconda.org/bioconda/rdptools (Wang Q, Garrity GM, Tiedje JM, Cole JR. 2007. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Microbiol 73:5261–7).

Quick start with the trained classifier

Download cpn60-Classifier_v10_trained.tar.gz and unpack it. The resulting directory should include:
    bergeyTrainingTree.xml
    genus_wordConditionalProbList.txt
    logWordPrior.txt
    rRNAClassifier.properties
    wordConditionalProbIndexArr.txt

A basic command to classify cpn60 sequences contained in a file called queries.fasta:
    java -jar /path/to/RDPTools/classifier.jar classify -c 0.9 -f allrank -t /path/to/cpn60-Classifier_v10_trained/rRNAClassifier.properties -o output.txt queries.fasta

Here -c 0.9 is the confidence cutoff, -f allrank selects the output format, -t points to the properties file of the trained classifier, and -o names the output file. See the README for more details on the RDP Classifier: https://github.com/rdpstaff/classifier

To train the classifier

Download cpn60-Classifier_v10_training.tar.gz and unpack it.
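Both archives (the trained release and the training files) are gzip-compressed tarballs, so the "download and unpack" step can be scripted. The following is a minimal Python sketch that assumes the archive has already been downloaded. It checks the extracted file names against the lists given in this description. The helper name `unpack_and_check` is hypothetical and not part of the release, and the check looks only at file names, not at file contents.

```python
import tarfile
from pathlib import Path

# File lists taken from this description of each archive.
TRAINED_FILES = {
    "bergeyTrainingTree.xml",
    "genus_wordConditionalProbList.txt",
    "logWordPrior.txt",
    "rRNAClassifier.properties",
    "wordConditionalProbIndexArr.txt",
}
TRAINING_FILES = {"refseqs_v10.fasta", "taxonomytable_v10.txt"}


def unpack_and_check(archive, dest, expected):
    """Extract a .tar.gz archive into dest; return expected file names that were not found."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        # Use the safer "data" extraction filter where this Python version provides it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    # Search recursively, since the archive may unpack into its own subdirectory.
    found = {p.name for p in dest.rglob("*") if p.is_file()}
    return sorted(expected - found)
```

For example, `unpack_and_check("cpn60-Classifier_v10_training.tar.gz", "cpn60_training", TRAINING_FILES)` should return an empty list when both training files are present.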
The resulting directory should include:
    refseqs_v10.fasta
    taxonomytable_v10.txt

Other scripts needed (from https://github.com/GLBRC-TeamMicrobiome/python_scripts, with a minor edit to addFullLineage.py to fix an error):
    addFullLineage-jh.py
    lineage2taxTrain.py
(If you want to generate your own taxonomy file, see https://pypi.org/project/taxonomy-ranks/)

Make the ready-to-train taxonomy:
    /path/to/lineage2taxTrain.py taxonomytable_v10.txt > ready2train_taxonomy.txt

Add lineages to the FASTA definition lines:
    /path/to/addFullLineage-jh.py taxonomytable_v10.txt refseqs_v10.fasta > ready2train_refseqs.fasta

Now train:
    java -jar /path/to/RDPTools/classifier.jar train -o training_files -s ready2train_refseqs.fasta -t ready2train_taxonomy.txt

The resulting directory contains the trained classifier except for one important file, rRNAClassifier.properties, which you need to add manually.
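The two preparation steps handled by lineage2taxTrain.py and addFullLineage-jh.py can be illustrated with a short Python sketch. This is not the code of those scripts. It rests on three assumptions:

(a) taxonomytable_v10.txt is tab-delimited, with a header row made up of an ID column followed by rank names from highest to lowest.
(b) The RDP training taxonomy consists of `taxid*name*parent_taxid*depth*rank` lines rooted at `0*Root*-1*0*rootrank`.
(c) Training FASTA definition lines take the form `>ID<TAB>Root;rank1;...;rankN`.

Compare the output with what the real scripts produce before relying on it. All helper names here are hypothetical.

```python
import csv
import io


def read_lineage_table(handle):
    """Parse a tab-delimited table: header = ID column then rank names (ASSUMED layout)."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    ranks = [r.strip().lower() for r in header[1:]]
    table = {}
    for row in reader:
        if row:  # skip blank lines
            table[row[0].strip()] = [name.strip() for name in row[1:]]
    return ranks, table


def to_rdp_taxonomy(ranks, table):
    """Build taxid*name*parent_taxid*depth*rank lines, rooted at 0*Root*-1*0*rootrank (ASSUMED format)."""
    lines = ["0*Root*-1*0*rootrank"]
    ids = {(): 0}  # full lineage path -> taxid; the empty path is Root
    for lineage in table.values():
        for depth in range(1, len(lineage) + 1):
            path = tuple(lineage[:depth])
            if path not in ids:
                ids[path] = len(ids)  # next unused taxid
                parent = ids[path[:-1]]
                lines.append(f"{ids[path]}*{path[-1]}*{parent}*{depth}*{ranks[depth - 1]}")
    return lines


def add_lineage(fasta_lines, table, sep="\t"):
    """Rewrite '>ID ...' definition lines as '>ID<sep>Root;rank1;...;rankN' (ASSUMED format)."""
    out = []
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            seq_id = line[1:].split()[0]
            out.append(f">{seq_id}{sep}" + ";".join(["Root", *table[seq_id]]))
        else:
            out.append(line)
    return out


if __name__ == "__main__":
    # Toy, hypothetical table; not taken from taxonomytable_v10.txt.
    demo = (
        "ID\tKingdom\tPhylum\tGenus\n"
        "seq1\tBacteria\tFirmicutes\tBacillus\n"
        "seq2\tBacteria\tFirmicutes\tListeria\n"
    )
    ranks, table = read_lineage_table(io.StringIO(demo))
    print("\n".join(to_rdp_taxonomy(ranks, table)))
    print("\n".join(add_lineage([">seq1 partial cpn60", "ACGT"], table)))
```

In the toy table, Bacillus and Listeria both receive Firmicutes (taxid 2) as their parent. The sketch keys each node on its full lineage path rather than on its name alone, so identical names under different parents stay distinct. The real scripts may handle such collisions, and empty rank fields, differently.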
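One way to add rRNAClassifier.properties manually is to copy the file shipped in the unpacked trained release (cpn60-Classifier_v10_trained) into training_files. The following is a minimal Python sketch of that step. It assumes the properties file is a plain key=value list in which values ending in .txt or .xml name the other trained files, and it reports any such file that is missing from the target directory. The helper name `add_properties_file` is hypothetical.

```python
import shutil
from pathlib import Path


def add_properties_file(trained_dir, training_dir):
    """Copy rRNAClassifier.properties into training_dir; return referenced files that are missing."""
    src = Path(trained_dir) / "rRNAClassifier.properties"
    dest = Path(training_dir) / src.name
    shutil.copyfile(src, dest)
    missing = []
    for raw in dest.read_text().splitlines():
        line = raw.strip()
        # Skip blanks, comments and anything that is not key=value.
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        value = line.split("=", 1)[1].strip()
        # ASSUMPTION: values ending in .txt/.xml name files expected next to the properties file.
        if value.endswith((".txt", ".xml")) and not (dest.parent / value).is_file():
            missing.append(value)
    return missing
```

For example, `add_properties_file("cpn60-Classifier_v10_trained", "training_files")` returns an empty list when every file named in the properties file is present in training_files.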
Created: 2024-01-31