cpn60-Classifier v10.1 (Performance testing)
Source: Figshare | Updated: 2023-04-26 | Indexed: 2026-04-08
Download link:
https://figshare.com/articles/dataset/cpn60-Classifier_v10_1_Performance_testing_/21972278/1
Description:
cpn60-Classifier v10.1 (For additional information and releases, visit HillLab on GitHub) <br>
This is the version of the RDP Classifier trained on 11,001 reference cpn60 sequences that was used for performance testing. Duplicate sequences were removed from the reference database using the <em>rm-dupseq</em> function of the RDP Classifier, because duplicates can inflate results during classification performance testing. <br>
(An updated release containing additional sequences has been made available since this original investigation.) <br>
The release contains <strong>training files</strong> (taxonomy table and FASTA-formatted sequences) as well as the <strong>trained classifier</strong> for use with RDP Classifier. <br>
<strong>RDPTools</strong> includes the classifier and can be installed with conda: https://anaconda.org/bioconda/rdptools (Wang Q, Garrity GM, Tiedje JM, Cole JR. 2007. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Microbiol 73:5261–7). <br>
Quick start with the trained classifier <br>
Download cpn60-Classifier_v10_trained.tar.gz and unpack it. The resulting directory should include: <br>
bergeyTrainingTree.xml <br>
genus_wordConditionalProbList.txt <br>
logWordPrior.txt <br>
rRNAClassifier.properties <br>
wordConditionalProbIndexArr.txt <br>
A basic command to classify cpn60 sequences contained in a file called queries.fasta: <br>
<code>java -jar /path/to/RDPTools/classifier.jar classify -c 0.9 -f allrank -t /path/to/cpn60-Classifier_v10_trained/rRNAClassifier.properties -o output.txt queries.fasta</code> <br>
See the README here for more details on the RDP Classifier: https://github.com/rdpstaff/classifier <br>
To train the Classifier <br>
Download cpn60-Classifier_v10_training.tar.gz and unpack it. The resulting directory should include: <br>
refseqs_v10.fasta <br>
taxonomytable_v10.txt <br>
Other scripts needed (from https://github.com/GLBRC-TeamMicrobiome/python_scripts, with a minor edit to addFullLineage.py to fix an error): <br>
addFullLineage-jh.py <br>
lineage2taxTrain.py <br>
(If you want to generate your own taxonomy file, see https://pypi.org/project/taxonomy-ranks/) <br>
Make the ready-to-train taxonomy: <br>
<code>/path/to/lineage2taxTrain.py taxonomytable_v10.txt > ready2train_taxonomy.txt</code> <br>
Add lineages to the FASTA sequence definition lines: <br>
<code>/path/to/addFullLineage-jh.py taxonomytable_v10.txt refseqs_v10.fasta > ready2train_refseqs.fasta</code> <br>
Now train: <br>
<code>java -jar /path/to/RDPTools/classifier.jar train -o training_files -s ready2train_refseqs.fasta -t ready2train_taxonomy.txt</code> <br>
The resulting directory (training_files) contains the trained classifier except for one essential file, rRNAClassifier.properties, which the training step does not create and which you need to add manually.
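The manually added rRNAClassifier.properties file mentioned above could be created along these lines. This is a minimal sketch, not the author's own procedure: the key names are recalled from the sample properties file distributed with RDP Classifier and should be checked against a known-good copy (such as the one in the trained release), the <code>classifierVersion</code> value is a label made up for illustration, and <code>OUT_DIR</code> is a hypothetical variable that defaults to the <code>-o</code> directory used in the train command.

```shell
#!/bin/sh
# Sketch: write a minimal rRNAClassifier.properties next to the trained files.
# OUT_DIR is an assumption; it defaults to the -o argument of the train command.
OUT_DIR="${OUT_DIR:-training_files}"
mkdir -p "$OUT_DIR"

# Key names are assumed to match the sample file shipped with RDP Classifier.
# Each value names a file that the training step is expected to write.
# The classifierVersion text is an illustrative label, not an official string.
cat > "$OUT_DIR/rRNAClassifier.properties" <<'EOF'
bergeyTree=bergeyTrainingTree.xml
probabilityList=genus_wordConditionalProbList.txt
probabilityIndex=wordConditionalProbIndexArr.txt
wordPrior=logWordPrior.txt
classifierVersion=cpn60-Classifier v10.1 (locally trained)
EOF

# Warn about any referenced file that is not present in OUT_DIR.
for f in bergeyTrainingTree.xml genus_wordConditionalProbList.txt \
         wordConditionalProbIndexArr.txt logWordPrior.txt; do
    [ -f "$OUT_DIR/$f" ] || echo "missing: $OUT_DIR/$f" >&2
done
```

Once the file is in place, the classify command from the quick start can be pointed at <code>training_files/rRNAClassifier.properties</code> through its <code>-t</code> option.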
Provided by:
Hill, Janet
Created:
2023-04-26



