10-fold cross-validation area-under-the-curve (AUC) scores (mean ± std. dev.) of protein representation methods in the inference tasks. Hist-8000 outperforms SoT in seven of the eight tasks (tasks 1-8). Hist-8000 uses the conceptually simpler BoW approach [13], in contrast to SoT, which requires SSL pre-training (word2vec) on a large protein sequence dataset [17]. Best-performing methods are shown in bold. See section ‘Protein inference problems’ for task data sources. The Gram-negative, Gram-positive and Archaea datasets are each used for subcellular localisation prediction from protein sequence. Abbreviations: Gram-neg.: Gram-negative bacteria; Gram-pos.: Gram-positive bacteria; #Proteins (pos.+neg.): number of proteins (positive + negative); Hist-8000: Histogram-8000; BoW: Bag-of-Words.
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/10-Fold_Cross-Validation_Area-Under-the-Curve_scores_mean_st_dev_of_protein_representation_methods_in_the_inference_tasks_Hist-8000_outperforms_SoT_in_seven_out_of_tasks_1-8_Hist-8000_consists_of_the_conceptually_simpler_BoW_approach_13_in_/29845893
Created:
2025-08-06



