10-fold cross-validation area-under-the-curve (AUC) scores (mean ± st. dev.) of protein representation methods in the inference tasks, under varying experimental setups. Hist-8000 matches SoT in six out of seven tasks (tasks 1 and 3-8 are considered here), after thorough comparison of the representation methods under different classifier models and hyper-parameters. Hist-8000 is based on the conceptually simpler BoW approach [13], in contrast to SoT, which requires SSL pre-training (word2vec) on a large protein sequence dataset of more than 500k proteins [17]. The best-performing representations are shown in bold, with the classifier used for the top representation method given in a separate column. See section ‘Protein inference problems’ for task data.
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/10-fold_Cross-Validation_Area-Under-the-Curve_scores_mean_st_dev_of_protein_representation_methods_in_the_inference_tasks_under_varying_experimental_setup_settings_Hist-8000_matches_SoT_in_six_out_of_seven_tasks_tasks_1_3-8_considered_here_/29845905
Created: 2025-08-06



