10-fold cross-validation area-under-the-curve (AUC) scores (mean ± st. dev.) of protein representation methods on the inference tasks, under varying experimental settings. Hist-8000 matches SoT in six of the seven tasks considered here (tasks 1 and 3-8), after thorough comparison of the representation methods across different classifier models and their hyper-parameters. Hist-8000 uses the conceptually simpler bag-of-words (BoW) approach [13], whereas SoT requires SSL pre-training (word2vec) on a large protein sequence dataset of more than 500k proteins [17]. The best-performing representations are shown in bold, and the classifier used for the top representation method is given in a separate column. See the section ‘Protein inference problems’ for the task data.
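As a rough illustration of how a "mean ± st. dev." entry of this kind can be derived, the sketch below computes an AUC per held-out fold and then summarizes the ten values. It is not the authors' pipeline. The function names, the strided fold assignment, and the use of the sample standard deviation are all assumptions made for this example, and the scores are assumed to come from a classifier trained on the other nine folds.

```python
import statistics


def auc(labels, scores):
    """AUC via the rank (Mann-Whitney) formulation: the fraction of
    positive/negative pairs ranked correctly, with ties counted as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fold_indices(n, k=10):
    """Hypothetical fold assignment: sample i goes to fold i mod k.
    A real study would typically shuffle and stratify by class."""
    return [list(range(i, n, k)) for i in range(k)]


def cv_auc_summary(labels, scores, k=10):
    """Return (mean, sample st. dev.) of the per-fold AUCs.
    scores[i] is assumed to be the held-out prediction for sample i."""
    fold_aucs = [
        auc([labels[i] for i in fold], [scores[i] for i in fold])
        for fold in fold_indices(len(labels), k)
    ]
    return statistics.mean(fold_aucs), statistics.stdev(fold_aucs)


def format_entry(mean, std):
    """Format a table cell in the 'mean ± st. dev.' style of the caption."""
    return f"{mean:.3f} ± {std:.3f}"
```

Whether the original table reports the sample or the population standard deviation is not stated in the caption; `statistics.stdev` (sample) is used here only as a placeholder choice.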

Source: NIAID Data Ecosystem (indexed 2026-05-02)

Download link:

https://figshare.com/articles/dataset/10-fold_Cross-Validation_Area-Under-the-Curve_scores_mean_st_dev_of_protein_representation_methods_in_the_inference_tasks_under_varying_experimental_setup_settings_Hist-8000_matches_SoT_in_six_out_of_seven_tasks_tasks_1_3-8_considered_here_/29845905

Created:

2025-08-06