10-fold cross-validation area-under-the-curve (AUC) scores (mean ± st. dev.) of protein representation methods in five protein family inference tasks. Hist-8000 (results in bold) outperforms SoT in identifying proteins from all 25 families from the ProtVec study [17] tested here. For each set of protein sequences belonging to a family to be predicted, we randomly sampled the same number of sequences from other families in Swiss-Prot [41] to form the negative data, giving a balanced task dataset. For brevity, we show results for the top five families by number of proteins; see S1 File (supporting information), section ‘S
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/10-Fold_Cross-Validation_Area-Under-the-Curve_scores_mean_st_dev_of_protein_representation_methods_in_five_protein_family_inference_tasks_Hist-8000_results_in_bold_outperforms_SoT_in_identifying_proteins_from_all_the_25_families_tested_here/29845896
Created:
2025-08-06
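
The evaluation protocol in the caption above (equal-sized random negatives, then 10-fold cross-validated AUC reported as mean ± st. dev.) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' code: the function names, the stratified fold assignment, the fixed seeds, and the toy numeric "sequences" are all assumptions made for the example, and neither Hist-8000 nor SoT is implemented here.

```python
import random
import statistics

def balanced_dataset(positives, pool, seed=0):
    """Label positives 1 and add an equal number of randomly sampled negatives (label 0)."""
    rng = random.Random(seed)
    negatives = rng.sample(pool, len(positives))
    return [(x, 1) for x in positives] + [(x, 0) for x in negatives]

def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_folds(data, k=10, seed=0):
    """Assumption: folds are stratified so each one holds both classes."""
    rng = random.Random(seed)
    pos = [i for i, (_, y) in enumerate(data) if y == 1]
    neg = [i for i, (_, y) in enumerate(data) if y == 0]
    rng.shuffle(pos)
    rng.shuffle(neg)
    return [pos[i::k] + neg[i::k] for i in range(k)]

def cross_validated_auc(data, fit, k=10, seed=0):
    """fit(train) must return a scoring function; returns (mean, st. dev.) of per-fold AUC."""
    aucs = []
    for held_out in stratified_folds(data, k, seed):
        held = set(held_out)
        train = [d for i, d in enumerate(data) if i not in held]
        test = [data[i] for i in held_out]
        score = fit(train)
        aucs.append(auc([y for _, y in test], [score(x) for x, _ in test]))
    return statistics.mean(aucs), statistics.stdev(aucs)

# Toy usage: numbers stand in for sequence representations (hypothetical data).
positives = [10 + i for i in range(20)]   # stand-in for one family's members
pool = list(range(-50, 0))                # stand-in for sequences from other families
data = balanced_dataset(positives, pool)
placeholder_fit = lambda train: (lambda x: x)  # "model" that scores by raw value
mean_auc, sd_auc = cross_validated_auc(data, placeholder_fit, k=10)
print(f"{mean_auc:.3f} ± {sd_auc:.3f}")
```

In practice `placeholder_fit` would be replaced by a routine that trains a classifier on the chosen protein representation, and the per-family results would be collected into the table the caption describes.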



