F1 and calibrated log likelihood. Results are averaged over 10 random seeds; standard deviation is given in the subscript. Tasks marked by * are subject to input data distribution shift while datasets marked by † are subject to annotator pool distribution shift. Methods marked by ‡ are those which estimate either worker skill or item difficulty. Aggregating the individual soft-labeling methods yields classifiers with consistently good uncertainty estimation (best on all text based tasks) and generally good raw performance in terms of F1 across tasks.

NIAID Data Ecosystem2026-05-02 收录

下载链接：

https://figshare.com/articles/dataset/F1_and_calibrated_log_likelihood_Results_are_averaged_over_10_random_seeds_standard_deviation_is_given_in_the_subscript_Tasks_marked_by_are_subject_to_input_data_distribution_shift_while_datasets_marked_by_are_subject_to_annotator_pool_dist/29271495

下载链接

链接失效反馈

官方服务：

资源简介：

F1 and calibrated log likelihood. Results are averaged over 10 random seeds; standard deviation is given in the subscript. Tasks marked by * are subject to input data distribution shift while datasets marked by † are subject to annotator pool distribution shift. Methods marked by ‡ are those which estimate either worker skill or item difficulty. Aggregating the individual soft-labeling methods yields classifiers with consistently good uncertainty estimation (best on all text based tasks) and generally good raw performance in terms of F1 across tasks.

创建时间：

2025-06-09