
F1 and calibrated log-likelihood. Results are averaged over 10 random seeds, with the standard deviation given in the subscript. Tasks marked with * are subject to input data distribution shift, while datasets marked with † are subject to annotator pool distribution shift. Methods marked with ‡ estimate either worker skill or item difficulty. Aggregating the individual soft-labeling methods yields classifiers with consistently good uncertainty estimation (best on all text-based tasks) and generally good raw F1 performance across tasks.
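
As a rough illustration of the two operations the caption mentions, the sketch below combines several soft labels for one item and summarizes a metric across seeds. It is not the authors' implementation. The aggregation rule (a plain average of the per-method probability vectors), the use of the sample standard deviation, and the function names `aggregate_soft_labels` and `summarize_over_seeds` are all assumptions made for this example.

```python
from statistics import mean, stdev


def aggregate_soft_labels(soft_labels):
    """Combine per-method probability vectors for one item.

    Assumption: aggregation is a simple unweighted average; the
    source does not state which rule was actually used.
    """
    n_methods = len(soft_labels)
    n_classes = len(soft_labels[0])
    return [
        sum(method[c] for method in soft_labels) / n_methods
        for c in range(n_classes)
    ]


def summarize_over_seeds(scores):
    """Return (mean, sample standard deviation) of a metric across seeds.

    Assumption: the sample (n - 1) standard deviation; the source only
    says "standard deviation".
    """
    return mean(scores), stdev(scores)


if __name__ == "__main__":
    # Hypothetical soft labels from two methods for a binary item.
    print(aggregate_soft_labels([[0.8, 0.2], [0.6, 0.4]]))
    # Hypothetical F1 scores from three seeds.
    print(summarize_over_seeds([0.70, 0.72, 0.74]))
```

Averaging keeps the result a valid probability vector whenever every input is one, which is why it is a convenient default here. A weighted scheme would be an equally plausible reading of the caption.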

NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://figshare.com/articles/dataset/F1_and_calibrated_log_likelihood_Results_are_averaged_over_10_random_seeds_standard_deviation_is_given_in_the_subscript_Tasks_marked_by_are_subject_to_input_data_distribution_shift_while_datasets_marked_by_are_subject_to_annotator_pool_dist/29271495
Created:
2025-06-09