TAUR-dev/D-EVAL__standard_eval_v3__FE_8k_ours_cd5arg-eval_rl
收藏Hugging Face2025-10-19 更新2025-10-25 收录
下载链接:
https://hf-mirror.com/datasets/TAUR-dev/D-EVAL__standard_eval_v3__FE_8k_ours_cd5arg-eval_rl
下载链接
链接失效反馈官方服务:
资源简介:
该数据集包含了问题、答案以及与任务相关的配置信息。每个样本可能包含一个或多个提示,这些提示有内容字段和角色字段。此外,还包括模型的响应和相关评估信息,如正确性、最佳答案选择、答案提取和评估的元数据等。数据集还提供了性能指标,如正确率、翻转次数等。数据集分为测试集,包含了1000个样本。
The dataset includes questions, answers, and task-related configurations. Each sample may contain one or more prompts, which have content and role fields. Additionally, it includes model responses and related evaluation information such as correctness, best answer selection, answer extraction, and evaluation metadata. The dataset also provides performance metrics such as accuracy, number of flips, etc. The dataset is split into a test set, containing 1000 samples.
提供机构:
TAUR-dev



