TAUR-dev/D-EVAL__standard_eval_v3__FinEval_16k_fulleval_AT_STAR-SFT-letter_countdown_4o-eval_sft
Source: Hugging Face. Updated 2025-11-10. Indexed 2025-11-15.
Download link:
https://hf-mirror.com/datasets/TAUR-dev/D-EVAL__standard_eval_v3__FinEval_16k_fulleval_AT_STAR-SFT-letter_countdown_4o-eval_sft
Resource description:
The dataset contains fields for questions, answers, task configuration, task source, prompt information, model responses, and the evaluation details for those responses. It includes a test set and is suitable for training and evaluating machine learning models. Each sample may hold several model responses, each with its own evaluation record: correctness, the extracted answer, and extraction and evaluation metadata. Metrics over the model responses are also included: number of flips, total flips, number of correct responses, pass at n attempts, percentage correct, and skill counts.
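As a rough illustration of how metrics such as number of correct responses, pass at n attempts, and percentage correct can relate to the per-response correctness flags described above, here is a minimal sketch. The function name, the dictionary keys, and the sample values are hypothetical and are not taken from the dataset's actual schema; the correct rate is expressed as a fraction between 0 and 1, which is also an assumption.

```python
from typing import Dict, List


def summarize_responses(correct_flags: List[bool]) -> Dict[str, object]:
    """Aggregate per-response correctness flags for a single sample.

    All key names below are illustrative, not the dataset's real column names.
    """
    n = len(correct_flags)
    num_correct = sum(correct_flags)  # True counts as 1, False as 0
    return {
        "num_correct": num_correct,
        # Passes at n if at least one of the n responses is correct
        "pass_at_n": num_correct > 0,
        # Fraction of correct responses; 0.0 when there are no responses
        "percent_correct": num_correct / n if n else 0.0,
    }


# Hypothetical sample with four model responses, two of them correct
sample_flags = [False, True, False, True]
summary = summarize_responses(sample_flags)
print(summary)
```

The flip and skill-count metrics are left out because the description does not define them.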
Provider:
TAUR-dev



