studymakesmehappyyyyy/TruthfulQA
收藏Hugging Face2024-12-06 更新2024-12-14 收录
下载链接:
https://hf-mirror.com/datasets/studymakesmehappyyyyy/TruthfulQA
下载链接
链接失效反馈官方服务:
资源简介:
TruthfulQA基准测试包括两个任务:生成类任务和选择题任务。生成类任务要求模型根据给定问题生成1-2句答案,评估目标为答案的真实性和信息量,使用微调的GPT-3模型和传统相似性指标进行评估。选择题任务测试模型在多个选项中选择唯一正确答案的能力,评估指标包括准确率和归一化概率分布下对正确答案的总概率。该基准测试用于评估模型在真实度和信息量上的表现,以及在多选题任务上的准确率。
The TruthfulQA benchmark is designed to evaluate the performance of the Qwen2.5 model in terms of the truthfulness and informativeness of generated answers, as well as accuracy in multiple-choice tasks. The test includes generation tasks and multiple-choice tasks, using fine-tuned GPT-3 models and traditional similarity metrics for evaluation.
提供机构:
studymakesmehappyyyyy



