
osunlp/AttributionBench

Hugging Face · Updated 2024-02-26 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/osunlp/AttributionBench
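Resource overview:
AttributionBench is built from multiple existing data sources and is intended to provide a unified, diverse testbed for evaluating advanced attribution evaluation systems. It contains an in-domain training set, an in-domain test set, and an out-of-domain test set. The dataset ships in several configurations (subset_balanced, overall_balanced, not_balanced, and full_data), each with a different label distribution and balancing scheme. Its fields are the question, the claim, the raw claim string, the response, references, citation links, webpage references, the attribution label, the source dataset, and a unique ID.
Provided by: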
osunlp
Summary of original information

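Dataset overview

Dataset versions

Subset-balanced version

  • Description: Each sub-dataset contains equal numbers of attributable and not-attributable labels.
  • File paths:
    • Training set: train_all_subset_balanced.jsonl
    • Development set: dev_all_subset_balanced.jsonl
    • Test set: test_all_subset_balanced.jsonl
    • Test set (out-of-domain): test_ood_all_subset_balanced.jsonl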

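Overall-balanced version

  • Description: The dataset as a whole contains equal numbers of attributable and not-attributable labels, but the individual sub-datasets do not.
  • File paths:
    • Training set: train_overall_balanced.jsonl
    • Development set: dev_all_subset_balanced.jsonl
    • Test set: test_all_subset_balanced.jsonl
    • Test set (out-of-domain): test_ood_all_subset_balanced.jsonl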
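
The difference between subset-level and overall balance can be checked directly on loaded records. The following is a minimal sketch, assuming each record is a dict with the `src_dataset` and `attribution_label` fields of this dataset; the function names and the toy records are hypothetical and are not part of the dataset or its tooling.

```python
from collections import Counter, defaultdict


def label_counts_by_source(records):
    """Count attribution labels separately for each source dataset."""
    counts = defaultdict(Counter)
    for rec in records:
        counts[rec["src_dataset"]][rec["attribution_label"]] += 1
    return counts


def is_overall_balanced(records):
    """True if the two labels are equally frequent across all records."""
    total = Counter(rec["attribution_label"] for rec in records)
    return total["attributable"] == total["not attributable"]


def is_subset_balanced(records):
    """True if the two labels are equally frequent within every source dataset."""
    per_source = label_counts_by_source(records)
    return all(
        c["attributable"] == c["not attributable"] for c in per_source.values()
    )


# Toy records (hypothetical, not taken from AttributionBench):
# balanced overall, but not within each source dataset.
toy = [
    {"src_dataset": "A", "attribution_label": "attributable"},
    {"src_dataset": "A", "attribution_label": "attributable"},
    {"src_dataset": "B", "attribution_label": "not attributable"},
    {"src_dataset": "B", "attribution_label": "not attributable"},
]
```

On the toy records, `is_overall_balanced(toy)` holds while `is_subset_balanced(toy)` does not, which is the situation the overall-balanced description refers to.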

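Not-balanced version

  • Description: The label distribution matches that of the full data (unbalanced), but the data size is comparable to that of the two label-balanced versions.
  • File paths:
    • Training set: merged_train_sampled.jsonl
    • Development set: dev_all_subset_balanced.jsonl
    • Test set: test_all_subset_balanced.jsonl
    • Test set (out-of-domain): test_ood_all_subset_balanced.jsonl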

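Full-data version

  • Description: The complete training data, with an unbalanced label distribution.
  • File paths:
    • Training set: merged_train.jsonl
    • Development set: dev_all_subset_balanced.jsonl
    • Test set: test_all_subset_balanced.jsonl
    • Test set (out-of-domain): test_ood_all_subset_balanced.jsonl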
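
The four versions differ only in their training file; the development and test files are shared. A minimal sketch of that mapping follows. The file names come from the lists above, while the pairing of each version with a configuration name (subset_balanced, overall_balanced, not_balanced, full_data), the split keys, and the `data_files` helper are assumptions made for illustration.

```python
# Evaluation files listed identically under every version above.
SHARED_EVAL_FILES = {
    "dev": "dev_all_subset_balanced.jsonl",
    "test": "test_all_subset_balanced.jsonl",
    "test_ood": "test_ood_all_subset_balanced.jsonl",
}

# Training file per configuration (config-to-version pairing is assumed).
TRAIN_FILES = {
    "subset_balanced": "train_all_subset_balanced.jsonl",
    "overall_balanced": "train_overall_balanced.jsonl",
    "not_balanced": "merged_train_sampled.jsonl",
    "full_data": "merged_train.jsonl",
}


def data_files(config):
    """Return a split-name -> file-name mapping for one configuration."""
    if config not in TRAIN_FILES:
        raise ValueError(f"unknown configuration: {config!r}")
    return {"train": TRAIN_FILES[config], **SHARED_EVAL_FILES}
```

If the files have been downloaded locally, a mapping like this could be passed as the `data_files` argument of `datasets.load_dataset("json", ...)` after joining each name with the local directory; whether that matches the dataset's own loading configuration is not confirmed here.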

Dataset structure

Data instance

{
  "question": "Is the number of horses living on Easter Island twice the number of people?",
  "claim": "According to James Grant-Peterkin in his book A Companion to Easter Island, there are almost 3,000 horses on Easter Island. However, locals often claim that they have more horses than people. The population of Easter Island is about 6,000 inhabitants. So it seems that the number of horses living on Easter Island is not twice the number of people.",
  "claim_raw_string": "According to James Grant-Peterkin in his book A Companion to Easter Island, there are almost 3,000 horses on Easter Island. However, locals often claim that they have more horses than people. The population of Easter Island is about 6,000 inhabitants. So it seems that the number of horses living on Easter Island is not twice the number of people.",
  "response": "According to James Grant-Peterkin in his book A Companion to Easter Island, there are almost 3,000 horses on Easter Island. However, locals often claim that they have more horses than people. The population of Easter Island is about 6,000 inhabitants. So it seems that the number of horses living on Easter Island is not twice the number of people.",
  "references": [
    "It is worth mentioning the huge population of horses (about 6,000) that already outnumber people and roam free on the island."
  ],
  "citation_links": [],
  "webpage_references": [],
  "attribution_label": "not attributable",
  "src_dataset": "AttrScore-GenSearch",
  "id": "AttrScore-GenSearch_7234d6e9-1f51-4203-9587-f539e34d34f4"
}

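Data fields

  • question: str. The question posed by the user.
  • claim: str. A part of the response to the question. It may be a single sentence or several sentences.
  • claim_raw_string: str. The raw claim string before processing.
  • response: str. The response to the question, generated by a language model or a generative search engine.
  • references: List[str]. A list of documents or passages offered in support of the claim.
  • citation_links: Optional[List[str]]. Reserved field for citation links.
  • webpage_references: Optional[List[str]]. Reserved field for the web page content of the reference links.
  • attribution_label: str. Either "attributable" or "not attributable".
  • src_dataset: str. The source dataset of the data item.
  • id: str. The unique ID of the data item in AttributionBench.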
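
The field list above maps naturally onto a typed record. The following is a minimal sketch of parsing one JSONL line into such a record and validating its label; the `AttributionRecord` class and `parse_record` function are hypothetical helpers, not part of any AttributionBench tooling, and the sketch assumes each line carries exactly the fields listed above.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

VALID_LABELS = {"attributable", "not attributable"}


@dataclass
class AttributionRecord:
    question: str
    claim: str
    claim_raw_string: str
    response: str
    references: List[str]
    attribution_label: str
    src_dataset: str
    id: str
    # Reserved fields; treated as optional here.
    citation_links: Optional[List[str]] = None
    webpage_references: Optional[List[str]] = None


def parse_record(line: str) -> AttributionRecord:
    """Parse one JSONL line and check that its label is one of the two allowed values."""
    obj = json.loads(line)
    # Raises TypeError if a required field is missing or an unknown field is present.
    rec = AttributionRecord(**obj)
    if rec.attribution_label not in VALID_LABELS:
        raise ValueError(f"unexpected label: {rec.attribution_label!r}")
    return rec
```

A whole file can then be read with `[parse_record(l) for l in open(path, encoding="utf-8") if l.strip()]`, where `path` points to one of the `.jsonl` files named earlier.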