osunlp/AttributionBench
Dataset Overview
Dataset Versions
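Subset-Balanced Version
- Description: Each sub-dataset contains the same number of attributable and not-attributable labels.
- File paths:
  - Training set: train_all_subset_balanced.jsonl
  - Development set: dev_all_subset_balanced.jsonl
  - Test set: test_all_subset_balanced.jsonl
  - Test set (out-of-domain): test_ood_all_subset_balanced.jsonl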
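Overall-Balanced Version
- Description: The dataset as a whole contains the same number of attributable and not-attributable labels, but each individual sub-dataset does not.
- File paths:
  - Training set: train_overall_balanced.jsonl
  - Development set: dev_all_subset_balanced.jsonl
  - Test set: test_all_subset_balanced.jsonl
  - Test set (out-of-domain): test_ood_all_subset_balanced.jsonl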
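Unbalanced Version
- Description: The label distribution is the same as in the full data (unbalanced), but the data size is comparable to that of the two label-balanced versions.
- File paths:
  - Training set: merged_train_sampled.jsonl
  - Development set: dev_all_subset_balanced.jsonl
  - Test set: test_all_subset_balanced.jsonl
  - Test set (out-of-domain): test_ood_all_subset_balanced.jsonl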
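Full-Data Version
- Description: The full training data, with an unbalanced label distribution.
- File paths:
  - Training set: merged_train.jsonl
  - Development set: dev_all_subset_balanced.jsonl
  - Test set: test_all_subset_balanced.jsonl
  - Test set (out-of-domain): test_ood_all_subset_balanced.jsonl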
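The four versions differ only in their training file; the development and test files are shared. A minimal sketch of how one might read a file and check the balance properties described above, assuming the `.jsonl` files have been downloaded locally. The version keys and the helper names (`parse_jsonl`, `read_jsonl`, `label_counts_by_subset`, `is_subset_balanced`, `is_overall_balanced`) are hypothetical and not part of the dataset; the field names `src_dataset` and `attribution_label` are the ones the dataset card documents.

```python
import json
from collections import Counter, defaultdict

# Training file per version, as listed on the dataset card.
# The dictionary keys are illustrative names, not official identifiers.
VERSION_FILES = {
    "subset_balanced": "train_all_subset_balanced.jsonl",
    "overall_balanced": "train_overall_balanced.jsonl",
    "unbalanced": "merged_train_sampled.jsonl",
    "full": "merged_train.jsonl",
}

# Evaluation files shared by all four versions.
SHARED_EVAL_FILES = {
    "dev": "dev_all_subset_balanced.jsonl",
    "test": "test_all_subset_balanced.jsonl",
    "test_ood": "test_ood_all_subset_balanced.jsonl",
}


def parse_jsonl(lines):
    """Parse an iterable of JSON Lines strings, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def read_jsonl(path):
    """Read a local .jsonl file into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return parse_jsonl(f)


def label_counts_by_subset(records):
    """Count attribution labels separately for each source dataset."""
    counts = defaultdict(Counter)
    for record in records:
        counts[record["src_dataset"]][record["attribution_label"]] += 1
    return counts


def is_subset_balanced(records):
    """True if every source dataset has equal counts of both labels."""
    return all(
        c["attributable"] == c["not attributable"]
        for c in label_counts_by_subset(records).values()
    )


def is_overall_balanced(records):
    """True if both labels are equally frequent over all records."""
    total = Counter(record["attribution_label"] for record in records)
    return total["attributable"] == total["not attributable"]
```

For example, `is_subset_balanced(read_jsonl(VERSION_FILES["subset_balanced"]))` would be expected to hold if the local file matches the description of that version, while the overall-balanced file should satisfy only `is_overall_balanced`.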
Dataset Structure
Data Instance
```json
{
  "question": "Is the number of horses living on Easter Island twice the number of people?",
  "claim": "According to James Grant-Peterkin in his book A Companion to Easter Island, there are almost 3,000 horses on Easter Island. However, locals often claim that they have more horses than people. The population of Easter Island is about 6,000 inhabitants. So it seems that the number of horses living on Easter Island is not twice the number of people.",
  "claim_raw_string": "According to James Grant-Peterkin in his book A Companion to Easter Island, there are almost 3,000 horses on Easter Island. However, locals often claim that they have more horses than people. The population of Easter Island is about 6,000 inhabitants. So it seems that the number of horses living on Easter Island is not twice the number of people.",
  "response": "According to James Grant-Peterkin in his book A Companion to Easter Island, there are almost 3,000 horses on Easter Island. However, locals often claim that they have more horses than people. The population of Easter Island is about 6,000 inhabitants. So it seems that the number of horses living on Easter Island is not twice the number of people.",
  "references": [
    "It is worth mentioning the huge population of horses (about 6,000) that already outnumber people and roam free on the island."
  ],
  "citation_links": [],
  "webpage_references": [],
  "attribution_label": "not attributable",
  "src_dataset": "AttrScore-GenSearch",
  "id": "AttrScore-GenSearch_7234d6e9-1f51-4203-9587-f539e34d34f4"
}
```
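Data Fields
- question: str. The question posed by the user.
- claim: str. Part of the response to the question; may be a single sentence or several sentences.
- claim_raw_string: str. The raw claim string before processing.
- response: str. The response to the question, generated by a language model or a generative search engine.
- references: List[str]. A list of documents or passages supporting the claim.
- citation_links: Optional[List[str]]. Reserved field for citation links.
- webpage_references: Optional[List[str]]. Reserved field for the web page content of the reference links.
- attribution_label: str. Either "attributable" or "not attributable".
- src_dataset: str. The source dataset of the data item.
- id: str. The unique ID of the data item in AttributionBench.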
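The field list above can be written down as a typed record with a light sanity check. This is only a sketch under the assumption that the types given in the field list are accurate; `AttributionExample`, `VALID_LABELS`, and `validate_example` are hypothetical names introduced for illustration and do not come from the dataset.

```python
from typing import List, Optional, TypedDict


class AttributionExample(TypedDict):
    """One data item, mirroring the documented fields."""
    question: str
    claim: str
    claim_raw_string: str
    response: str
    references: List[str]
    citation_links: Optional[List[str]]
    webpage_references: Optional[List[str]]
    attribution_label: str
    src_dataset: str
    id: str


VALID_LABELS = {"attributable", "not attributable"}

STR_FIELDS = (
    "question", "claim", "claim_raw_string", "response",
    "attribution_label", "src_dataset", "id",
)
OPTIONAL_LIST_FIELDS = ("citation_links", "webpage_references")


def _is_str_list(value) -> bool:
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


def validate_example(item: dict) -> List[str]:
    """Return a list of problems found in one item (empty if none)."""
    problems = []
    for name in STR_FIELDS:
        if not isinstance(item.get(name), str):
            problems.append(f"{name}: expected str")
    if not _is_str_list(item.get("references")):
        problems.append("references: expected list of str")
    for name in OPTIONAL_LIST_FIELDS:
        value = item.get(name)
        # Reserved fields may be missing or None.
        if value is not None and not _is_str_list(value):
            problems.append(f"{name}: expected list of str or None")
    label = item.get("attribution_label")
    if isinstance(label, str) and label not in VALID_LABELS:
        problems.append(f"attribution_label: unexpected value {label!r}")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to scan a whole file and report every malformed item at once.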



