FACTORY
ModelScope community · Updated 2025-12-05 · Indexed 2025-08-02
Download link: https://modelscope.cn/datasets/facebook/FACTORY
# Overview
FACTORY is a large-scale, human-verified, and challenging prompt set for long-form factuality. We build it with a model-in-the-loop approach to ensure quality and to address the complexities of evaluating long-form generation. Starting from seed topics drawn from Wikipedia, we use large language models (LLMs) to expand each topic into a diverse set of prompts. We then apply the model-in-the-loop method to filter out easier prompts, keeping the overall difficulty high. Finally, human annotators refine the prompts so that each one is fact-seeking, answerable, unambiguous, not time-sensitive, and safe.

To push the boundaries of long-form factuality evaluation, we also identify a "hard" split of FACTORY that poses a significant challenge to current state-of-the-art LLMs: for roughly 40% of the claims in their outputs, human annotators could not find supporting information online.
This dataset is stored in the JSON Lines (.jsonl) format, where each line contains a single JSON object representing one data entry.
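Because every line is a self-contained JSON object, the file can be streamed line by line with Python's standard `json` module. The sketch below is a minimal reader under that assumption; the function name `iter_jsonl` and the two sample records are illustrative and not part of the release.

```python
import io
import json
from typing import Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSON Lines stream."""
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            # Tolerate blank lines, e.g. a trailing newline at the end of the file.
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: invalid JSON ({exc.msg})") from exc
        if not isinstance(obj, dict):
            raise ValueError(f"line {lineno}: expected a JSON object")
        yield obj


# In-memory example; for a real file, pass open(path, encoding="utf-8") instead.
sample = io.StringIO(
    '{"question": "Q1", "url": "https://example.org"}\n'
    "\n"
    '{"question": "Q2", "url": ""}\n'
)
records = list(iter_jsonl(sample))
print(len(records))  # 2
```

Streaming one line at a time keeps memory flat regardless of file size, and the line number in the error message makes malformed entries easy to locate.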
# Structure
Each line in the dataset file has the following keys:
- question (string): A natural-language question that requires a long-form answer.
- url (string): One or more URLs pointing to resources with information relevant to answering the question.
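A small typed wrapper can make the two documented keys explicit. This is a hypothetical helper (`FactoryPrompt` and `parse_prompt` are not part of the dataset). Because the card does not say how multiple URLs are separated inside the `url` string, the sketch assumes they can be extracted with a simple `http(s)://` pattern; check this against the actual file before relying on it.

```python
import json
import re
from dataclasses import dataclass, field

# Assumption: the `url` string may hold several URLs with an undocumented
# delimiter, so we extract every substring that looks like an http(s) URL.
_URL_RE = re.compile(r"https?://[^\s,;]+")


@dataclass
class FactoryPrompt:
    question: str
    url: str
    urls: list[str] = field(init=False)  # derived from `url` in __post_init__

    def __post_init__(self) -> None:
        if not isinstance(self.question, str) or not self.question.strip():
            raise ValueError("`question` must be a non-empty string")
        if not isinstance(self.url, str):
            raise ValueError("`url` must be a string")
        self.urls = _URL_RE.findall(self.url)


def parse_prompt(line: str) -> FactoryPrompt:
    obj = json.loads(line)
    # Only the two documented keys are used; any extra keys are ignored.
    return FactoryPrompt(question=obj["question"], url=obj["url"])


p = parse_prompt('{"question": "Describe X.", "url": "https://a.org/x https://b.org/y"}')
print(p.urls)  # ['https://a.org/x', 'https://b.org/y']
```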

Figure 1. Factual precision as evaluated by human annotators on 100 sentences per model for each benchmark. All the models are retrieval-augmented.
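As a rough illustration of the metric in Figure 1, the sketch below computes claim-level precision from the four annotation tags ("Factual", "NonFactual", "Inconclusive", "No Verifiable Fact") described for the human annotations. The exact formula used in the report is not stated on this card, so the scoring rule here is an assumption: "No Verifiable Fact" claims are excluded, and "Inconclusive" claims count as unsupported by default (a flag drops them instead). The function name `factual_precision` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def factual_precision(tags: Iterable[str], count_inconclusive_as_unsupported: bool = True) -> float:
    """Share of verifiable claims tagged "Factual" (assumed definition, see lead-in)."""
    counts = Counter(tags)
    # Assumption: "No Verifiable Fact" claims never enter the denominator.
    denom = counts["Factual"] + counts["NonFactual"]
    if count_inconclusive_as_unsupported:
        # Assumption: a claim nobody could verify is treated as unsupported.
        denom += counts["Inconclusive"]
    if denom == 0:
        raise ValueError("no verifiable claims to score")
    return counts["Factual"] / denom


tags = ["Factual", "Factual", "NonFactual", "Inconclusive", "No Verifiable Fact"]
print(factual_precision(tags))  # 0.5
print(round(factual_precision(tags, count_inconclusive_as_unsupported=False), 3))  # 0.667
```

The flag exists because the two treatments of "Inconclusive" can move the score noticeably, as the example shows.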
**We have also released the human annotations collected during the evaluation of factual precision, available [here](https://huggingface.co/datasets/facebook/FACTORY/blob/main/fact_checking/human_annotations.jsonl)**
# Structure for the Human Annotations
Each line in the file is a valid JSON object containing the following keys for each annotated claim:
- Claim 1, Claim 2, ..., Claim N:
The text of each claim.
- Claim 1 Tag, Claim 2 Tag, ..., Claim N Tag:
Factuality label for the corresponding claim. The label indicates the annotator's assessment of the claim's factuality and can be one of the following:
+ "Factual"
+ "NonFactual"
+ "Inconclusive"
+ "No Verifiable Fact"
- Source Claim 1, Source Claim 2, ..., Source Claim N:
A string of URLs pointing to sources or evidence that support or refute the claim. This field may be empty if the claim's tag is "Inconclusive".
- Claim 1 Snippet, Claim 2 Snippet, ..., Claim N Snippet:
Text snippets copied from the sources above that provide direct evidence for the associated claim and its assigned factuality label.
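Because claims are stored as flat, numbered keys, it is often convenient to regroup each line into one record per claim. A minimal sketch, assuming the key names follow exactly the patterns listed above (`Claim N`, `Claim N Tag`, `Source Claim N`, `Claim N Snippet`); the class and function names and the sample annotation are hypothetical.

```python
import json
import re
from collections import Counter
from dataclasses import dataclass

# Each documented key pattern and the attribute it fills.
_KEY_PATTERNS = [
    (re.compile(r"^Claim (\d+)$"), "text"),
    (re.compile(r"^Claim (\d+) Tag$"), "tag"),
    (re.compile(r"^Source Claim (\d+)$"), "sources"),
    (re.compile(r"^Claim (\d+) Snippet$"), "snippet"),
]


@dataclass
class AnnotatedClaim:
    index: int
    text: str = ""
    tag: str = ""
    sources: str = ""
    snippet: str = ""


def group_claims(record: dict) -> list[AnnotatedClaim]:
    """Turn the flat 'Claim N ...' keys of one annotation line into per-claim objects."""
    claims: dict[int, AnnotatedClaim] = {}
    for key, value in record.items():
        for pattern, attr in _KEY_PATTERNS:
            m = pattern.match(key)
            if m:
                idx = int(m.group(1))
                claim = claims.setdefault(idx, AnnotatedClaim(index=idx))
                setattr(claim, attr, value if value is not None else "")
                break
        # Keys that match none of the patterns are ignored.
    # Sort numerically so that "Claim 10" comes after "Claim 9".
    return [claims[i] for i in sorted(claims)]


# Hypothetical annotation line, for illustration only.
line = json.dumps({
    "Claim 1": "The Eiffel Tower is in Paris.",
    "Claim 1 Tag": "Factual",
    "Source Claim 1": "https://en.wikipedia.org/wiki/Eiffel_Tower",
    "Claim 1 Snippet": "The Eiffel Tower is a wrought-iron lattice tower in Paris.",
    "Claim 2": "It was completed in 1999.",
    "Claim 2 Tag": "NonFactual",
    "Source Claim 2": "https://en.wikipedia.org/wiki/Eiffel_Tower",
    "Claim 2 Snippet": "",
})
claims = group_claims(json.loads(line))
print(Counter(c.tag for c in claims))  # Counter({'Factual': 1, 'NonFactual': 1})
```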
See our [technical report](https://arxiv.org/abs/2508.00109) for more details.
# Reference
```
@article{chen2025factory,
title={FACTORY: A Challenging Human-Verified Prompt Set for Long-Form Factuality},
author={Chen, Mingda and Li, Yang and Chen, Xilun and Williams, Adina and Ghosh, Gargi and Yih, Scott},
journal={arXiv preprint arXiv:2508.00109},
year={2025}
}
```
Provider: maas
Created: 2025-08-01



