FACTORY

Name: FACTORY
Creator: maas
Published: 2025-12-05 16:44:07
License: No description provided

ModelScope community. Updated 2025-12-05; added 2025-08-02.

Download link:

https://modelscope.cn/datasets/facebook/FACTORY


Resource summary:

# Overview

FACTORY is a large-scale, human-verified, and challenging prompt set. We employ a model-in-the-loop approach to ensure quality and to address the complexities of evaluating long-form generation. Starting with seed topics from Wikipedia, we expand each topic into a diverse set of prompts using large language models (LLMs). We then apply the model-in-the-loop method again to filter out simpler prompts, maintaining a high level of difficulty. Human annotators further refine the prompts to ensure they are fact-seeking, answerable, unambiguous, not time-sensitive, and safe.

To push the boundaries of long-form factuality evaluation, we identify a "hard" split of FACTORY that poses significant challenges to current state-of-the-art LLMs. Roughly 40% of the claims in their outputs are claims for which humans cannot find supporting information online.

The dataset is stored in JSON Lines (.jsonl) format. Each line holds a single JSON object representing one data entry.

# Structure

Each line in the dataset file has the following keys:

- question (string): A natural language question requiring a long-form answer.
- url (string): One or more URLs to resources that provide relevant information for answering the question.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/66a79b27909a525bcbc0708f/3IQGboL5_1zhqJhDfRzIB.png)

Figure 1. Factual precision as evaluated by human annotators on 100 sentences per model for each benchmark. All the models are retrieval-augmented.

**We have also released the human annotations collected during the evaluation of factual precision, available [here](https://huggingface.co/datasets/facebook/FACTORY/blob/main/fact_checking/human_annotations.jsonl).**

# Structure for the Human Annotations

Each line in the file is a valid JSON object containing the following keys for each annotated claim:

- Claim 1, Claim 2, ..., Claim N: The text of each claim.
- Claim 1 Tag, Claim 2 Tag, ..., Claim N Tag: The factuality label for the corresponding claim, reflecting the annotator's assessment of the claim's factuality. It takes one of the following values:
  + "Factual"
  + "NonFactual"
  + "Inconclusive"
  + "No Verifiable Fact"
- Source Claim 1, Source Claim 2, ..., Source Claim N: A string of URLs pointing to sources or evidence that support or refute the claim. This field may be empty if the claim's tag is "Inconclusive".
- Claim 1 Snippet, Claim 2 Snippet, ..., Claim N Snippet: Text snippets copied from the sources above. They provide direct evidence for the associated claim and its assigned factuality label.

See our [technical report](https://arxiv.org/abs/2508.00109) for more details.

# Reference

```
@article{chen2025factory,
  title={FACTORY: A Challenging Human-Verified Prompt Set for Long-Form Factuality},
  author={Chen, Mingda and Li, Yang and Chen, Xilun and Williams, Adina and Ghosh, Gargi and Yih, Scott},
  journal={arXiv preprint arXiv:2508.00109},
  year={2025}
}
```
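The prompt-file layout described under "Structure" (one JSON object per line with `question` and `url` keys) can be read with nothing but the standard library. The sketch below is an illustration and not an official loader. The card does not say how multiple URLs are delimited inside the single `url` string, so `split_urls` assumes they are whitespace-separated, possibly with trailing commas. The function names are hypothetical.

```python
import json
from typing import Iterator


def read_prompts(path: str) -> Iterator[dict]:
    """Yield one record per non-empty line of a FACTORY-style .jsonl prompt file.

    Each record is expected to carry the keys listed in the dataset card:
    "question" (str) and "url" (str holding one or more URLs).
    """
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline at EOF
            record = json.loads(line)
            missing = {"question", "url"} - set(record)
            if missing:
                raise ValueError(f"line {line_no}: missing keys {sorted(missing)}")
            yield record


def split_urls(url_field: str) -> list[str]:
    """Split the single `url` string into individual URLs.

    ASSUMPTION (hypothetical helper): URLs are separated by whitespace, optionally
    followed by a comma. The dataset card only says the field holds
    "one or more URLs" and does not name a delimiter.
    """
    parts = (u.rstrip(",") for u in url_field.split())
    return [u for u in parts if u]
```

As a usage example, `for rec in read_prompts(path): print(rec["question"], split_urls(rec["url"]))` iterates the prompts lazily. Here `path` is whatever local copy of the prompt file you downloaded; the card does not name that file. A generator keeps memory flat even for large splits.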

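The human-annotation rows described under "Structure for the Human Annotations" store each claim across up to four numbered keys (`Claim i`, `Claim i Tag`, `Source Claim i`, `Claim i Snippet`). The sketch below regroups those keys into one record per claim and tallies the four documented tags. It is a sketch under assumptions, not the authors' evaluation code. `factual_share` in particular is a hypothetical proxy: it divides "Factual" claims by all claims except "No Verifiable Fact". The exact factual-precision definition behind Figure 1 is given in the technical report, not here.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Iterable

VALID_TAGS = {"Factual", "NonFactual", "Inconclusive", "No Verifiable Fact"}

# Map each numbered key pattern from the dataset card to a record attribute.
_KEY_PATTERNS = [
    (re.compile(r"^Claim (\d+)$"), "text"),
    (re.compile(r"^Claim (\d+) Tag$"), "tag"),
    (re.compile(r"^Source Claim (\d+)$"), "sources"),
    (re.compile(r"^Claim (\d+) Snippet$"), "snippet"),
]


@dataclass
class AnnotatedClaim:
    index: int
    text: str = ""
    tag: str = ""
    sources: str = ""  # may be empty, e.g. when the tag is "Inconclusive"
    snippet: str = ""


def group_claims(row: dict) -> list[AnnotatedClaim]:
    """Collect the numbered per-claim keys of one annotation row, ordered by claim index."""
    claims: dict[int, AnnotatedClaim] = {}
    for key, value in row.items():
        for pattern, attr in _KEY_PATTERNS:
            m = pattern.match(key)
            if m:
                idx = int(m.group(1))
                claim = claims.setdefault(idx, AnnotatedClaim(index=idx))
                setattr(claim, attr, "" if value is None else value)
                break  # keys matching no pattern are silently ignored
    return [claims[i] for i in sorted(claims)]


def tag_counts(rows: Iterable[dict]) -> Counter:
    """Count factuality tags over all claims; raise on any tag outside the documented set."""
    counts: Counter = Counter()
    for row in rows:
        for claim in group_claims(row):
            if claim.tag not in VALID_TAGS:
                raise ValueError(f"unexpected tag {claim.tag!r} for claim {claim.index}")
            counts[claim.tag] += 1
    return counts


def factual_share(counts: Counter) -> float:
    """HYPOTHETICAL proxy: Factual / (all claims except "No Verifiable Fact")."""
    verifiable = sum(n for tag, n in counts.items() if tag != "No Verifiable Fact")
    return counts["Factual"] / verifiable if verifiable else 0.0
```

To apply it to the released file (`fact_checking/human_annotations.jsonl`), parse each non-empty line with `json.loads` and pass the resulting rows to `tag_counts`. Matching keys by regex, rather than looping `i = 1..N`, tolerates rows whose claim numbers are sparse or whose optional fields (such as an empty source for an "Inconclusive" claim) are absent.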

Provided by:

maas

Created:

2025-08-01
