
FACTORY

ModelScope Community | Updated: 2025-12-05 | Indexed: 2025-08-02
Download link:
https://modelscope.cn/datasets/facebook/FACTORY
Resource description:
# Overview

FACTORY is a large-scale, human-verified, and challenging prompt set. We employ a model-in-the-loop approach to ensure quality and address the complexities of evaluating long-form generation. Starting with seed topics from Wikipedia, we expand each topic into a diverse set of prompts using large language models (LLMs). We then apply the model-in-the-loop method to filter out simpler prompts, maintaining a high level of difficulty. Human annotators further refine the prompts to ensure they are fact-seeking, answerable, unambiguous, not time-sensitive, and safe.

To push the boundaries of long-form factuality evaluation, we identify a “hard” split of FACTORY that presents significant challenges to current state-of-the-art LLMs: approximately 40% of the claims in their outputs are ones for which humans cannot find supporting information online.

The dataset is stored in the JSON Lines (.jsonl) format, where each line contains a single JSON object representing one data entry.

# Structure

Each line in the dataset file has the following keys:

- question (string): A natural language question requiring a long-form answer.
- url (string): One or more URLs to resources that provide relevant information for answering the question.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/66a79b27909a525bcbc0708f/3IQGboL5_1zhqJhDfRzIB.png)

Figure 1. Factual precision as evaluated by human annotators on 100 sentences per model for each benchmark. All models are retrieval-augmented.

**We have also released the human annotations collected during the evaluation of factual precision, available [here](https://huggingface.co/datasets/facebook/FACTORY/blob/main/fact_checking/human_annotations.jsonl).**

# Structure for the Human Annotations

Each line in the file is a valid JSON object containing the following keys for each annotated claim:

- Claim 1, Claim 2, ..., Claim N: The text of each claim.
- Claim 1 Tag, Claim 2 Tag, ..., Claim N Tag: Factuality label for the corresponding claim. The label indicates the annotator's assessment of the claim's factuality and is one of the following:
  + "Factual"
  + "NonFactual"
  + "Inconclusive"
  + "No Verifiable Fact"
- Source Claim 1, Source Claim 2, ..., Source Claim N: A string of URLs pointing to sources or evidence that support or refute the claim. This field may be empty if the claim's tag is "Inconclusive".
- Claim 1 Snippet, Claim 2 Snippet, ..., Claim N Snippet: Text snippets copied from the sources above, providing direct evidence for the associated claim and its assigned factuality label.

See our [technical report](https://arxiv.org/abs/2508.00109) for more details.

# Reference

```
@article{chen2025factory,
  title={FACTORY: A Challenging Human-Verified Prompt Set for Long-Form Factuality},
  author={Chen, Mingda and Li, Yang and Chen, Xilun and Williams, Adina and Ghosh, Gargi and Yih, Scott},
  journal={arXiv preprint arXiv:2508.00109},
  year={2025}
}
```
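# Example: Reading the Prompt File

Because each line of the prompt file is one standalone JSON object with a question and a url key, it can be read with the Python standard library alone. The following is a minimal sketch, not part of the release: the `Prompt` class and the `read_prompts` and `split_urls` helpers are hypothetical names. The card says the url field holds one or more URLs in a single string but does not name a delimiter, so splitting on commas and whitespace is an assumption, as is tolerating a JSON list in that field.

```python
import json
import re
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Union


@dataclass
class Prompt:
    """One FACTORY prompt (hypothetical container, not part of the release)."""
    question: str
    urls: List[str] = field(default_factory=list)


def split_urls(value: Union[str, List[str], None]) -> List[str]:
    """Normalize the `url` field into a list of URL strings."""
    if value is None:
        return []
    if isinstance(value, list):
        # Defensive: accept a JSON list in case some rows store one.
        return [str(u) for u in value]
    # ASSUMPTION: multiple URLs in one string are separated by commas
    # and/or whitespace; the dataset card does not state the delimiter.
    return [u for u in re.split(r"[\s,]+", value.strip()) if u]


def read_prompts(lines: Iterable[str]) -> Iterator[Prompt]:
    """Yield one Prompt per non-blank JSON Lines record."""
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline
        record = json.loads(line)  # each line is one standalone JSON object
        if "question" not in record:
            raise ValueError(f"line {line_no}: missing 'question' key")
        yield Prompt(question=record["question"], urls=split_urls(record.get("url")))
```

An open file object is itself an iterable of lines, so `with open("factory.jsonl", encoding="utf-8") as f: prompts = list(read_prompts(f))` would load a local copy; the filename is hypothetical.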

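# Example: Summarizing the Human Annotations

The human-annotation records spread each claim across numbered keys ("Claim n", "Claim n Tag", "Source Claim n", "Claim n Snippet"). A minimal sketch that regroups them into per-claim objects and computes a precision score is shown below, assuming the key names exactly as documented in the card. `Claim`, `claims_from_record`, `read_annotations` and `factual_precision` are hypothetical helpers. The precision formula is also an assumption: it divides "Factual" claims by all claims except "No Verifiable Fact", treating "Inconclusive" as unsupported. The technical report is authoritative for how Figure 1 was computed.

```python
import json
import re
from collections import Counter
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Iterator, List, Optional


@dataclass
class Claim:
    """One annotated claim (hypothetical container, not part of the release)."""
    text: str
    tag: str  # documented values: "Factual", "NonFactual", "Inconclusive", "No Verifiable Fact"
    sources: List[str]
    snippet: str


def claims_from_record(record: Dict[str, Any]) -> List[Claim]:
    """Regroup the numbered "Claim n ..." keys of one record, for n = 1, 2, ..."""
    claims: List[Claim] = []
    n = 1
    while f"Claim {n}" in record:  # stop at the first missing index
        raw_sources = record.get(f"Source Claim {n}") or ""  # may be empty for "Inconclusive"
        claims.append(
            Claim(
                text=record[f"Claim {n}"],
                tag=record.get(f"Claim {n} Tag", ""),
                # ASSUMPTION: URLs in the source string are comma/whitespace separated.
                sources=[u for u in re.split(r"[\s,]+", str(raw_sources).strip()) if u],
                snippet=record.get(f"Claim {n} Snippet") or "",
            )
        )
        n += 1
    return claims


def read_annotations(lines: Iterable[str]) -> Iterator[List[Claim]]:
    """Yield the list of claims for each non-blank JSON Lines record."""
    for line in lines:
        if line.strip():
            yield claims_from_record(json.loads(line))


def factual_precision(claims: Iterable[Claim]) -> Optional[float]:
    """Share of checkable claims tagged "Factual", or None if nothing is checkable.

    ASSUMPTION: "No Verifiable Fact" claims are excluded and "Inconclusive"
    claims count against precision; see the technical report for the exact
    definition behind Figure 1.
    """
    counts = Counter(c.tag for c in claims)
    checkable = counts["Factual"] + counts["NonFactual"] + counts["Inconclusive"]
    return counts["Factual"] / checkable if checkable else None
```

Pooling the claims from all records before calling `factual_precision` gives a single file-level figure; calling it on each record separately gives per-record figures.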
Provider: maas
Created: 2025-08-01