five

FACTS-grounding-public

收藏
魔搭社区2025-11-27 更新2025-04-26 收录
下载链接:
https://modelscope.cn/datasets/google/FACTS-grounding-public
下载链接
链接失效反馈
官方服务:
资源简介:
# FACTS Grounding 1.0 Public Examples #### 860 public FACTS Grounding examples from Google DeepMind and Google Research FACTS Grounding is a benchmark from Google DeepMind and Google Research designed to measure the performance of AI Models on factuality and grounding. ▶ [FACTS Grounding Leaderboard on Kaggle](https://www.kaggle.com/facts-leaderboard)\ ▶ [Technical Report](https://storage.googleapis.com/deepmind-media/FACTS/FACTS_grounding_paper.pdf)\ ▶ [Evaluation Starter Code](https://www.kaggle.com/code/andrewmingwang/facts-grounding-benchmark-starter-code)\ ▶ [Google DeepMind Blog Post](https://deepmind.google/discover/blog/facts-grounding-a-new-benchmark-for-evaluating-the-factuality-of-large-language-models) ## Usage The FACTS Grounding benchmark evaluates the ability of Large Language Models (LLMs) to generate factually accurate responses grounded in provided long-form documents, encompassing a variety of domains. FACTS Grounding moves beyond simple factual question-answering by assessing whether LLM responses are fully grounded to the provided context and correctly synthesize information from a long context document. By providing a standardized evaluation framework, FACTS Grounding aims to promote the development of LLMs that are both knowledgeable and trustworthy, facilitating their responsible deployment in real-world applications. ## Dataset Description This dataset is a collection 860 examples (public set) crafted by humans for evaluating how well an AI system grounds their answers to a given context. Each example is composed of a few parts: * A system prompt (`system_instruction`) which provides general instructions to the model, including to only answer the question provided based on the information in the given context * A task (`user_request`) which includes the specific question(s) for the system to answer e.g. "*What are some tips on saving money?*" * A long document (`context_document`) which includes information necessary to answer to question e.g. an SEC filing for a publicly traded US company This dataset also contains evaluation prompts (`evaluation_prompts.csv`) for judging model generated responses to the examples. See the [Technical Report](https://storage.googleapis.com/deepmind-media/FACTS/FACTS_grounding_paper.pdf) for methodology details. ## Limitations While this benchmark represents a step forward in evaluating factual accuracy, more work remains to be done. First, this benchmark relies on potentially noisy automated LLM judge models for evaluation. By ensembling a range of frontier LLMs and averaging judge outputs, we attempt to mitigate this. Second, the FACTS benchmark focuses only on evaluating grounded responses to long-form text input and could potentially be extended. Questions, comments, or issues? Share your thoughts with us in the [discussion forum](https://www.kaggle.com/facts-leaderboard/discussion). ## Citation If you use this dataset in your research, please cite our technical report: ``` @misc{kaggle-FACTS-leaderboard, author = {Alon Jacovi, Andrew Wang, Chris Alberti, Connie Tao, Jon Lipovetz, Kate Olszewska, Lukas Haas, Michelle Liu, Nate Keating, Adam Bloniarz, Carl Saroufim, Corey Fry, Dror Marcus, Doron Kukliansky, Gaurav Singh Tomar, James Swirhun, Jinwei Xing, Lily Wang, Michael Aaron, Moran Ambar, Rachana Fellinger, Rui Wang, Ryan Sims, Zizhao Zhang, Sasha Goldshtein, Yossi Matias, and Dipanjan Das}, title = {FACTS Leaderboard}, year = {2024}, howpublished = {\url{https://kaggle.com/facts-leaderboard}}, note = {Google DeepMind, Google Research, Google Cloud, Kaggle} } ```

# FACTS Grounding 1.0 公开示例集 #### 本数据集包含由Google DeepMind与Google Research发布的860个FACTS Grounding公开示例 FACTS Grounding是由Google DeepMind与Google Research推出的基准测试集,旨在评估人工智能模型在事实性与溯源(grounding)任务上的表现。 ▶ [Kaggle平台FACTS Grounding排行榜](https://www.kaggle.com/facts-leaderboard) ▶ [技术报告](https://storage.googleapis.com/deepmind-media/FACTS/FACTS_grounding_paper.pdf) ▶ [评估入门代码](https://www.kaggle.com/code/andrewmingwang/facts-grounding-benchmark-starter-code) ▶ [Google DeepMind官方博客文章](https://deepmind.google/discover/blog/facts-grounding-a-new-benchmark-for-evaluating-the-factuality-of-large-language-models) ## 使用说明 FACTS Grounding基准测试集用于评估大语言模型(Large Language Model,LLM)基于给定长文本文档生成符合事实的准确回复的能力,覆盖多领域场景。相较于传统的简单事实问答任务,FACTS Grounding会进一步评估大语言模型的回复是否完全基于给定上下文,且能从长上下文文档中正确整合信息。通过提供标准化的评估框架,FACTS Grounding旨在推动兼具知识性与可信性的大语言模型发展,助力其在真实应用场景中的负责任部署。 ## 数据集说明 本数据集包含860个由人工构建的公开示例,用于评估人工智能系统基于给定上下文生成答案的溯源能力。每个示例由以下几部分组成: * 系统提示词(`system_instruction`):向模型提供通用指令,要求仅基于给定上下文信息回答指定问题 * 任务(`user_request`):包含需要模型回答的具体问题,例如"*有哪些省钱技巧?*" * 长文档(`context_document`):包含回答问题所需的全部信息,例如美国上市企业的美国证券交易委员会(SEC)备案文件 本数据集还包含用于评估模型生成回复的评估提示词文件(`evaluation_prompts.csv`)。评估方法的详细信息请参阅[技术报告](https://storage.googleapis.com/deepmind-media/FACTS/FACTS_grounding_paper.pdf)。 ## 局限性 尽管该基准测试集在事实准确性评估方面取得了阶段性进展,但仍有诸多改进空间。其一,本基准测试集的评估依赖可能存在噪声的自动化大语言模型评判模型,为此我们通过集成多种前沿大语言模型并对评判结果取平均的方式,试图缓解这一问题。其二,FACTS基准测试集仅针对长文本输入的溯源回复开展评估,仍具备拓展空间。 如有疑问、建议或问题,请前往[讨论论坛](https://www.kaggle.com/facts-leaderboard/discussion)与我们交流。 ## 引用格式 若您在研究中使用本数据集,请引用以下技术报告: @misc{kaggle-FACTS-leaderboard, author = {Alon Jacovi, Andrew Wang, Chris Alberti, Connie Tao, Jon Lipovetz, Kate Olszewska, Lukas Haas, Michelle Liu, Nate Keating, Adam Bloniarz, Carl Saroufim, Corey Fry, Dror Marcus, Doron Kukliansky, Gaurav Singh Tomar, James Swirhun, Jinwei Xing, Lily Wang, Michael Aaron, Moran Ambar, Rachana Fellinger, Rui Wang, Ryan Sims, Zizhao Zhang, Sasha Goldshtein, Yossi Matias, and Dipanjan Das}, title = {FACTS Leaderboard}, year = {2024}, howpublished = {url{https://kaggle.com/facts-leaderboard}}, note = {Google DeepMind, Google Research, Google Cloud, Kaggle} }
提供机构:
maas
创建时间:
2025-04-21
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作