FActScore | Large Language Model Dataset | Factual Accuracy Dataset

github · updated 2023-05-01 · indexed 2025-02-07

Large language models

Factual accuracy

Download link:

https://github.com/shmsw25/FActScore

Overview:

The FActScore dataset is used to evaluate the factual accuracy of large language models (LLMs) in long-form text generation. It comprises 500 English evaluation samples drawn from biographical information on Wikipedia. The method decomposes generated text into atomic facts and computes the percentage of those facts that are supported by a knowledge source.

Provided by:

University of Washington et al.

Created:

2023-05-01

Summary of Original Information

FActScore Dataset Overview

Basic Information

Paper title: FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
Venue: EMNLP 2023
Paper URL: https://arxiv.org/abs/2305.14251
Code repository: https://github.com/shmsw25/FActScore
PIP package: factscore

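Dataset Contents

Labeled data: human annotations of factual precision, as reported in Sections 3 and 4.2 of the paper.
- Download: Google Drive
Unlabeled data: FActScore results for the 12 language models evaluated in Section 4.3 of the paper.
- Download: Google Drive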

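Data Format

Labeled data: the format is not explicitly documented; the files contain human-annotated factual precision labels.
Unlabeled data: each line is a dictionary with the following fields:
- prompt: the initial prompt given to the model
- facts: the atomic facts decomposed by the model
- LLAMA+NP_labels: labels for the facts as verified by LLAMA+NP
- ChatGPT_labels: labels for the facts as verified by ChatGPT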
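
The per-line dictionary layout above can be read with the standard library alone. The sketch below is a minimal illustration, not the repository's own loader. It assumes that each label list is parallel to facts, that an abstained response carries None for facts and labels, and that a supported fact is marked with the value "S". These are assumptions; check the downloaded files for the actual label values before relying on it. The sample lines are synthetic.

```python
import json
from typing import Iterable, Optional

# Field names taken from the data-format description above.
ESTIMATOR_FIELDS = ("LLAMA+NP_labels", "ChatGPT_labels")


def parse_records(lines: Iterable[str]) -> list[dict]:
    """Parse JSON Lines text into dictionaries, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def label_agreement(record: dict, supported: str = "S") -> Optional[float]:
    """Fraction of atomic facts on which the two estimators agree.

    Returns None when facts or labels are missing (assumed to mean the
    model abstained). `supported` is an ASSUMED label value.
    """
    facts = record.get("facts")
    a = record.get(ESTIMATOR_FIELDS[0])
    b = record.get(ESTIMATOR_FIELDS[1])
    if not facts or a is None or b is None:
        return None
    if not (len(facts) == len(a) == len(b)):
        raise ValueError("facts and label lists must have the same length")
    same = sum((x == supported) == (y == supported) for x, y in zip(a, b))
    return same / len(facts)


# Synthetic example lines (not real dataset content).
SAMPLE_LINES = [
    json.dumps({
        "prompt": "Tell me a bio of A.",
        "facts": ["f1", "f2", "f3", "f4"],
        "LLAMA+NP_labels": ["S", "NS", "S", "S"],
        "ChatGPT_labels": ["S", "S", "S", "NS"],
    }),
    json.dumps({
        "prompt": "Tell me a bio of B.",
        "facts": None,
        "LLAMA+NP_labels": None,
        "ChatGPT_labels": None,
    }),
]

records = parse_records(SAMPLE_LINES)
agreements = [label_agreement(r) for r in records]
print(agreements)
```

Comparing the two label sets per record like this is one simple way to see where the ChatGPT-based and LLAMA+NP-based estimators disagree.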

Usage

Install:
- pip install --upgrade factscore
- python -m spacy download en_core_web_sm
Download data: python -m factscore.download_data --llama_7B_HF_path "llama-7B"
Run FActScore: python -m factscore.factscorer --input_path {input_path} --model_name {estimator_name} --openai_key {openai_key}
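
To make the run command above more concrete, the sketch below writes a small input file and assembles the same command line without executing it. The page does not specify the input file format: the topic and output field names are assumptions based on the repository README and should be verified there. The file name, the sample generation, and the key placeholder are illustrative only.

```python
import json
import shlex
import tempfile
from pathlib import Path


def write_generations(path: Path, generations: dict[str, str]) -> None:
    """Write one JSON object per line with ASSUMED fields "topic" and "output"."""
    with path.open("w", encoding="utf-8") as f:
        for topic, output in generations.items():
            row = {"topic": topic, "output": output}
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def build_command(input_path: Path, estimator_name: str, openai_key: str) -> list[str]:
    """Assemble the factscorer invocation shown above as an argument list."""
    return [
        "python", "-m", "factscore.factscorer",
        "--input_path", str(input_path),
        "--model_name", estimator_name,
        "--openai_key", openai_key,
    ]


workdir = Path(tempfile.mkdtemp())
input_path = workdir / "generations.jsonl"  # hypothetical file name
write_generations(
    input_path,
    {"Marie Curie": "Marie Curie was a physicist and chemist."},
)

# The command is only printed here; running it requires the installed
# package, the downloaded data, and a valid key.
cmd = build_command(input_path, "retrieval+ChatGPT", "OPENAI_KEY_PLACEHOLDER")
print(shlex.join(cmd))
```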

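Evaluation Metrics

FActScore: the factual precision score
respond_ratio: the response ratio (the proportion of responses that are not refusals)
num_facts_per_response: the average number of atomic facts per response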
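
As an illustration of how the three metrics relate, the sketch below computes them from per-response support decisions. It assumes FActScore is the mean, over responses that did not abstain, of the fraction of atomic facts judged supported, and it omits any length penalty or other adjustment the package may apply. The input structure (a list of boolean lists, with None for an abstention) is hypothetical.

```python
from typing import Optional, Sequence


def summarize(decisions: Sequence[Optional[Sequence[bool]]]) -> dict:
    """Aggregate per-fact support decisions into the three reported metrics.

    Each element is a list of booleans (True = fact supported) for one
    response, or None if the model abstained. Responses with zero facts are
    treated as abstentions, which is an assumption of this sketch.
    """
    total = len(decisions)
    responded = [d for d in decisions if d]
    if not responded:
        return {
            "score": None,
            "respond_ratio": 0.0 if total else None,
            "num_facts_per_response": None,
        }
    per_response = [sum(d) / len(d) for d in responded]
    return {
        "score": sum(per_response) / len(per_response),
        "respond_ratio": len(responded) / total,
        "num_facts_per_response": sum(len(d) for d in responded) / len(responded),
    }


# Three answered prompts and one abstention (synthetic values).
result = summarize([
    [True, True, False, True],
    [True, False],
    None,
    [True, True, True, True],
])
print(result)
```

Averaging per response rather than pooling all facts keeps a single long response from dominating the score.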

Supported Language Models

Recommended models:
- retrieval+ChatGPT
- retrieval+llama+npm

Custom Knowledge Source

Format: a .jsonl file in which each line contains a title field and a text field.
Register the knowledge source: fs.register_knowledge_source(name_of_your_knowledge_source, data_path=path_to_jsonl_file, db_path=path_to_output_db_file)
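
A knowledge source file in the format above can be produced with a few lines of standard-library Python. The sketch below is only illustrative: the documents, the file names, and the helper function are made up, and the registration call is shown as a comment because it needs an initialized scorer object.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_FIELDS = ("title", "text")


def write_knowledge_source(path: Path, documents: list[dict]) -> int:
    """Write documents as JSON Lines, checking each has a non-empty title and text.

    Returns the number of lines written.
    """
    count = 0
    with path.open("w", encoding="utf-8") as f:
        for doc in documents:
            missing = [k for k in REQUIRED_FIELDS if not doc.get(k)]
            if missing:
                raise ValueError(f"document is missing fields: {missing}")
            row = {k: doc[k] for k in REQUIRED_FIELDS}
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
            count += 1
    return count


# Placeholder documents, not real knowledge source content.
docs = [
    {"title": "Example Person", "text": "Example Person is a placeholder biography."},
    {"title": "Another Person", "text": "Another Person is a second placeholder entry."},
]
ks_path = Path(tempfile.mkdtemp()) / "my_knowledge_source.jsonl"
written = write_knowledge_source(ks_path, docs)
print(written, "documents written to", ks_path)

# With a scorer object `fs`, registration would then follow the call above
# (the name and db_path values here are hypothetical):
# fs.register_knowledge_source("my_knowledge_source",
#                              data_path=str(ks_path),
#                              db_path="my_knowledge_source.db")
```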

Citation

@inproceedings{factscore,
  title={{FActScore}: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation},
  author={Min, Sewon and Krishna, Kalpesh and Lyu, Xinxi and Lewis, Mike and Yih, Wen-tau and Koh, Pang Wei and Iyyer, Mohit and Zettlemoyer, Luke and Hajishirzi, Hannaneh},
  year={2023},
  booktitle={EMNLP},
  url={https://arxiv.org/abs/2305.14251}
}

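AI-Collected Summary

Dataset Introduction

Construction

The FActScore dataset is built around fine-grained evaluation of factual precision in long-form text generation. Generated text is decomposed into atomic facts, and each fact is verified individually so that its accuracy can be assessed independently of the others. The core knowledge source is Wikipedia data from April 2023, and user-defined knowledge sources are also supported. Construction involves three steps: decomposing generated text into atomic facts, verifying those facts against the knowledge source, and cross-checking the results against human annotation.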
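
The decompose-then-verify flow can be pictured as a small pipeline. The sketch below is a toy stand-in, not the FActScore implementation: sentence splitting replaces model-based atomic fact decomposition, and a case-insensitive substring check replaces retrieval plus language-model verification. All function names are hypothetical.

```python
import re
from typing import Callable


def toy_decompose(text: str) -> list[str]:
    """Stand-in for atomic fact decomposition: split after sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def toy_verify(fact: str, knowledge: str) -> bool:
    """Stand-in verifier: supported if the fact appears verbatim in the knowledge text."""
    return fact.rstrip(".!?").lower() in knowledge.lower()


def evaluate(
    text: str,
    knowledge: str,
    decompose: Callable[[str], list[str]] = toy_decompose,
    verify: Callable[[str, str], bool] = toy_verify,
) -> list[tuple[str, bool]]:
    """Decompose the text, then label each fact as supported or not."""
    return [(fact, verify(fact, knowledge)) for fact in decompose(text)]


knowledge = "Ada Lovelace was born in 1815. Ada Lovelace was a mathematician."
generation = "Ada Lovelace was a mathematician. Ada Lovelace was born in 1820."

labeled = evaluate(generation, knowledge)
precision = sum(ok for _, ok in labeled) / len(labeled)
print(labeled, precision)
```

Keeping the decomposer and verifier as pluggable callables mirrors the idea that the knowledge source and the estimator can be swapped independently.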

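Features

The distinguishing feature of FActScore is fine-grained factual evaluation. It provides an overall factual precision score for generated text and also supports independent verification of each atomic fact. The dataset covers outputs from multiple language models and provides two sets of verification labels, one from ChatGPT and one from LLAMA+NP, which makes side-by-side comparison straightforward. It also supports custom knowledge sources, so it can be adapted to evaluation in other domains. Summary statistics such as the response ratio and the number of atomic facts per response are reported alongside the score.

Usage

FActScore can be run from the command line or through its Python API. First install the FActScore PIP package and download the required knowledge source data. Then run the evaluation by specifying an input path, a model name, and an OpenAI API key. The tool decomposes generated text into atomic facts, verifies them, and reports factual precision, response ratio, and the number of atomic facts per response. Users can also register a custom knowledge source to extend coverage, or use the pre-labeled data for quick evaluation and validation.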

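Background and Challenges

Background

The FActScore dataset was released in 2023 by Sewon Min and colleagues to address the evaluation of factual precision in long-form text generation. Its central research question is how to quantify the factual accuracy of generated text through fine-grained evaluation of atomic facts. FActScore offers a new evaluation standard for text generation models in natural language processing, particularly for assessing the credibility and accuracy of generated content. The work was published at EMNLP 2023 and has attracted considerable attention.

Current Challenges

FActScore faces several challenges in construction and application. First, evaluating factual precision in long-form generation is inherently complex: decomposing generated content into atomic facts and labeling them accurately is technically difficult. Second, the dataset depends on a large knowledge source (such as Wikipedia), so retrieving relevant information efficiently and keeping it current is a key challenge. Third, evaluation involves more than one model (such as ChatGPT and LLAMA), and keeping their judgments consistent with one another remains an open problem. Finally, extensibility and generality need further work to cover a wider range of text generation tasks and domains.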

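Common Scenarios

Typical Use Cases

In natural language processing, FActScore is used to evaluate the factual accuracy of long-form text generation models. By decomposing generated text into atomic facts and verifying each one, it measures accuracy at the level of individual fact units and gives a concrete signal for model improvement. A typical use case is evaluating generative models (such as GPT-4 and ChatGPT) on biography generation, which helps researchers identify where a model falls short on factuality.

Practical Applications

In practice, FActScore is used to evaluate and improve generative models on knowledge-intensive tasks. For example, when automatically generating news summaries, technical documentation, or educational content, it can help developers find and correct factual errors in model output, improving the credibility and usefulness of the content. It can also support the development of question answering systems by checking that generated answers rest on accurate facts.

Derived and Related Work

The release of FActScore has prompted follow-up research, especially on evaluating and improving the factuality of generative models. For example, the FActScore evaluation framework has been used in work on methods for improving factual accuracy, such as retrieval-augmented generation and knowledge-graph-based generation. FActScore has also been used to build finer-grained factuality benchmarks, supporting the application of generative models to tasks such as open-domain question answering and dialogue systems.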

The content above was collected and summarized by AI.
