
FreshQA | Question-Answering Dataset | Large Language Model Dataset

Source: GitHub; updated 2023-10-01; indexed 2025-02-07
Question answering systems
Large language models
Download link:
https://github.com/freshllms/freshqa
Overview:
FreshQA is a dynamic question-answering (QA) benchmark containing 600 English evaluation samples. Its questions for large language models (LLMs) fall into four categories according to the nature of their answers: answers that never change, answers that change slowly, answers that change quickly, and questions built on a false premise. The benchmark examines whether LLMs hallucinate when answering and whether they can refute a false factual assumption instead of being misled by it.
Provided by:
Google et al.
Created:
2023-10-01
Summary of original information

FreshLLMs dataset overview

Basic dataset information

Dataset components

1. FreshQA

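  • Latest version: FreshQA March 24, 2025
  • Update frequency: weekly or on request
  • Feedback: comment on the dataset spreadsheet or email freshllms@google.com
  • Past versions: multiple versions from February 2024 through March 2025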

2. FreshPrompt

3. FreshEval

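Evaluation methods

  • Human evaluation: recommended for detecting hallucination and similar problems
  • Automatic evaluation:
    • Standard metrics: F1/exact match or recall
    • LLM-based automatic evaluation: e.g. FactScore or FreshEval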
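
The standard metrics named in the list above can be sketched in a few lines. This is only an illustration of SQuAD-style exact match, token-level F1, and token recall, not the FreshQA authors' scoring code; the normalization rules (lowercasing, stripping punctuation and English articles) and all function names are assumptions made for this example.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace.

    These normalization rules are an assumption for this sketch.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, reference: str) -> bool:
    """True when the normalized token sequences are identical."""
    return normalize(prediction) == normalize(reference)


def _overlap(pred: list[str], ref: list[str]) -> int:
    """Number of tokens shared by both lists, counting duplicates."""
    return sum((Counter(pred) & Counter(ref)).values())


def token_recall(prediction: str, reference: str) -> float:
    """Fraction of reference tokens that appear in the prediction."""
    pred, ref = normalize(prediction), normalize(reference)
    if not ref:
        return float(not pred)
    return _overlap(pred, ref) / len(ref)


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and token recall."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        return float(pred == ref)
    overlap = _overlap(pred, ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

When a question has several acceptable answers, one common choice is to take the maximum score over all references. Surface-overlap metrics like these cannot tell whether a response hallucinates or debunks a false premise, which is why the list also recommends human or LLM-based evaluation.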

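Acknowledgements

  • Thanks to the many contributors who have updated FreshQA questions and answers
  • Thanks to SerpApi for sponsoring 10,000 searches for FreshPrompt users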

Citation

BibTeX:
@misc{vu2023freshllms,
  title={FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation},
  author={Tu Vu and Mohit Iyyer and Xuezhi Wang and Noah Constant and Jerry Wei and Jason Wei and Chris Tar and Yun-Hsuan Sung and Denny Zhou and Quoc Le and Thang Luong},
  year={2023},
  eprint={2310.03214},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

AI-compiled summary
Dataset introduction
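Construction
FreshQA is built around search engine augmentation and is refreshed regularly to keep its content current and accurate. The dataset is maintained weekly or on user request so that its questions and answers stay up to date. Users can submit feedback through the dataset spreadsheet to help correct errors or omissions. This update mechanism lets FreshQA keep reflecting the latest knowledge.
Features
FreshQA is distinguished by how dynamic it is and by its broad range of uses. It covers a diverse set of question-answer pairs, and its regular updates keep the answers accurate and current, which makes it suitable for developing and evaluating many kinds of large language models. The dataset is also designed to combine human evaluation with automatic evaluation, so it can be scored in more than one way.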
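Usage
To use FreshQA, get the latest dataset version from its GitHub page. The dataset is provided as a spreadsheet that can be downloaded and used directly. For evaluation, users can run `FreshEval` for automatic evaluation; it scores model responses using few-shot in-context learning. Users can also choose an evaluation mode (`Relaxed` or `Strict`) and use the provided Colab notebook to run the evaluation quickly.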
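
Once the spreadsheet has been exported to CSV, a first step before evaluation is often to load it and split the questions by category. The following is a minimal sketch of that step using only the Python standard library. The column names (`question`, `fact_type`, `false_premise`, `answer_0`) and the sample rows are assumptions made up for illustration, so check them against the header row of the version you download.

```python
import csv
import io
from collections import defaultdict

# Stand-in for a CSV export of the spreadsheet. The column names and
# every row below are invented for this example, not taken from FreshQA.
SAMPLE_CSV = """question,fact_type,false_premise,answer_0
Who wrote Hamlet?,never-changing,FALSE,William Shakespeare
Who is the CEO of ExampleCorp?,slow-changing,FALSE,Jane Doe
What was yesterday's closing price of ExampleCorp?,fast-changing,FALSE,12.34
When did ExampleCorp land on Mars?,none,TRUE,ExampleCorp has not landed on Mars
"""


def load_questions(csv_file) -> list[dict]:
    """Read every row of an open CSV file into a list of dicts."""
    return list(csv.DictReader(csv_file))


def group_by_category(rows: list[dict]) -> dict[str, list[str]]:
    """Group question texts by category.

    Rows flagged as false-premise go into their own group; all other
    rows are grouped by their (assumed) fact_type column.
    """
    groups = defaultdict(list)
    for row in rows:
        if row["false_premise"].strip().upper() == "TRUE":
            category = "false-premise"
        else:
            category = row["fact_type"]
        groups[category].append(row["question"])
    return dict(groups)


rows = load_questions(io.StringIO(SAMPLE_CSV))
groups = group_by_category(rows)
for category, questions in groups.items():
    print(f"{category}: {len(questions)} question(s)")
```

For a real export, replace `io.StringIO(SAMPLE_CSV)` with `open(path, newline="", encoding="utf-8")`. Reporting scores per category rather than as one aggregate makes it easier to see whether a model struggles specifically with fast-changing or false-premise questions.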
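Background and challenges
Background
FreshQA was created in 2023 by a research team from Google and other institutions. It aims to improve, through search engine augmentation, the ability of large language models (LLMs) to retrieve and incorporate up-to-date information. Its core research question is how to make LLMs more accurate and timely in a changing information environment, especially for factual questions whose answers change over time. The release of FreshQA advanced work on search-augmented LLMs and directly influenced the development of models such as Google's Gemini and Perplexity.AI's online LLMs, making it an important benchmark in the field.
Current challenges
FreshQA faces challenges on two fronts. First, at the level of the research problem, keeping LLMs accurate and their hallucination rate low in a changing information environment is difficult, because models tend to give outdated or wrong answers when the underlying facts are updated in real time. Second, in building the dataset, collecting, verifying, and updating large volumes of time-sensitive data efficiently while keeping quality and consistency is a major challenge. In addition, the continuous update mechanism depends on user feedback and manual review, which demands considerable time and resources.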
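Common scenarios
Classic use cases
In natural language processing, FreshQA is widely used to evaluate and improve the factuality of large language models (LLMs). Because its question-answer pairs are kept up to date, researchers can use it to test how a model handles recent information, particularly knowledge that changes over time. In this setting, FreshQA is a key tool for checking whether a model answers time-sensitive questions correctly.
Practical applications
In practice, FreshQA has been used to improve search-augmented large language models such as Google's Gemini and Perplexity.AI's online LLMs. Developed with FreshQA's up-to-date data, these models respond better to user queries, especially where current information is needed, for example in news summarization, financial analysis, and medical consultation.
Derived and related work
FreshQA has given rise to a body of related work, including Google's Gemini, Perplexity.AI's online LLMs, You.com's API, and Contextual AI's RAG 2.0. By drawing on FreshQA's up-to-date question-answer data, this work has further improved model performance on dynamic knowledge and driven innovation in search augmentation and factuality verification for large language models.
The content above was collected and summarized by AI.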