Anthropic/llm_global_opinions | Global survey dataset | Social research dataset

hugging_face · updated 2023-06-29 · indexed 2024-03-04

Global surveys

Social research

Download link:

https://hf-mirror.com/datasets/Anthropic/llm_global_opinions

Resource description:

---
license: cc-by-nc-sa-4.0
language:
- en
size_categories:
- 1K<n<10K
---

# Dataset Card for GlobalOpinionQA

## Dataset Summary

The data contains a subset of survey questions about global issues and opinions adapted from the [World Values Survey](https://www.worldvaluessurvey.org/) and the [Pew Global Attitudes Survey](https://www.pewresearch.org/). The data is described further in the paper [Towards Measuring the Representation of Subjective Global Opinions in Language Models](https://arxiv.org/abs/2306.16388).

## Purpose

In our paper, we use this dataset to analyze the opinions that large language models (LLMs) reflect on complex global issues. Our goal is to gain insight into potential biases in AI systems by evaluating their responses on subjective topics.

## Data Format

The data is in a CSV file with the following columns:

- question: The text of the survey question.
- selections: A dictionary where the key is the country name and the value is a list of percentages of respondents in that country who selected each answer option.
- options: A list of the answer options for the given question.
- source: GAS or WVS, depending on whether the question comes from the Global Attitudes Survey or the World Values Survey.

## Usage

```python
from datasets import load_dataset

# Loading the data
dataset = load_dataset("Anthropic/llm_global_opinions")
```

## Disclaimer

We recognize the limitations of using this dataset to evaluate LLMs, since the underlying surveys were not designed for this purpose. We therefore acknowledge that the construct validity of these datasets when applied to LLMs may be limited.

## Contact

For questions, you can email esin at anthropic dot com.

## Citation

If you would like to cite our work or data, you may use the following BibTeX citation:

```
@misc{durmus2023measuring,
  title={Towards Measuring the Representation of Subjective Global Opinions in Language Models},
  author={Esin Durmus and Karina Nguyen and Thomas I. Liao and Nicholas Schiefer and Amanda Askell and Anton Bakhtin and Carol Chen and Zac Hatfield-Dodds and Danny Hernandez and Nicholas Joseph and Liane Lovitt and Sam McCandlish and Orowa Sikder and Alex Tamkin and Janel Thamkul and Jared Kaplan and Jack Clark and Deep Ganguli},
  year={2023},
  eprint={2306.16388},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```

Provided by:

Anthropic

Summary of original information

Dataset overview

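Name: GlobalOpinionQA

Source: The dataset contains a subset of survey questions about global issues and opinions, selected from the World Values Survey and the Pew Global Attitudes Survey.

Purpose: To analyze the opinions that large language models (LLMs) reflect on complex global issues, in order to gain insight into potential biases of AI systems on subjective topics.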

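Data format: a CSV file with the following columns:

question: The text of the survey question.
selections: A dictionary whose keys are country names and whose values are lists giving, for each answer option, the percentage of that country's respondents who selected it.
options: The list of answer options for the given question.
source: GAS or WVS, indicating whether the question comes from the Global Attitudes Survey or the World Values Survey.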
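The `selections` and `options` columns can be combined to read off per-country answer distributions. The sketch below is a minimal illustration under stated assumptions: it assumes a row has already been parsed into a Python dict with the four columns listed above, and that each country's list in `selections` is aligned position-by-position with `options`. The example row and the helper names `normalize` and `top_answer_by_country` are hypothetical and not part of the dataset or its tooling.

```python
from typing import Dict, List

# Hypothetical example row shaped like the columns described above.
# The question, options, countries and numbers are invented for illustration.
row = {
    "question": "Example question?",
    "selections": {
        "Country A": [0.60, 0.30, 0.10],
        "Country B": [0.20, 0.50, 0.30],
    },
    "options": ["Agree", "Disagree", "Don't know"],
    "source": "GAS",
}


def normalize(values: List[float]) -> List[float]:
    """Rescale one country's answer shares so they sum to 1.

    Works whether the shares are stored as percentages or as fractions.
    """
    total = sum(values)
    if total == 0:
        raise ValueError("answer shares sum to zero")
    return [v / total for v in values]


def top_answer_by_country(
    selections: Dict[str, List[float]], options: List[str]
) -> Dict[str, str]:
    """Return, for each country, the option chosen by the largest share of respondents.

    Assumes position i in each country's list corresponds to options[i].
    """
    result = {}
    for country, shares in selections.items():
        if len(shares) != len(options):
            raise ValueError(
                f"{country}: {len(shares)} shares but {len(options)} options"
            )
        # Index of the largest share; ties resolve to the first such option.
        best = max(range(len(shares)), key=lambda i: shares[i])
        result[country] = options[best]
    return result


if __name__ == "__main__":
    print(top_answer_by_country(row["selections"], row["options"]))
    # → {'Country A': 'Agree', 'Country B': 'Disagree'}
```

Normalizing before comparing countries keeps the sketch agnostic to whether the stored values are percentages or fractions; if the raw file stores `selections` as a serialized string rather than a dict, it would need to be parsed first, which this sketch does not attempt.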

License: cc-by-nc-sa-4.0

Language: English

Size: 1K<n<10K

Citation:

@misc{durmus2023measuring, title={Towards Measuring the Representation of Subjective Global Opinions in Language Models}, author={Esin Durmus and Karina Nguyen and Thomas I. Liao and Nicholas Schiefer and Amanda Askell and Anton Bakhtin and Carol Chen and Zac Hatfield-Dodds and Danny Hernandez and Nicholas Joseph and Liane Lovitt and Sam McCandlish and Orowa Sikder and Alex Tamkin and Janel Thamkul and Jared Kaplan and Jack Clark and Deep Ganguli}, year={2023}, eprint={2306.16388}, archivePrefix={arXiv}, primaryClass={cs.CL} }
