Resource overview:
---
license: cc-by-nc-sa-4.0
language:
- en
size_categories:
- 1K<n<10K
---
# Dataset Card for GlobalOpinionQA
## Dataset Summary
The data contains a subset of survey questions about global issues and opinions adapted from the [World Values Survey](https://www.worldvaluessurvey.org/) and [Pew Global Attitudes Survey](https://www.pewresearch.org/).
The data is further described in the paper: [Towards Measuring the Representation of Subjective Global Opinions in Language Models](https://arxiv.org/abs/2306.16388).
## Purpose
In our paper, we use this dataset to analyze the opinions that large language models (LLMs) reflect on complex global issues.
Our goal is to gain insights into potential biases in AI systems by evaluating their performance on subjective topics.
## Data Format
The data is in a CSV file with the following columns:
- question: The text of the survey question.
- selections: A dictionary where the key is the country name and the value is a list of percentages of respondents who selected each answer option for that country.
- options: A list of the answer options for the given question.
- source: GAS/WVS depending on whether the question comes from the Global Attitudes Survey or the World Values Survey.
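To make the column layout concrete, here is a minimal sketch built around a made-up row. The question text, country names, and numbers are invented for illustration and are not taken from the dataset. It assumes that each country's list in `selections` lines up position by position with `options`, and the helper name `top_option_by_country` is hypothetical.

```python
# Hypothetical row: the question, countries, and numbers are invented for illustration.
example_row = {
    "question": "How important is family in your life?",
    "options": ["Very important", "Somewhat important", "Not important"],
    "selections": {
        "Country A": [70, 25, 5],
        "Country B": [40, 45, 15],
    },
    "source": "WVS",
}


def top_option_by_country(row):
    """Return {country: most-selected option}.

    Assumes each country's list is aligned by position with row["options"].
    """
    options = row["options"]
    result = {}
    for country, shares in row["selections"].items():
        if len(shares) != len(options):
            raise ValueError(
                f"{country}: expected {len(options)} values, got {len(shares)}"
            )
        # Index of the largest share, mapped back to its option label.
        best_index = max(range(len(shares)), key=shares.__getitem__)
        result[country] = options[best_index]
    return result


print(top_option_by_country(example_row))
```

The length check is there because the positional alignment is an assumption; a mismatch raises an error instead of silently pairing a value with the wrong option.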
## Usage
```python
from datasets import load_dataset
# Loading the data
dataset = load_dataset("Anthropic/llm_global_opinions")
```
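Because the data is stored as CSV, the `options` and `selections` columns may arrive as strings rather than Python lists and dictionaries. The sketch below shows one way to convert them. It assumes the values are serialized as Python literals, possibly with a constructor-style wrapper around the dictionary; this is worth checking against a real row before relying on it. The helper names `parse_options` and `parse_selections` are hypothetical and not part of the `datasets` library.

```python
import ast


def parse_options(raw):
    """Parse the `options` column into a list if it arrived as a string."""
    return ast.literal_eval(raw) if isinstance(raw, str) else list(raw)


def parse_selections(raw):
    """Parse the `selections` column into a plain dict of country -> list of numbers."""
    if not isinstance(raw, str):
        return dict(raw)
    # Keep only the outermost {...} so any wrapper text around the dict literal is ignored.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no dict literal found in selections value")
    return ast.literal_eval(raw[start:end + 1])
```

`ast.literal_eval` only evaluates literals, so it is a safer choice than `eval` for text read from a file. Both helpers pass through values that are already parsed, so they can be applied to a row regardless of how it was loaded.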
## Disclaimer
We recognize the limitations of using this dataset to evaluate LLMs: the underlying surveys were designed for human respondents, not for this purpose. The construct validity of these survey questions when applied to LLMs may therefore be limited.
## Contact
For questions, you can email esin at anthropic dot com
## Citation
If you would like to cite our work or data, you may use the following BibTeX citation:
```bibtex
@misc{durmus2023measuring,
title={Towards Measuring the Representation of Subjective Global Opinions in Language Models},
author={Esin Durmus and Karina Nguyen and Thomas I. Liao and Nicholas Schiefer and Amanda Askell and Anton Bakhtin and Carol Chen and Zac Hatfield-Dodds and Danny Hernandez and Nicholas Joseph and Liane Lovitt and Sam McCandlish and Orowa Sikder and Alex Tamkin and Janel Thamkul and Jared Kaplan and Jack Clark and Deep Ganguli},
year={2023},
eprint={2306.16388},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```