shreyahavaldar/multilingual_politeness
Source: Hugging Face. Updated 2024-06-03; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/shreyahavaldar/multilingual_politeness
Description:
---
dataset_info:
  features:
  - name: Utterance
    dtype: string
  - name: politeness
    dtype: float64
  - name: language
    dtype: string
  - name: __index_level_0__
    dtype: int64
  splits:
  - name: train
    num_bytes: 4954742
    num_examples: 18238
  - name: validation
    num_bytes: 619659
    num_examples: 2280
  - name: test
    num_bytes: 619627
    num_examples: 2280
  download_size: 3919716
  dataset_size: 6194028
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
---
# Dataset Card for "multilingual_politeness"
This dataset has the following attributes:
- Utterance: a 2-3 sentence excerpt from Wikipedia editor talk pages.
- language: the language of the utterance (English, Spanish, Chinese, or Japanese).
- politeness: an annotated politeness label ranging from -2 (very rude) to 2 (very polite). Each utterance was annotated by 3 native speakers.
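Because `politeness` is stored as a float (float64 in the metadata), some uses need discrete classes. The minimal sketch below bins scores into three coarse labels and averages them per language. The cut-offs `RUDE_MAX` and `POLITE_MIN` are hypothetical choices, not values from the paper, and the example rows are made up. It also assumes the `language` column stores names such as "English"; check this against the actual data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cut-offs: the card only defines the -2..2 range,
# not any class boundaries.
RUDE_MAX = -0.5
POLITE_MIN = 0.5


def bin_politeness(score: float) -> str:
    """Map a politeness score in [-2, 2] to a coarse label."""
    if not -2.0 <= score <= 2.0:
        raise ValueError(f"politeness {score} is outside [-2, 2]")
    if score <= RUDE_MAX:
        return "rude"
    if score >= POLITE_MIN:
        return "polite"
    return "neutral"


def mean_politeness_by_language(rows):
    """Average the politeness field separately for each language."""
    scores = defaultdict(list)
    for row in rows:
        scores[row["language"]].append(row["politeness"])
    return {lang: mean(vals) for lang, vals in scores.items()}


# Made-up rows that follow the card's field names; not real data.
rows = [
    {"Utterance": "Thanks so much for fixing this!", "politeness": 1.67, "language": "English"},
    {"Utterance": "Why did you revert my edit?", "politeness": -0.33, "language": "English"},
    {"Utterance": "¿Podrías revisar esta sección?", "politeness": 1.0, "language": "Spanish"},
]

labels = [bin_politeness(r["politeness"]) for r in rows]
```

Moving `RUDE_MAX` and `POLITE_MIN` further from zero keeps only the more clear-cut utterances in the "rude" and "polite" classes, at the cost of a larger "neutral" class.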
Please cite the following paper if you use our dataset :)
```
@inproceedings{havaldar-etal-2023-comparing,
title = "Comparing Styles across Languages",
author = "Havaldar, Shreya and Pressimone, Matthew and Wong, Eric and Ungar, Lyle",
booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
year = "2023",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.emnlp-main.419"
}
```
Provided by:
shreyahavaldar
Summary of Original Information
Dataset Overview
Dataset Name
"multilingual_politeness"
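Features
- Utterance: string; a 2-3 sentence excerpt from Wikipedia editor talk pages.
- politeness: float; an annotated politeness score from -2 (very rude) to 2 (very polite), with each utterance annotated by 3 native speakers.
- language: string; one of English, Spanish, Chinese, or Japanese.
- `__index_level_0__`: integer.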
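The feature types listed above can be checked row by row. This is a minimal, standard-library-only sketch. The dtype-to-Python-type mapping (string to `str`, float64 to `float`, int64 to `int`) is an assumption about how rows look once loaded. Dropping `__index_level_0__` assumes it is only a leftover row index with no annotation content.

```python
# Expected Python types per feature, read off the dataset_info block
# (assumed mapping: string -> str, float64 -> float, int64 -> int).
SCHEMA = {
    "Utterance": str,
    "politeness": float,
    "language": str,
    "__index_level_0__": int,
}

# Assumption: __index_level_0__ is just a leftover row index, so drop it.
DROP = {"__index_level_0__"}


def validate_and_clean(row: dict) -> dict:
    """Check a row against SCHEMA and return it without the index column."""
    missing = SCHEMA.keys() - row.keys()
    if missing:
        raise KeyError(f"missing fields: {sorted(missing)}")
    for name, expected in SCHEMA.items():
        if not isinstance(row[name], expected):
            raise TypeError(
                f"{name} should be {expected.__name__}, "
                f"got {type(row[name]).__name__}"
            )
    return {k: v for k, v in row.items() if k not in DROP}


# Made-up example row using the card's field names.
example = {
    "Utterance": "Could you add a source for this claim?",
    "politeness": 0.67,
    "language": "English",
    "__index_level_0__": 42,
}
cleaned = validate_and_clean(example)
```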
Splits
- train: 18238 examples, 4954742 bytes.
- validation: 2280 examples, 619659 bytes.
- test: 2280 examples, 619627 bytes.
Dataset Size
- Download size: 3919716 bytes.
- Total dataset size: 6194028 bytes.
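The split figures are internally consistent: the three `num_bytes` values sum to the reported dataset size (4954742 + 619659 + 619627 = 6194028), and by example count the splits are roughly 80/10/10. The short check below reproduces that arithmetic from the numbers in the metadata.

```python
# Split statistics copied from the dataset_info metadata.
SPLITS = {
    "train": {"num_examples": 18238, "num_bytes": 4954742},
    "validation": {"num_examples": 2280, "num_bytes": 619659},
    "test": {"num_examples": 2280, "num_bytes": 619627},
}
DATASET_SIZE = 6194028  # reported dataset_size, in bytes

total_examples = sum(s["num_examples"] for s in SPLITS.values())
total_bytes = sum(s["num_bytes"] for s in SPLITS.values())

# Fraction of all examples that falls in each split.
fractions = {name: s["num_examples"] / total_examples for name, s in SPLITS.items()}

for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
```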
Data File Configuration
- config_name: default
- data_files:
  - train: path "data/train-*"
  - validation: path "data/validation-*"
  - test: path "data/test-*"
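Each `data_files` pattern assigns matching files in the repository to a split. In practice the Hugging Face `datasets` library resolves these patterns itself when you call `load_dataset("shreyahavaldar/multilingual_politeness")`. The stdlib sketch below only illustrates the pattern-to-split mapping with Python's `fnmatch`; it is not how the library implements it, and the shard file names are hypothetical.

```python
from fnmatch import fnmatch

# Split -> glob pattern, as listed under data_files for the default config.
DATA_FILES = {
    "train": "data/train-*",
    "validation": "data/validation-*",
    "test": "data/test-*",
}


def assign_splits(paths):
    """Group file paths by the first split whose pattern matches them."""
    assigned = {split: [] for split in DATA_FILES}
    for path in paths:
        for split, pattern in DATA_FILES.items():
            if fnmatch(path, pattern):
                assigned[split].append(path)
                break  # files outside every pattern are simply ignored
    return assigned


# Hypothetical shard names; the card only gives the glob patterns.
repo_files = [
    "data/train-00000-of-00001.parquet",
    "data/validation-00000-of-00001.parquet",
    "data/test-00000-of-00001.parquet",
    "README.md",
]
splits = assign_splits(repo_files)
```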



