Ruth-Ann/jampatoisnli
Source: Hugging Face (updated 2022-12-31, indexed 2024-03-04)
Download link: https://hf-mirror.com/datasets/Ruth-Ann/jampatoisnli
---
annotations_creators:
- expert-generated
language:
- jam
language_creators:
- expert-generated
- found
license:
- other
multilinguality:
- monolingual
- other-english-based-creole
pretty_name: JamPatoisNLI
size_categories:
- n<1K
source_datasets:
- original
tags:
- creole
- low-resource-language
task_categories:
- text-classification
task_ids:
- natural-language-inference
---
# Dataset Card for JamPatoisNLI
## Table of Contents
- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)
## Dataset Description
- **Homepage:** jampatoisnli.github.io
- **Repository:** https://github.com/ruth-ann/jampatoisnli
- **Paper:** https://arxiv.org/abs/2212.03419
- **Point of Contact:** Ruth-Ann Armstrong: armstrongruthanna@gmail.com
### Dataset Summary
JamPatoisNLI provides the first dataset for natural language inference in a creole language, Jamaican Patois.
Many of the most-spoken low-resource languages are creoles. These languages commonly have a lexicon derived from
a major world language and a distinctive grammar reflecting the languages of the original speakers and the process
of language birth by creolization. This gives them a distinctive place in exploring the effectiveness of transfer
from large monolingual or multilingual pretrained models.
### Supported Tasks and Leaderboards
Natural language inference
### Languages
Jamaican Patois
### Data Fields
premise, hypothesis, label
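
As a minimal sketch of how these three fields might be represented in code, the example below defines a record type and a label set. The class name `NLIExample`, the use of single-letter codes as stored label values, and the sample sentences (plain-English placeholders, not drawn from the dataset) are assumptions for illustration; this card does not state how `label` is encoded in the published files.

```python
from dataclasses import dataclass

# Label codes follow the E / N / C shorthand used in this card.
# How the published files actually encode `label` is not stated here,
# so treat this mapping as an assumption.
LABEL_NAMES = {"E": "entailment", "N": "neutral", "C": "contradiction"}


@dataclass(frozen=True)
class NLIExample:
    """One premise/hypothesis pair with its inference label (hypothetical type)."""

    premise: str
    hypothesis: str
    label: str

    def __post_init__(self):
        # Reject anything outside the three NLI classes.
        if self.label not in LABEL_NAMES:
            raise ValueError(f"unknown label: {self.label!r}")

    @property
    def label_name(self) -> str:
        """Spelled-out name of the label code."""
        return LABEL_NAMES[self.label]


# Placeholder sentences in English, not taken from the dataset.
example = NLIExample(
    premise="The shop is closed today.",
    hypothesis="The shop is open today.",
    label="C",
)
print(example.label_name)
```

Validating the label at construction time keeps malformed rows from reaching a training loop unnoticed.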
### Data Splits
- Train: 250
- Val: 200
- Test: 200
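
The split sizes can be checked against the `n<1K` size category in the metadata. The sketch below does that arithmetic and includes an optional loader. `load_dataset` is the real entry point of the Hugging Face `datasets` library, but whether this repository loads without extra arguments, and what its split names are, are assumptions that have not been verified here; the helper names `split_summary` and `load_splits` are hypothetical.

```python
# Split sizes as listed in this card.
SPLIT_SIZES = {"train": 250, "val": 200, "test": 200}


def split_summary(sizes):
    """Return the total count and each split's share of it in percent (1 decimal)."""
    total = sum(sizes.values())
    shares = {name: round(100 * n / total, 1) for name, n in sizes.items()}
    return total, shares


def load_splits(repo_id="Ruth-Ann/jampatoisnli"):
    """Optional: fetch the dataset from the Hugging Face Hub.

    Needs the `datasets` package and network access. The repository may
    require a configuration name or use different split names than this
    card's Train/Val/Test labels; that is not confirmed by the card.
    """
    # Imported lazily so the arithmetic above runs without the dependency.
    from datasets import load_dataset

    return load_dataset(repo_id)


total, shares = split_summary(SPLIT_SIZES)
print(total, shares)
```

The three splits add up to 650 pairs, which is consistent with the `n<1K` size category.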
### Dataset Creation and Annotations
- **Premise collection:** 97% of examples come from Twitter; the remainder were drawn from literature and an online cultural website.
- **Hypothesis construction:** For each premise, a native speaker (the paper's first author) wrote a hypothesis so that the pair would be classified as entailment (E), neutral (N), or contradiction (C).
- **Label validation:** A random sample of 100 sentence pairs was double annotated by fluent speakers.
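
This card does not say how agreement on the double-annotated sample was scored. As one common option, the sketch below computes raw agreement and Cohen's kappa for two annotators. The function names and the toy label lists are invented for illustration and are not the dataset's annotations or the authors' procedure.

```python
from collections import Counter


def raw_agreement(a, b):
    """Fraction of items on which two annotators gave the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    p_o = raw_agreement(a, b)
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: probability that both annotators pick the same
    # label if each labels independently with their own label frequencies.
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Toy annotations (invented), using the E / N / C codes from this card.
annotator_1 = ["E", "E", "N", "C", "C", "N", "E", "C"]
annotator_2 = ["E", "N", "N", "C", "C", "N", "E", "E"]

print(raw_agreement(annotator_1, annotator_2))
print(cohens_kappa(annotator_1, annotator_2))
```

Kappa is lower than raw agreement whenever some matches would be expected by chance, which is why it is often reported alongside the plain percentage for three-way NLI labels.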
### Social Impact of Dataset
JamPatoisNLI is a low-resource language dataset in an English-based Creole spoken in the Caribbean,
Jamaican Patois. The creation of the dataset contributes to expanding the scope of NLP research
to under-explored languages across the world.
### Dataset Curators
[@ruth-ann](https://github.com/ruth-ann)
### Citation Information
@misc{https://doi.org/10.48550/arxiv.2212.03419,
doi = {10.48550/ARXIV.2212.03419},
url = {https://arxiv.org/abs/2212.03419},
author = {Armstrong, Ruth-Ann and Hewitt, John and Manning, Christopher},
keywords = {Computation and Language (cs.CL), Machine Learning (cs.LG), FOS: Computer and information sciences, I.2.7},
title = {JamPatoisNLI: A Jamaican Patois Natural Language Inference Dataset},
publisher = {arXiv},
year = {2022},
copyright = {arXiv.org perpetual, non-exclusive license}
}
### Contributions
Thanks to Prof. Christopher Manning and John Hewitt for their contributions, guidance, facilitation and support related to the creation of this dataset.
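Provider: Ruth-Ann

## Summary of Original Information
### Dataset Overview
#### Basic Dataset Information
- Name: JamPatoisNLI
- Languages:
  - Primary language: Jamaican Patois
  - Other: English-based Creole
- License: other
- Multilinguality:
  - Monolingual
  - Other English-based creole
- Size: fewer than 1K
- Source: original data
- Tags:
  - Creole
  - Low-resource language
- Task category: text classification
- Task ID: natural language inference
### Dataset Description
#### Dataset Summary
JamPatoisNLI is the first natural language inference dataset for the creole language Jamaican Patois. The dataset explores how well large monolingual or multilingual pretrained models transfer to creole languages.
#### Supported Tasks and Leaderboards
- Natural language inference
#### Languages
- Jamaican Patois
### Data Structure
#### Data Instances
- Training set: 250
- Validation set: 200
- Test set: 200
#### Data Fields
- Premise (`premise`)
- Hypothesis (`hypothesis`)
- Label (`label`)
#### Data Splits
- Training set
- Validation set
- Test set
### Dataset Creation
#### Source Data
- Premise collection: 97% from Twitter; the rest from literature and an online cultural website
- Hypothesis construction: for each premise, a native speaker wrote a hypothesis so that the pair would be classified as E, N, or C
- Label validation: a random sample of 100 sentence pairs was double annotated by fluent speakers
### Social Impact
JamPatoisNLI is a low-resource language dataset for Jamaican Patois, an English-based creole spoken in the Caribbean. It helps expand the scope of NLP research to under-explored languages around the world.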



