
Ruth-Ann/jampatoisnli

Source: Hugging Face. Updated: 2022-12-31. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/Ruth-Ann/jampatoisnli
Resource description:
---
annotations_creators:
- expert-generated
language:
- jam
language_creators:
- expert-generated
- found
license:
- other
multilinguality:
- monolingual
- other-english-based-creole
pretty_name: JamPatoisNLI
size_categories:
- n<1K
source_datasets:
- original
tags:
- creole
- low-resource-language
task_categories:
- text-classification
task_ids:
- natural-language-inference
---

# Dataset Card for JamPatoisNLI

## Table of Contents

- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** jampatoisnli.github.io
- **Repository:** https://github.com/ruth-ann/jampatoisnli
- **Paper:** https://arxiv.org/abs/2212.03419
- **Point of Contact:** Ruth-Ann Armstrong: armstrongruthanna@gmail.com

### Dataset Summary

JamPatoisNLI provides the first dataset for natural language inference in a creole language, Jamaican Patois. Many of the most-spoken low-resource languages are creoles. These languages commonly have a lexicon derived from a major world language and a distinctive grammar reflecting the languages of the original speakers and the process of language birth by creolization. This gives them a distinctive place in exploring the effectiveness of transfer from large monolingual or multilingual pretrained models.

### Supported Tasks and Leaderboards

Natural language inference

### Languages

Jamaican Patois

### Data Fields

premise, hypothesis, label

### Data Splits

- Train: 250
- Val: 200
- Test: 200

### Dataset Creation and Annotations

- **Premise collection:** 97% of examples come from Twitter; the remainder were pulled from literature and an online cultural website.
- **Hypothesis construction:** For each premise, a hypothesis was written by a native speaker (our first author) so that the pair's classification would be entailment (E), neutral (N), or contradiction (C).
- **Label validation:** A random sample of 100 sentence pairs was double-annotated by fluent speakers.

### Social Impact of Dataset

JamPatoisNLI is a low-resource language dataset in Jamaican Patois, an English-based creole spoken in the Caribbean. The creation of the dataset contributes to expanding the scope of NLP research to under-explored languages across the world.

### Dataset Curators

[@ruth-ann](https://github.com/ruth-ann)

### Citation Information

    @misc{https://doi.org/10.48550/arxiv.2212.03419,
      doi = {10.48550/ARXIV.2212.03419},
      url = {https://arxiv.org/abs/2212.03419},
      author = {Armstrong, Ruth-Ann and Hewitt, John and Manning, Christopher},
      keywords = {Computation and Language (cs.CL), Machine Learning (cs.LG), FOS: Computer and information sciences, FOS: Computer and information sciences, I.2.7},
      title = {JamPatoisNLI: A Jamaican Patois Natural Language Inference Dataset},
      publisher = {arXiv},
      year = {2022},
      copyright = {arXiv.org perpetual, non-exclusive license}
    }

### Contributions

Thanks to Prof. Christopher Manning and John Hewitt for their contributions, guidance, facilitation and support related to the creation of this dataset.
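The card names the fields (`premise`, `hypothesis`, `label`) but does not show how to read them. The following is a minimal sketch, assuming the repository `Ruth-Ann/jampatoisnli` loads with the Hugging Face `datasets` library under its default configuration and exposes those three column names. The helper names `load_jampatoisnli` and `label_counts`, and the placeholder rows, are hypothetical and used only for illustration.

```python
from collections import Counter
from typing import Iterable, Mapping

# Repository id taken from the download link; everything else is an assumption.
DATASET_ID = "Ruth-Ann/jampatoisnli"


def load_jampatoisnli():
    """Fetch all splits from the Hugging Face Hub (requires network access).

    Assumes the repository loads with its default configuration.
    """
    from datasets import load_dataset  # lazy import so the helper below works offline

    return load_dataset(DATASET_ID)


def label_counts(rows: Iterable[Mapping[str, object]]) -> Counter:
    """Count how often each label value occurs in an iterable of records."""
    return Counter(row["label"] for row in rows)


# Offline demonstration with hypothetical placeholder records (not real dataset rows).
toy_rows = [
    {"premise": "p1", "hypothesis": "h1", "label": "E"},
    {"premise": "p2", "hypothesis": "h2", "label": "N"},
    {"premise": "p3", "hypothesis": "h3", "label": "E"},
]
toy_counts = label_counts(toy_rows)
print(toy_counts)
```

With network access, `label_counts(load_jampatoisnli()["train"])` would report the label distribution of the training split, provided the split is named `train`. The card does not state how labels are encoded in the hosted files (letters, integers, or full names), so that should be checked after loading.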
Provided by:
Ruth-Ann
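Summary of original information

Dataset overview

Basic dataset information

  • Name: JamPatoisNLI
  • Language:
    • Primary language: Jamaican Patois
    • Other: English-based creole
  • License: other
  • Multilinguality:
    • Monolingual
    • Other English-based creole
  • Size: fewer than 1K examples
  • Source: original data
  • Tags:
    • Creole
    • Low-resource language
  • Task category: text classification
  • Task ID: natural language inference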

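Dataset description

Dataset summary

JamPatoisNLI is the first natural language inference dataset for a creole language, Jamaican Patois. It supports studying how well large monolingual or multilingual pretrained models transfer to creole languages.

Supported tasks and leaderboards

  • Natural language inference

Languages

  • Jamaican Patois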

Data structure

Data fields

  • premise
  • hypothesis
  • label

Data splits

  • Train: 250
  • Validation: 200
  • Test: 200
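The fields and split sizes listed above can be written down as a small schema. This is a sketch under stated assumptions: the class name `NLIExample` and the constants are hypothetical, and the long label names are the usual NLI reading of the letters E, N, and C used in the card.

```python
from dataclasses import dataclass

# Split sizes as listed in the dataset card.
SPLIT_SIZES = {"train": 250, "validation": 200, "test": 200}

# E, N, C as named in the card; the long names are the usual NLI reading.
LABELS = {"E": "entailment", "N": "neutral", "C": "contradiction"}


@dataclass(frozen=True)
class NLIExample:
    """One premise-hypothesis pair with its inference label."""

    premise: str
    hypothesis: str
    label: str  # expected to be one of the keys of LABELS

    def __post_init__(self) -> None:
        # Reject labels outside the three-way NLI scheme.
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label!r}")


# Total size and the share of each split.
total = sum(SPLIT_SIZES.values())
fractions = {name: size / total for name, size in SPLIT_SIZES.items()}
print(total, fractions)
```

The total of 650 examples is consistent with the `n<1K` size category in the card metadata, and the training split is only slightly larger than each evaluation split.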

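Dataset creation

Source data

  • Premise collection: 97% of examples come from Twitter; the rest come from literature and an online cultural website
  • Hypothesis construction: for each premise, a native speaker wrote a hypothesis so that the pair would be classified as E, N, or C
  • Label validation: a random sample of 100 sentence pairs was double-annotated by fluent speakers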
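Double annotation of a sample is typically summarized with an agreement statistic. The source does not say which statistic was used, so the sketch below uses Cohen's kappa purely as an assumed, common choice. The function name `cohens_kappa` and the toy label lists are hypothetical and are not data from the dataset.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each annotator's marginal label rates.
    counts_a, counts_b = Counter(a), Counter(b)
    expected = sum(counts_a[lab] * counts_b[lab] for lab in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy annotations over four sentence pairs.
annotator_1 = ["E", "N", "C", "E"]
annotator_2 = ["E", "N", "C", "N"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 4))
```

For the validation sample described above, the two sequences would each hold 100 labels, one per double-annotated sentence pair.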

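Social impact

JamPatoisNLI is a low-resource language dataset in Jamaican Patois, an English-based creole spoken in the Caribbean. It helps expand the scope of NLP research to under-explored languages around the world.

Dataset curators

  • @ruth-ann (https://github.com/ruth-ann)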

