SEACrowd/nusaparagraph_emot
Dataset Overview
Languages
- btk, bew, bug, jav, mad, mak, min, mui, rej, sun
Supported Tasks
- Emotion classification
Dataset Usage
Using the datasets library
from datasets import load_dataset
dset = load_dataset("SEACrowd/nusaparagraph_emot", trust_remote_code=True)
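After loading, a quick sanity check is to count how often each label occurs in a split. The sketch below is a minimal illustration, not part of the SEACrowd API: it assumes each row is a dict with a `label` field (the source schema may name this field differently), and it uses hypothetical placeholder rows so that it runs without downloading anything.

```python
from collections import Counter


def label_distribution(rows, label_key="label"):
    """Count how often each label value occurs in an iterable of dict-like rows."""
    return Counter(row[label_key] for row in rows)


# Hypothetical placeholder rows standing in for a real split such as dset["train"].
# The label names here are made up and are not the dataset's actual emotion labels.
sample_rows = [
    {"id": "0", "text": "first paragraph", "label": "label_a"},
    {"id": "1", "text": "second paragraph", "label": "label_b"},
    {"id": "2", "text": "third paragraph", "label": "label_a"},
]

distribution = label_distribution(sample_rows)
for label, count in sorted(distribution.items()):
    print(label, count)
```

With a real split, pass the split object in place of `sample_rows`, adjusting `label_key` if the loaded schema uses another field name.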
Using the seacrowd library
import seacrowd as sc
# Load the dataset using the default config
dset = sc.load_dataset("nusaparagraph_emot", schema="seacrowd")
# Check all available subsets (config names) of the dataset
print(sc.available_config_names("nusaparagraph_emot"))
# Load the dataset using a specific config
dset = sc.load_dataset_by_config_name(config_name="<config_name>")
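Because the dataset covers ten languages, it can help to narrow the list of config names down to a single language before choosing one. The helper below is a sketch under a stated assumption: that each config name embeds the language code as an underscore-separated token. The example names are hypothetical and only illustrate the filtering; check the real output of `sc.available_config_names` before relying on this pattern.

```python
def configs_for_language(config_names, lang_code):
    """Return the config names that contain lang_code as an underscore-separated token."""
    return [name for name in config_names if lang_code in name.split("_")]


# Hypothetical config names, used only to demonstrate the filter.
example_names = [
    "nusaparagraph_emot_jav_source",
    "nusaparagraph_emot_jav_seacrowd_text",
    "nusaparagraph_emot_sun_source",
]

jav_configs = configs_for_language(example_names, "jav")
print(jav_configs)
```

Matching on whole tokens rather than substrings avoids accidental hits when a short language code happens to appear inside a longer word. In practice, the first argument would be the list returned by `sc.available_config_names("nusaparagraph_emot")`.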
Dataset Homepage
Dataset Version
- Source version: 1.0.0
- SEACrowd version: 2024.06.20
Dataset License
- Creative Commons Attribution Share-Alike 4.0 International
Citation
@unpublished{anonymous2023nusawrites:,
title={NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages},
author={Anonymous},
journal={OpenReview Preprint},
year={2023},
note={anonymous preprint under review}
}
@article{lovenia2024seacrowd,
title={SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages},
author={Holy Lovenia and others},
year={2024},
eprint={2406.10118},
journal={arXiv preprint arXiv: 2406.10118}
}



