sappho192/Tatoeba-Challenge-jpn-kor

Name: sappho192/Tatoeba-Challenge-jpn-kor
Creator: sappho192
Published: 2024-01-30 16:51:21
License: CC BY-NC-SA 4.0

Source: Hugging Face. Updated 2024-01-30. Indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/sappho192/Tatoeba-Challenge-jpn-kor


Resource overview:

---
license: cc-by-nc-sa-4.0
task_categories:
- translation
language:
- ja
- ko
size_categories:
- 10M<n<100M
---

# Dataset Card for Tatoeba-Challenge-jpn-kor

This dataset contains Japanese-Korean sentence pairs taken from [Helsinki-NLP/Tatoeba-Challenge](https://github.com/Helsinki-NLP/Tatoeba-Challenge/blob/master/data/README-v2023-09-26.md).

## Dataset Details

### Dataset Description

- **Curated by:** [Helsinki-NLP](https://github.com/Helsinki-NLP)
- **Language(s) (NLP):** Japanese-Korean
- **License:** CC BY-NC-SA 4.0

### Dataset Sources

- **Repository:** [Helsinki-NLP/Tatoeba-Challenge](https://github.com/Helsinki-NLP/Tatoeba-Challenge/blob/master/data/README-v2023-09-26.md)
- **Detail:** Japanese - Korean [jpn-kor](https://object.pouta.csc.fi/Tatoeba-Challenge-v2023-09-26/jpn-kor.tar)

## Uses

The dataset can be used to train a translation model that translates Japanese sentences into Korean.

### Out-of-Scope Use

This dataset must not be used to train a model that will be deployed in a commercial service.

## Dataset Structure

The dataset has two columns, `sourceString` and `targetString`, which hold the Japanese sentence and the corresponding Korean sentence, respectively.
See the [example code](https://huggingface.co/datasets/sappho192/Tatoeba-Challenge-jpn-kor/blob/main/example.ipynb) to learn how to load the dataset.

## Dataset Creation

### Personal and Sensitive Information

This dataset may contain inappropriate or explicit sentences, including content of the following kinds:

- personal
- sensitive
- private
- data that reveals addresses
- uniquely identifiable names or aliases
- racial or ethnic origins
- sexual orientations
- religious beliefs
- political opinions
- financial or health data
- etc.

Use it at your own risk.

## Citation

**BibTeX:**

```bibtex
@inproceedings{tiedemann-2020-tatoeba,
    title = "The {T}atoeba {T}ranslation {C}hallenge {--} {R}ealistic Data Sets for Low Resource and Multilingual {MT}",
    author = {Tiedemann, J{\"o}rg},
    booktitle = "Proceedings of the Fifth Conference on Machine Translation",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.wmt-1.139",
    pages = "1174--1182"
}
```

## Dataset Card Authors

[sappho192](https://huggingface.co/sappho192)

## Dataset Card Contact

Please create a thread in the community.
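
As a rough illustration of the Dataset Structure section, the sketch below maps the `sourceString` and `targetString` columns onto language-keyed pairs. It is not the author's `example.ipynb`. It assumes the Hugging Face `datasets` package is installed and that the repository exposes a split named `train`, which the card does not state. The helper names `to_pairs` and `stream_pairs` and the sample row are hypothetical.

```python
from typing import Dict, Iterable, Iterator

REPO_ID = "sappho192/Tatoeba-Challenge-jpn-kor"


def to_pairs(rows: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Map the card's two columns onto {"ja": ..., "ko": ...} dicts.

    Rows where either side is missing or blank are skipped.
    """
    for row in rows:
        ja = (row.get("sourceString") or "").strip()
        ko = (row.get("targetString") or "").strip()
        if ja and ko:
            yield {"ja": ja, "ko": ko}


def stream_pairs(split: str = "train") -> Iterator[Dict[str, str]]:
    """Stream pairs from the Hub without downloading the full dataset first."""
    # Imported lazily so to_pairs() stays usable without the `datasets` package.
    from datasets import load_dataset

    # Assumption: a split called "train" exists in this repository.
    ds = load_dataset(REPO_ID, split=split, streaming=True)
    return to_pairs(ds)


if __name__ == "__main__":
    # Hypothetical sample row, written for illustration (not taken from the dataset).
    sample = [{"sourceString": "こんにちは", "targetString": "안녕하세요"}]
    for pair in to_pairs(sample):
        print(pair["ja"], "->", pair["ko"])
```

Streaming is used here because the card lists the size category as 10M<n<100M, so iterating lazily avoids holding every pair in memory. To read real data, call `stream_pairs()` and iterate over the result.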

Provided by:

sappho192

