pythainlp/scb_mt_2020_th2en_prompt
Source: Hugging Face. Updated 2023-11-06. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/pythainlp/scb_mt_2020_th2en_prompt
Resource description:
---
dataset_info:
  features:
  - name: inputs
    dtype: string
  - name: targets
    dtype: string
  splits:
  - name: train
    num_bytes: 500257169
    num_examples: 801402
  - name: validation
    num_bytes: 61671631
    num_examples: 88927
  - name: test
    num_bytes: 61225544
    num_examples: 88931
  download_size: 212800258
  dataset_size: 623154344
license: cc-by-sa-4.0
task_categories:
- text2text-generation
- text-generation
language:
- th
size_categories:
- 100K<n<1M
---
# Dataset Card for "scb_mt_2020_th2en_prompt"
This dataset is derived from [scb_mt_enth_2020](https://huggingface.co/datasets/scb_mt_enth_2020), with the nus_sms and paracrawl sources removed.
Source code used to create the dataset: [https://github.com/PyThaiNLP/support-aya-datasets/blob/main/translation/scb_mt.ipynb](https://github.com/PyThaiNLP/support-aya-datasets/blob/main/translation/scb_mt.ipynb)
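
The source-removal step described above can be sketched as a simple filter. This is only an illustration, not the code from the linked notebook: it assumes each record carries a `subdataset` field naming its origin, and the sample records, the `wikipedia` label, and the helper name `keep_record` are hypothetical.

```python
# Sources that the dataset card says were removed.
EXCLUDED_SOURCES = {"nus_sms", "paracrawl"}


def keep_record(record: dict) -> bool:
    """Return True if the record does not come from an excluded source.

    Assumption: the origin is stored under the key "subdataset".
    """
    return record["subdataset"] not in EXCLUDED_SOURCES


# Hypothetical sample records, for illustration only.
records = [
    {"subdataset": "wikipedia", "th": "สวัสดี", "en": "Hello"},
    {"subdataset": "nus_sms", "th": "ขอบคุณ", "en": "Thanks"},
    {"subdataset": "paracrawl", "th": "ลาก่อน", "en": "Goodbye"},
]

kept = [r for r in records if keep_record(r)]
```

With these sample records, only the first one survives the filter.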
## Template
```
Inputs: แปลประโยคหรือย่อหน้าต่อไปนี้จากภาษาไทยเป็นภาษาอังกฤษ:\n{th}
Targets: English sentence
```
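
As a minimal sketch of how the template above could be applied to one sentence pair, the snippet below builds an `inputs`/`targets` record. It assumes the `\n` in the template stands for a literal newline; the function name `build_example` and the sample sentence pair are hypothetical, and the actual notebook may construct records differently.

```python
# Thai instruction from the template: "Translate the following sentence or
# paragraph from Thai to English:", followed by a newline and the Thai text.
TEMPLATE = "แปลประโยคหรือย่อหน้าต่อไปนี้จากภาษาไทยเป็นภาษาอังกฤษ:\n{th}"


def build_example(th: str, en: str) -> dict:
    """Build one prompt record with the dataset's two string features."""
    return {
        "inputs": TEMPLATE.format(th=th),  # instruction plus Thai source text
        "targets": en,                     # English reference translation
    }


# Hypothetical sentence pair, for illustration only.
example = build_example("สวัสดี", "Hello")
```

Because only the template is parsed by `str.format`, braces that happen to occur in the Thai source text are inserted unchanged.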
Provider:
pythainlp
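Summary of original information
Dataset overview
Dataset information
- Features:
  - inputs: string
  - targets: string
- Splits:
  - train: 500257169 bytes, 801402 examples
  - validation: 61671631 bytes, 88927 examples
  - test: 61225544 bytes, 88931 examples
- Size:
  - Download size: 212800258 bytes
  - Dataset size: 623154344 bytes
- License: cc-by-sa-4.0
- Task categories:
  - Text-to-text generation
  - Text generation
- Language:
  - Thai
- Size category:
  - 100K < n < 1M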
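
The split statistics listed above can be cross-checked with a few lines of arithmetic. The sketch below only reuses the numbers from the metadata; the variable names are illustrative.

```python
# Split statistics copied from the dataset metadata.
SPLITS = {
    "train": {"num_bytes": 500257169, "num_examples": 801402},
    "validation": {"num_bytes": 61671631, "num_examples": 88927},
    "test": {"num_bytes": 61225544, "num_examples": 88931},
}
DATASET_SIZE = 623154344  # reported dataset_size in bytes

total_bytes = sum(s["num_bytes"] for s in SPLITS.values())
total_examples = sum(s["num_examples"] for s in SPLITS.values())

# The per-split byte counts should add up to the reported dataset size.
sizes_match = total_bytes == DATASET_SIZE

# Fraction of all examples that falls into each split.
shares = {name: s["num_examples"] / total_examples for name, s in SPLITS.items()}
```

The byte counts add up to the reported `dataset_size`, the three splits hold 979260 examples in total, and the split is roughly 82% train, 9% validation, and 9% test.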



