nlpai-lab/ko_commongen_v2_code_switching

Name: nlpai-lab/ko_commongen_v2_code_switching
Creator: nlpai-lab
Published: 2024-08-11 13:09:39
License: No description available

Source: Hugging Face. Updated 2024-08-11. Indexed 2025-04-12.

Download link:

https://hf-mirror.com/datasets/nlpai-lab/ko_commongen_v2_code_switching

Description:

---
dataset_info:
  features:
  - name: concept_set
    dtype: string
  - name: '1'
    dtype: string
  - name: '2'
    dtype: string
  - name: '3'
    dtype: string
  - name: '4'
    dtype: string
  - name: gold
    dtype: int64
  splits:
  - name: china
    num_bytes: 25567
    num_examples: 99
  - name: english
    num_bytes: 33124
    num_examples: 99
  - name: espanol
    num_bytes: 35535
    num_examples: 99
  - name: japan
    num_bytes: 33329
    num_examples: 99
  - name: korean
    num_bytes: 33419
    num_examples: 99
  download_size: 131071
  dataset_size: 160974
configs:
- config_name: default
  data_files:
  - split: china
    path: data/china-*
  - split: english
    path: data/english-*
  - split: espanol
    path: data/espanol-*
  - split: japan
    path: data/japan-*
  - split: korean
    path: data/korean-*
---

### 🇰🇷🇺🇸🇯🇵🇨🇳🇪🇸 KoCommonGEN v2 Code-switching

The KoCommonGEN v2 Code-switching dataset consists of 99 samples for numerical commonsense reasoning, created with machine translation. Each of the five language splits contains 99 examples.

The dataset can be found on Hugging Face at: [nlpai-lab/ko_commongen_v2_code_switching](https://huggingface.co/datasets/nlpai-lab/ko_commongen_v2_code_switching)

This dataset contains code-switching data for the following languages (split name in parentheses):

- Korean (korean)
- English (english)
- Japanese (japan)
- Chinese (china)
- Spanish (espanol)

(The code-switching data relies on machine translation, which may result in some inaccuracies.)

To load the dataset, you can use the following code:

```python
from datasets import load_dataset

dataset = load_dataset("nlpai-lab/ko_commongen_v2_code_switching")

# To access a specific language dataset:
korean_data = dataset['korean']
english_data = dataset['english']
# ... and so on for other languages
```
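The feature list in the metadata (`concept_set`, four string columns named `'1'` to `'4'`, and an integer `gold`) suggests a four-way multiple-choice layout. The following is a minimal sketch of how one row might be turned into a prompt and scored. It assumes that `gold` is the 1-based number of the correct candidate column, which the card does not state explicitly. The helper names `build_prompt` and `accuracy` and the sample row are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Tuple

# Candidate column names, taken from the dataset_info feature list.
CHOICE_KEYS: List[str] = ["1", "2", "3", "4"]


def build_prompt(example: Dict[str, object]) -> Tuple[str, str]:
    """Format one row as a multiple-choice prompt and return (prompt, gold_key)."""
    lines = [f"Concept set: {example['concept_set']}"]
    for key in CHOICE_KEYS:
        lines.append(f"{key}. {example[key]}")
    lines.append("Answer:")
    # Assumption: `gold` holds the 1-based number of the correct candidate.
    gold_key = str(example["gold"])
    return "\n".join(lines), gold_key


def accuracy(predictions: List[str], gold_keys: List[str]) -> float:
    """Fraction of predictions that exactly match the gold keys."""
    if not gold_keys:
        return 0.0
    correct = sum(p == g for p, g in zip(predictions, gold_keys))
    return correct / len(gold_keys)


# Hypothetical row for illustration only; it is not taken from the dataset.
sample = {
    "concept_set": "concept-a#concept-b#concept-c",
    "1": "candidate sentence one",
    "2": "candidate sentence two",
    "3": "candidate sentence three",
    "4": "candidate sentence four",
    "gold": 2,
}

prompt, gold_key = build_prompt(sample)
print(prompt)
```

With the real data, the same helper could be applied to each row of a split, for example `for row in dataset["korean"]: build_prompt(row)`, since iterating over a split yields one dictionary per example.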

Provider:

nlpai-lab
