damand2061/id_cannot_12K

Name: damand2061/id_cannot_12K
Creator: damand2061
Published: 2023-10-23 15:31:38
License: No description available

Source: Hugging Face. Updated: 2023-10-23. Indexed: 2024-03-04.

Download link:

https://hf-mirror.com/datasets/damand2061/id_cannot_12K

Resource summary:

This dataset is an Indonesian translation of the top 12K rows of the [cannot](https://huggingface.co/datasets/tum-nlp/cannot-dataset) dataset. The translation was produced with Google Translate, then manually rechecked and corrected where necessary. Each row has three features: premise and hypothesis, which are strings, and label, which is an integer. The data is divided into a training set of 9600 samples and a validation set of 2400 samples. The task category is text classification and the language is Indonesian.

Provider:

damand2061

Summary of original information

Dataset overview

License

This dataset is released under the CC BY-SA 4.0 license.

Configuration

The default configuration (default) contains the following data files:
- Training set (train): path data/train-*
- Validation set (validation): path data/validation-*
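The split-to-path mapping above can be sketched in Python as follows. This is a minimal illustration, not part of the dataset card: the helper names `split_for_path` and `load_splits` and the example shard name are hypothetical, and `load_splits` assumes the third-party `datasets` package is installed and that the Hugging Face Hub is reachable.

```python
from fnmatch import fnmatch

REPO_ID = "damand2061/id_cannot_12K"

# Split name -> glob pattern, as listed for the default configuration.
DATA_FILES = {
    "train": "data/train-*",
    "validation": "data/validation-*",
}


def split_for_path(path):
    """Return the split whose glob pattern matches `path`, or None."""
    for split, pattern in DATA_FILES.items():
        if fnmatch(path, pattern):
            return split
    return None


def load_splits():
    """Load both splits from the Hub (assumes `datasets` is installed)."""
    # Imported lazily so that split_for_path works without the package.
    from datasets import load_dataset

    return load_dataset(REPO_ID)


# Hypothetical shard name, used only to illustrate the pattern matching.
example_path = "data/train-00000-of-00001.parquet"
print(split_for_path(example_path))
```

Calling `load_splits()` should return an object keyed by split name, so `load_splits()["train"]` and `load_splits()["validation"]` would give the two splits, provided the repository is still available under this identifier.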

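Dataset information

Features:
- premise: string
- hypothesis: string
- label: int64

Splits:
- Training set (train):
  - Size in bytes: 1487375
  - Number of samples: 9600
- Validation set (validation):
  - Size in bytes: 372708
  - Number of samples: 2400

Download size: 1214303 bytes
Dataset size: 1860083 bytes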
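As a quick consistency check on the figures above, the following sketch sums the split statistics and validates a row against the listed feature types. The helper `validate_row` and the sample row are hypothetical placeholders, not data taken from the dataset, and the mapping of int64 to Python `int` is an assumption about how rows appear once loaded.

```python
# Feature name -> expected Python type (int64 is assumed to surface as int).
EXPECTED_FEATURES = {"premise": str, "hypothesis": str, "label": int}

# Split statistics as reported in the dataset information.
SPLITS = {
    "train": {"num_bytes": 1487375, "num_examples": 9600},
    "validation": {"num_bytes": 372708, "num_examples": 2400},
}
DATASET_SIZE = 1860083  # reported total size in bytes


def validate_row(row):
    """Check that a row has exactly the expected fields with the expected types."""
    if set(row) != set(EXPECTED_FEATURES):
        return False
    return all(
        isinstance(row[name], expected) and not isinstance(row[name], bool)
        for name, expected in EXPECTED_FEATURES.items()
    )


total_examples = sum(s["num_examples"] for s in SPLITS.values())
total_bytes = sum(s["num_bytes"] for s in SPLITS.values())
train_fraction = SPLITS["train"]["num_examples"] / total_examples

# Placeholder row for illustration only; not an actual dataset record.
sample_row = {"premise": "contoh premis", "hypothesis": "contoh hipotesis", "label": 0}

print(total_examples, total_bytes == DATASET_SIZE, train_fraction)
print(validate_row(sample_row))
```

The two split sizes add up to the reported dataset size, and the sample counts correspond to an 80/20 split of 12000 rows.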

Task category

Text classification

Language

Indonesian
