neody/nwc2010-cleaned

Name: neody/nwc2010-cleaned
Creator: neody
Published: 2024-07-10 08:12:25
License: no description available

Source: Hugging Face. Updated 2024-07-10; indexed 2024-07-22.

Download link:

https://hf-mirror.com/datasets/neody/nwc2010-cleaned

Summary:

This dataset contains Japanese text data and is provided as a single training split. Each sample has a text field (string) and a features field (float64). The training split holds 99,730,442 samples totaling 55,519,527,436 bytes. The download size of the dataset is 29,932,980,650 bytes.

Provider:

neody

Original information summary

Dataset overview

Language

Japanese (ja)

Dataset information

Features

text: string
features: float64
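
As a rough illustration of the schema above, the following sketch models one record and checks its field types. `Row` and `is_valid_row` are hypothetical names introduced here, and the sketch assumes a float64 value arrives as a Python `float` once decoded.

```python
from typing import Any, Mapping, TypedDict


class Row(TypedDict):
    """One record, following the two listed fields."""

    text: str        # Japanese text
    features: float  # listed as float64 (assumed to decode to a Python float)


def is_valid_row(row: Mapping[str, Any]) -> bool:
    """Return True if `row` has a string `text` and a float `features`."""
    return isinstance(row.get("text"), str) and isinstance(row.get("features"), float)
```

A check like this can be handy when sampling a few records to confirm that the listed types match what is actually delivered.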

Splits

train:
- Bytes: 55,519,527,436
- Samples: 99,730,442

Data size

Download size: 29,932,980,650 bytes
Dataset size: 55,519,527,436 bytes
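
To put these figures in perspective, the short calculation below derives the average sample size and the ratio of download size to dataset size from the numbers listed above. The variable names are illustrative, and nothing here is measured from the data itself.

```python
# Figures copied from the listing above.
TRAIN_BYTES = 55_519_527_436
TRAIN_SAMPLES = 99_730_442
DOWNLOAD_BYTES = 29_932_980_650

GIB = 1024 ** 3  # bytes per gibibyte

avg_bytes_per_sample = TRAIN_BYTES / TRAIN_SAMPLES
download_ratio = DOWNLOAD_BYTES / TRAIN_BYTES

print(f"dataset size:         {TRAIN_BYTES / GIB:.1f} GiB")
print(f"download size:        {DOWNLOAD_BYTES / GIB:.1f} GiB")
print(f"avg bytes per sample: {avg_bytes_per_sample:.0f}")
print(f"download / dataset:   {download_ratio:.2f}")
```

This works out to roughly 557 bytes per sample on average, with the download being a little over half the dataset size. The gap is presumably due to compression of the stored files, although the listing does not say so.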

Configuration

config_name: default
- data_files:
  - split: train
  - path: data/train-*
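
Given the download size listed above, streaming may be more practical than fetching everything up front. The sketch below assumes the Hugging Face `datasets` library is installed and that the repository is reachable from the Hub under the ID shown at the top of this page. `open_train_stream` and `first_texts` are hypothetical helper names, not part of the dataset or the library.

```python
from itertools import islice
from typing import Any, Iterable, List, Mapping

REPO_ID = "neody/nwc2010-cleaned"


def open_train_stream():
    """Open the train split lazily, without downloading it in full."""
    # Imported here so the helper below stays usable without the library.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split="train", streaming=True)


def first_texts(rows: Iterable[Mapping[str, Any]], n: int = 3) -> List[str]:
    """Collect the `text` field of the first `n` rows of any row iterable."""
    return [row["text"] for row in islice(rows, n)]
```

Calling `first_texts(open_train_stream())` should then return the text of the first three samples, fetched over the network on demand. Because `first_texts` accepts any iterable of rows, it can also be tried on a small in-memory list before touching the real data.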