neody/nwc2010-cleaned
Source: Hugging Face. Updated 2024-07-10, indexed 2024-07-22.
Download link:
https://hf-mirror.com/datasets/neody/nwc2010-cleaned
Resource summary:
This dataset contains Japanese text and has a single training split. Each sample has two fields: `text` (a string) and `features` (a float64 value). The training split holds 99,730,442 samples totalling 55,519,527,436 bytes, and the download size of the dataset is 29,932,980,650 bytes.
Provided by:
neody
Summary of the original information
Dataset overview
Language
- Japanese (ja)
Dataset information
Features
- text: string
- features: floating point (float64)
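The card gives only field names and types, not what `features` measures, so the following is a minimal sketch of checking a row against that two-field schema. It assumes rows arrive as plain Python dicts (as they do when iterating a Hugging Face `datasets` split); `SCHEMA`, `matches_schema` and the sample rows are hypothetical and not taken from the dataset.

```python
from typing import Any

# Schema as listed on the card: field name -> expected Python type.
# float64 values are read back as Python floats (or numpy.float64, a float subclass).
SCHEMA: dict[str, type] = {"text": str, "features": float}


def matches_schema(record: dict[str, Any]) -> bool:
    """Return True if `record` has exactly the listed fields with the listed types."""
    if set(record) != set(SCHEMA):
        return False
    return all(isinstance(record[name], kind) for name, kind in SCHEMA.items())


# Hypothetical rows for illustration only.
good = {"text": "吾輩は猫である。", "features": 0.87}
bad = {"text": "吾輩は猫である。", "features": "0.87"}  # number stored as a string
print(matches_schema(good), matches_schema(bad))
```

Checking with `isinstance(..., float)` also accepts `numpy.float64`, which subclasses Python's `float`, so the check works whether rows come back as plain Python values or NumPy scalars.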
Data splits
- train:
  - Bytes: 55,519,527,436
  - Samples: 99,730,442
Data size
- Download size: 29,932,980,650 bytes
- Dataset size: 55,519,527,436 bytes
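The size figures above can be turned into a few derived numbers. This is plain arithmetic on the card's values; note that the split byte count covers whole rows (including the `features` column and storage overhead), so the per-sample average is not a pure text length, and the card does not say which file format or compression the download uses.

```python
# Figures copied from the card (bytes and sample counts).
TRAIN_BYTES = 55_519_527_436
TRAIN_SAMPLES = 99_730_442
DOWNLOAD_BYTES = 29_932_980_650

GIB = 1024 ** 3  # bytes per gibibyte

avg_bytes_per_sample = TRAIN_BYTES / TRAIN_SAMPLES
download_ratio = DOWNLOAD_BYTES / TRAIN_BYTES

print(f"dataset size:     {TRAIN_BYTES / GIB:.1f} GiB")
print(f"download size:    {DOWNLOAD_BYTES / GIB:.1f} GiB")
print(f"avg bytes/sample: {avg_bytes_per_sample:.1f}")
print(f"download/dataset: {download_ratio:.1%}")
```

Run as-is, it reports about 51.7 GiB of data, a 27.9 GiB download, roughly 556.7 bytes per sample, and a download that is about 53.9% of the dataset size.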
Configuration
- config_name: default
  data_files:
  - split: train
    path: data/train-*
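The `data_files` entry tells the loader which repository files make up the `train` split. The sketch below illustrates that glob selection with Python's `fnmatch.fnmatchcase`; `select_split_files` and the shard file names are hypothetical, and the Hugging Face Hub's own resolver is not guaranteed to apply identical matching rules.

```python
from fnmatch import fnmatchcase

# The default config maps the "train" split to this glob (from the card).
TRAIN_PATTERN = "data/train-*"


def select_split_files(repo_files: list[str], pattern: str = TRAIN_PATTERN) -> list[str]:
    """Return the repository paths matched by a data_files glob, sorted."""
    return sorted(path for path in repo_files if fnmatchcase(path, pattern))


# Hypothetical file listing; real shard names in the repository may differ.
files = [
    "README.md",
    "data/train-00001-of-00002.parquet",
    "data/train-00000-of-00002.parquet",
    "data/test-00000-of-00001.parquet",
]
print(select_split_files(files))
```

One difference worth knowing: in `fnmatch`, `*` also matches `/`, so a path such as `data/train-x/part.parquet` would match here, whereas a path-aware glob would stop at the `/`. To read the real split without downloading every shard first, the `datasets` library's `load_dataset("neody/nwc2010-cleaned", split="train", streaming=True)` returns an iterable dataset that fetches data lazily.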