neody/oscar-ja-cleaned
Source: Hugging Face. Updated: 2024-07-06. Indexed: 2024-06-29.
Download link:
https://hf-mirror.com/datasets/neody/oscar-ja-cleaned
Summary:
This Japanese text dataset has two fields: text, stored as a string, and features, stored as a float64. It has a single train split of 12,772,871 examples. The dataset size is 5,783,063,135 bytes (about 5.78 GB) and the download size is 3,507,421,243 bytes (about 3.51 GB).
Provider:
neody
Original information summary
Dataset overview
Language
- Japanese (ja)
Dataset information
Features
- text: string
- features: float64
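The two-field schema above can be checked per record with a small helper. This is a minimal sketch, not part of the dataset card: the names `EXPECTED_SCHEMA` and `validate_record` are hypothetical, and it assumes that a float64 value arrives in Python as a built-in `float`. The card does not document what the features value represents, so the sketch checks only its type.

```python
# Hypothetical helper: field names and types are taken from the card,
# everything else here is an illustrative assumption.
EXPECTED_SCHEMA = {"text": str, "features": float}


def validate_record(record):
    """Return a list of schema problems for one record (empty if it conforms)."""
    problems = []
    for field, expected_type in EXPECTED_SCHEMA.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            actual = type(record[field]).__name__
            problems.append(
                f"{field}: expected {expected_type.__name__}, got {actual}"
            )
    return problems
```

A conforming record such as `{"text": "こんにちは", "features": 0.5}` yields an empty list, while a record lacking one of the fields yields one message per problem.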
Splits
- train:
  - Bytes: 5783063135
  - Examples: 12772871
Download and dataset size
- Download size: 3507421243 bytes
- Dataset size: 5783063135 bytes
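The size figures above are easier to compare after a little arithmetic. The following sketch derives sizes in GiB, the ratio of download size to dataset size, and the average bytes per example. The constant and function names are hypothetical, and the per-example average covers both fields plus any storage overhead, so it is only a rough indicator of text length.

```python
# Figures copied from the listing above; the names are illustrative.
DATASET_BYTES = 5_783_063_135   # dataset size of the train split
DOWNLOAD_BYTES = 3_507_421_243  # size of the files to download
NUM_EXAMPLES = 12_772_871       # examples in the train split


def summarize(dataset_bytes, download_bytes, num_examples):
    """Derive human-readable size statistics from the raw byte counts."""
    gib = 2 ** 30
    return {
        "dataset_gib": dataset_bytes / gib,
        "download_gib": download_bytes / gib,
        "download_ratio": download_bytes / dataset_bytes,
        "avg_bytes_per_example": dataset_bytes / num_examples,
    }


if __name__ == "__main__":
    for name, value in summarize(DATASET_BYTES, DOWNLOAD_BYTES, NUM_EXAMPLES).items():
        print(f"{name}: {value:.3f}")
```

With these figures the download is roughly 61% of the dataset size, and each example averages about 453 bytes.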
Configs
- config_name: default
  - data_files:
    - split: train
    - path: data/train-*
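The configuration above maps the train split to every file matching data/train-*. The sketch below shows one way to preview which files that pattern selects and to read a few examples without downloading the full split. It assumes the Hugging Face `datasets` library is installed and uses its `load_dataset` function with `streaming=True`. The helper names are hypothetical, and `fnmatch` only approximates the glob resolution the Hub performs.

```python
from fnmatch import fnmatch

REPO_ID = "neody/oscar-ja-cleaned"  # repository name from the listing
TRAIN_PATTERN = "data/train-*"      # data_files path from the default config


def select_train_files(filenames, pattern=TRAIN_PATTERN):
    """Return the filenames that the train split's pattern would match.

    Approximation: fnmatch lets '*' match '/' as well, which a real
    glob over the repository would not.
    """
    return [name for name in filenames if fnmatch(name, pattern)]


def stream_train_examples(limit=3):
    """Yield the first `limit` train examples via streaming."""
    # Imported lazily so the rest of the module works without the library.
    from datasets import load_dataset

    stream = load_dataset(REPO_ID, split="train", streaming=True)
    for index, example in enumerate(stream):
        if index >= limit:
            break
        yield example
```

Streaming is used here because the full download is about 3.5 GB. Calling `list(stream_train_examples(3))` requires network access and should return dictionaries with the text and features keys.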



