mrm8488/spanish_legal_ds_tokenized_and_gropuped

Name: mrm8488/spanish_legal_ds_tokenized_and_gropuped
Creator: mrm8488
Published: 2023-03-09 22:41:00
License: No description available

Hugging Face | Updated 2023-03-09 | Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/mrm8488/spanish_legal_ds_tokenized_and_gropuped

Resource description:

---
dataset_info:
  features:
  - name: input_ids
    sequence: int32
  splits:
  - name: train
    num_bytes: 7117386800
    num_examples: 1735948
  - name: test
    num_bytes: 703888000
    num_examples: 171680
  download_size: 3629670012
  dataset_size: 7821274800
---
# Dataset Card for "spanish_legal_ds_tokenized_and_gropuped"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

Provider:

mrm8488

Summary of original information

Dataset overview

Dataset name

Name: spanish_legal_ds_tokenized_and_gropuped

Dataset features

Feature name: input_ids
Data type: sequence of integers (int32)
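
As a minimal sketch of how the single `input_ids` feature might be inspected, the snippet below streams one record instead of downloading the full archive. It assumes the Hugging Face `datasets` library is installed and that the dataset is still reachable on the Hub under the ID shown on this page. `peek_first_example` is a hypothetical helper name, not something defined by the dataset card.

```python
REPO_ID = "mrm8488/spanish_legal_ds_tokenized_and_gropuped"


def peek_first_example(split="test"):
    """Stream the first record of a split and return its list of token IDs."""
    # Imported lazily so this module still loads when `datasets` is not installed.
    from datasets import load_dataset

    # streaming=True iterates over the remote files instead of downloading them all.
    stream = load_dataset(REPO_ID, split=split, streaming=True)
    first = next(iter(stream))
    return first["input_ids"]
```

Calling `len(peek_first_example())` would show how many token IDs one example holds. The card does not say which tokenizer produced the IDs, so decoding them back to text requires finding that out separately.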

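Dataset splits

Training set
- Number of examples: 1735948
- Data size: 7117386800 bytes
Test set
- Number of examples: 171680
- Data size: 703888000 bytes

Dataset size

Download size: 3629670012 bytes
Total dataset size: 7821274800 bytes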
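
The reported figures can be cross-checked with a little arithmetic. The sketch below uses only the numbers listed on this page; the constant and function names are illustrative choices, not part of the dataset card.

```python
# Figures copied from the dataset card's `dataset_info` block.
SPLITS = {
    "train": {"num_bytes": 7_117_386_800, "num_examples": 1_735_948},
    "test": {"num_bytes": 703_888_000, "num_examples": 171_680},
}
DATASET_SIZE = 7_821_274_800
DOWNLOAD_SIZE = 3_629_670_012


def bytes_per_example(split):
    """Return (whole bytes per example, leftover bytes) for one split."""
    info = SPLITS[split]
    return divmod(info["num_bytes"], info["num_examples"])


# Sum of the per-split sizes, to compare against the reported total.
total_bytes = sum(info["num_bytes"] for info in SPLITS.values())

# Fraction of the in-memory dataset size that has to be downloaded.
download_ratio = DOWNLOAD_SIZE / DATASET_SIZE

for name in SPLITS:
    per_example, leftover = bytes_per_example(name)
    print(f"{name}: {per_example} bytes per example, remainder {leftover}")
print(f"splits sum to reported total: {total_bytes == DATASET_SIZE}")
print(f"download / dataset size: {download_ratio:.2f}")
```

Both splits come out at exactly 4100 bytes per example, and the two split sizes add up to the reported total. A uniform per-example size is consistent with fixed-length token blocks, for example 1024 int32 tokens plus a few bytes of per-row overhead, but the card does not state the block length, so that reading is an inference.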
