transgene/multimodal_train_cite_tokenized

Name: transgene/multimodal_train_cite_tokenized
Creator: transgene
Published: 2024-06-08 16:09:25
License: No description available

Source: Hugging Face. Updated 2024-06-08; indexed 2024-06-29.

Download link:

https://hf-mirror.com/datasets/transgene/multimodal_train_cite_tokenized

Resource overview:

---
dataset_info:
  features:
  - name: input_ids
    sequence: int32
  - name: token_type_ids
    sequence: int8
  - name: attention_mask
    sequence: int8
  - name: labels
    sequence: float64
  splits:
  - name: train
    num_bytes: 952942912
    num_examples: 70988
  download_size: 340706164
  dataset_size: 952942912
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
# Dataset Card for "multimodal_train_cite_tokenized"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

The multimodal_train_cite_tokenized dataset has four sequence features: input_ids (int32), token_type_ids (int8), attention_mask (int8), and labels (float64). It has a single split, train, with 70988 examples. The download size is 340706164 bytes (about 341 MB), and the dataset size is 952942912 bytes (about 953 MB). It has one configuration, default, whose train split reads the data files matching data/train-*.
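The size figures above lend themselves to a quick sanity check. The following is a minimal sketch that derives the average bytes per example and the ratio of download size to dataset size from the numbers on the card; the function and variable names are illustrative, not part of the dataset or of any library.

```python
# Figures copied from the dataset card metadata.
NUM_EXAMPLES = 70_988
DATASET_SIZE_BYTES = 952_942_912
DOWNLOAD_SIZE_BYTES = 340_706_164


def size_summary(num_examples, dataset_bytes, download_bytes):
    """Return the per-example size and the download/dataset size ratio."""
    bytes_per_example, remainder = divmod(dataset_bytes, num_examples)
    return {
        "bytes_per_example": bytes_per_example,
        "remainder": remainder,
        "download_ratio": download_bytes / dataset_bytes,
    }


summary = size_summary(NUM_EXAMPLES, DATASET_SIZE_BYTES, DOWNLOAD_SIZE_BYTES)
# bytes_per_example is 13424 with remainder 0
print(summary)
```

The dataset size divides evenly by the number of examples, which would be consistent with every row having the same length, although the card does not state this. The download is roughly a third of the dataset size.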

Provided by:

transgene

Summary of original information

Dataset overview

Dataset name

multimodal_train_cite_tokenized

Dataset features

input_ids: sequence of int32
token_type_ids: sequence of int8
attention_mask: sequence of int8
labels: sequence of float64
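To make the declared types concrete, here is a small sketch that checks whether a record fits the four features listed above, using only the Python standard library. The `check_record` helper and the toy record are hypothetical and exist only for illustration; the values are not taken from the dataset.

```python
from typing import Dict, List

INT8_RANGE = (-128, 127)
INT32_RANGE = (-2**31, 2**31 - 1)

# Integer features and the value range implied by their declared dtype.
INT_FIELDS = {
    "input_ids": INT32_RANGE,
    "token_type_ids": INT8_RANGE,
    "attention_mask": INT8_RANGE,
}


def check_record(record: Dict[str, List]) -> None:
    """Raise ValueError if the record does not fit the declared feature types."""
    for name in list(INT_FIELDS) + ["labels"]:
        if name not in record:
            raise ValueError(f"missing feature: {name}")
    for name, (lo, hi) in INT_FIELDS.items():
        for value in record[name]:
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                raise ValueError(f"{name}: {value!r} is not an integer")
            if not lo <= value <= hi:
                raise ValueError(f"{name}: {value!r} is outside [{lo}, {hi}]")
    for value in record["labels"]:
        if not isinstance(value, float):
            raise ValueError(f"labels: {value!r} is not a float")


# Hypothetical toy record, not taken from the dataset.
toy_record = {
    "input_ids": [101, 2023, 102],
    "token_type_ids": [0, 0, 0],
    "attention_mask": [1, 1, 1],
    "labels": [0.5, 1.25],
}
check_record(toy_record)  # passes silently
```

Note that labels is a sequence of floats rather than a single class index, so any check or loss function applied to it should expect real-valued targets.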

Dataset splits

train: 70988 examples, 952942912 bytes

Dataset size

Download size: 340706164 bytes
Dataset size: 952942912 bytes

Configuration

config_name: default
data_files:
  - split: train
    path: data/train-*
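As an illustration of how the data/train-* pattern selects files for the train split, the sketch below approximates the matching with the standard-library `fnmatch` module. The file listing is hypothetical (the card does not list the actual shard names), and the Hub resolves patterns with its own glob logic, so this is an approximation rather than the library's implementation. The `load_train_split` helper assumes the third-party `datasets` package is installed and that network access is available; it is defined here but not called.

```python
from fnmatch import fnmatch

REPO_ID = "transgene/multimodal_train_cite_tokenized"
TRAIN_PATTERN = "data/train-*"


def files_for_train(paths):
    """Return the paths that match the train split pattern."""
    return [p for p in paths if fnmatch(p, TRAIN_PATTERN)]


# Hypothetical repository listing, for illustration only.
example_paths = [
    "README.md",
    "data/train-00000-of-00002.parquet",
    "data/train-00001-of-00002.parquet",
    "data/test-00000-of-00001.parquet",
]
print(files_for_train(example_paths))


def load_train_split():
    """Load the train split from the Hub (needs `datasets` and network access)."""
    from datasets import load_dataset  # third-party import, kept local on purpose

    return load_dataset(REPO_ID, split="train")
```

Because default is the only configuration, no configuration name needs to be passed when loading; `split="train"` is enough to get the single split back as one dataset object.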
