transgene/multimodal_train_cite_tokenized
Source: Hugging Face. Updated 2024-06-08. Indexed 2024-06-29.
Download link:
https://hf-mirror.com/datasets/transgene/multimodal_train_cite_tokenized
Resource summary:
---
dataset_info:
  features:
  - name: input_ids
    sequence: int32
  - name: token_type_ids
    sequence: int8
  - name: attention_mask
    sequence: int8
  - name: labels
    sequence: float64
  splits:
  - name: train
    num_bytes: 952942912
    num_examples: 70988
  download_size: 340706164
  dataset_size: 952942912
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
# Dataset Card for "multimodal_train_cite_tokenized"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
The multimodal_train_cite_tokenized dataset has four sequence-valued features: input_ids (int32), token_type_ids (int8), attention_mask (int8), and labels (float64). It contains a single split, train, with 70988 examples. The download size is 340706164 bytes (about 341 MB) and the dataset size is 952942912 bytes (about 953 MB). There is one configuration, default, whose train split is read from files matching data/train-*.
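The card does not say how to load the data, so the following is only a minimal sketch. It assumes the Hugging Face `datasets` library is installed and that the repository id matches the download URL above. The helper `missing_features` is a hypothetical name introduced here for illustration; it is not part of the dataset or the library.

```python
# Repository id taken from the download URL; treat it as an assumption.
REPO_ID = "transgene/multimodal_train_cite_tokenized"

# Feature names and element dtypes as listed in the card's dataset_info.
EXPECTED_FEATURES = {
    "input_ids": "int32",
    "token_type_ids": "int8",
    "attention_mask": "int8",
    "labels": "float64",
}


def load_train_split():
    """Download (or reuse a cached copy of) the single train split."""
    # Imported lazily so the rest of this sketch runs without the library.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split="train")


def missing_features(column_names):
    """Hypothetical helper: expected feature names absent from column_names."""
    return sorted(set(EXPECTED_FEATURES) - set(column_names))
```

With network access, `ds = load_train_split()` followed by `missing_features(ds.column_names)` should return an empty list if the hosted data still matches the card.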
Provider:
transgene
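Summary of original information
Dataset overview
Dataset name
multimodal_train_cite_tokenized
Dataset features
- input_ids: sequence of int32
- token_type_ids: sequence of int8
- attention_mask: sequence of int8
- labels: sequence of float64
Dataset splits
- train: 70988 examples, 952942912 bytes
Dataset size
- Download size: 340706164 bytes
- Dataset size: 952942912 bytes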
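The sizes listed for this dataset can be turned into two quick figures: the average number of bytes per example and the ratio between the dataset size and the download size. The sketch below uses only the numbers from the card; the variable names are illustrative.

```python
NUM_EXAMPLES = 70988
DATASET_SIZE = 952942912   # bytes, dataset_size / num_bytes in the card
DOWNLOAD_SIZE = 340706164  # bytes, download_size in the card

# Average bytes per example, plus whatever does not divide evenly.
bytes_per_example, remainder = divmod(DATASET_SIZE, NUM_EXAMPLES)
print(bytes_per_example, remainder)  # 13424 0

# How much larger the dataset is than the downloaded files.
size_ratio = DATASET_SIZE / DOWNLOAD_SIZE
print(f"{size_ratio:.2f}x")
```

The dataset size divides evenly by the number of examples. That would be consistent with every example occupying the same number of bytes, though the card does not state this.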
Configuration
- config_name: default
- data_files:
  - split: train
  - path: data/train-*



