
transgene/multimodal_train_cite_tokenized

Hugging Face · updated 2024-06-08 · indexed 2024-06-29
Download link:
https://hf-mirror.com/datasets/transgene/multimodal_train_cite_tokenized
Resource summary:
---
dataset_info:
  features:
  - name: input_ids
    sequence: int32
  - name: token_type_ids
    sequence: int8
  - name: attention_mask
    sequence: int8
  - name: labels
    sequence: float64
  splits:
  - name: train
    num_bytes: 952942912
    num_examples: 70988
  download_size: 340706164
  dataset_size: 952942912
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
# Dataset Card for "multimodal_train_cite_tokenized"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

The multimodal_train_cite_tokenized dataset has four features, each stored as a sequence: input_ids (int32), token_type_ids (int8), attention_mask (int8), and labels (float64). It provides a single training split (train) with 70,988 examples. The download size is 340,706,164 bytes (about 341 MB), and the prepared dataset occupies 952,942,912 bytes (about 953 MB). The dataset has one configuration, default, whose training data files match the path pattern data/train-*.
Provided by:
transgene
Summary of original information

Dataset Overview

Dataset Name

multimodal_train_cite_tokenized

Dataset Features

  • input_ids: sequence of int32
  • token_type_ids: sequence of int8
  • attention_mask: sequence of int8
  • labels: sequence of float64
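The four features listed above can be checked record by record. The following Python sketch is illustrative only: `validate_record`, `FEATURES`, and the toy record are hypothetical, and the equal-length check assumes (as is typical for tokenizer output, though the card does not say so) that the three token-level columns line up position by position. The integer limits are the standard ranges of signed 32-bit and 8-bit types.

```python
# Hypothetical per-record check against the dtypes listed on the dataset card.
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
INT8_MIN, INT8_MAX = -(2**7), 2**7 - 1

# Feature name -> predicate that each element of that sequence must satisfy.
FEATURES = {
    "input_ids": lambda v: isinstance(v, int) and INT32_MIN <= v <= INT32_MAX,
    "token_type_ids": lambda v: isinstance(v, int) and INT8_MIN <= v <= INT8_MAX,
    "attention_mask": lambda v: isinstance(v, int) and INT8_MIN <= v <= INT8_MAX,
    "labels": lambda v: isinstance(v, float),
}

# Assumption: these three columns come from one tokenizer call.
TOKEN_COLUMNS = ("input_ids", "token_type_ids", "attention_mask")


def validate_record(record: dict) -> list[str]:
    """Return a list of problems found in `record` (empty means it looks valid)."""
    problems = []
    for name, ok in FEATURES.items():
        if name not in record:
            problems.append(f"missing feature: {name}")
            continue
        bad = [v for v in record[name] if not ok(v)]
        if bad:
            problems.append(f"{name}: {len(bad)} value(s) out of range for its dtype")
    # Token-level columns are expected to share one length within a record.
    lengths = {len(record[c]) for c in TOKEN_COLUMNS if c in record}
    if len(lengths) > 1:
        problems.append("token-level columns have different lengths")
    return problems


# Toy record with made-up values, not taken from the dataset.
record = {
    "input_ids": [101, 2023, 102, 0],
    "token_type_ids": [0, 0, 0, 0],
    "attention_mask": [1, 1, 1, 0],
    "labels": [0.5, 1.25],
}
print(validate_record(record))  # []
```

Returning a list of messages rather than raising on the first error makes it easy to report every problem in a record at once.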

Dataset Splits

  • train: 70,988 examples, occupying 952,942,912 bytes

Dataset Size

  • Download size: 340,706,164 bytes
  • Dataset size: 952,942,912 bytes
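The size figures above can be related with simple arithmetic. This sketch uses only the numbers from the card. Reading `download_size` as the size of the (presumably compressed) files fetched from the Hub, and `dataset_size` as the size of the prepared data, is an assumption based on how Hugging Face `datasets` usually reports these two fields.

```python
# Figures copied from the dataset card.
DOWNLOAD_SIZE = 340_706_164  # bytes; assumed to be the files as downloaded
DATASET_SIZE = 952_942_912   # bytes; assumed to be the prepared train split
NUM_EXAMPLES = 70_988

expansion_factor = DATASET_SIZE / DOWNLOAD_SIZE
bytes_per_example = DATASET_SIZE / NUM_EXAMPLES

print(f"dataset size:      {DATASET_SIZE / 1e6:.1f} MB ({DATASET_SIZE / 2**20:.1f} MiB)")
print(f"download size:     {DOWNLOAD_SIZE / 1e6:.1f} MB")
print(f"expansion factor:  {expansion_factor:.2f}x")
print(f"avg bytes/example: {bytes_per_example:.0f}")
```

The dataset size divides evenly by the example count, giving exactly 13,424 bytes per example on average, and the prepared data is roughly 2.8 times the download size.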

Configuration

  • config_name: default
    • data_files:
      • split: train
      • path: data/train-*
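The `data/train-*` entry is a glob pattern that selects which repository files make up the `train` split. This sketch approximates that selection with Python's `fnmatch.fnmatchcase`. The file listing and the helper `files_for_split` are hypothetical, and because `fnmatch`'s `*` also matches `/`, this is only an approximation of how the Hub resolves patterns.

```python
from fnmatch import fnmatchcase

# Pattern from the card's `default` config for the `train` split.
TRAIN_PATTERN = "data/train-*"


def files_for_split(repo_files, pattern=TRAIN_PATTERN):
    """Return the paths that a glob-style data_files pattern selects."""
    return [path for path in repo_files if fnmatchcase(path, pattern)]


# Hypothetical file listing; the real repository's shard names may differ.
repo_files = [
    "README.md",
    "data/train-00000-of-00002.parquet",
    "data/train-00001-of-00002.parquet",
    "data/test-00000-of-00001.parquet",
]
print(files_for_split(repo_files))
```

With this listing, only the two `data/train-...` shards are selected; `README.md` and the test shard are ignored.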