
ServiceNow/coco_encoded

Source: Hugging Face. Updated: 2024-05-11. Indexed: 2024-06-12.
Download link:
https://hf-mirror.com/datasets/ServiceNow/coco_encoded
Resource description:
---
dataset_info:
  features:
  - name: image
    dtype: image
  - name: filepath
    dtype: string
  - name: sentids
    list: int32
  - name: filename
    dtype: string
  - name: imgid
    dtype: int32
  - name: split
    dtype: string
  - name: sentences
    struct:
    - name: tokens
      list: string
    - name: raw
      dtype: string
    - name: imgid
      dtype: int32
    - name: sentid
      dtype: int32
  - name: cocoid
    dtype: int32
  - name: hf_hub_laion/CLIP_ViT_g_14_laion2B_s12B_b42K_short_features
    sequence: float32
  - name: FacebookAI/roberta_base_short_features
    sequence: float32
  - name: FacebookAI/roberta_base_long_features
    sequence: float32
  - name: FacebookAI/roberta_base_normalized_long_features
    sequence: float32
  - name: facebook/dinov2_large_short_features
    sequence: float32
  - name: facebook/dinov2_large_long_features
    sequence: float32
  - name: facebook/dinov2_large_normalized_long_features
    sequence: float32
  - name: hf_hub_timm/ViT_B_16_SigLIP_short_features
    sequence: float32
  splits:
  - name: train
    num_bytes: 102385311908.625
    num_examples: 566747
  - name: validation
    num_bytes: 4517702812.75
    num_examples: 25010
  - name: test
    num_bytes: 4516243134.75
    num_examples: 25010
  download_size: 80421808435
  dataset_size: 111419257856.125
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
---
Provider:
ServiceNow
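Summary of original information

Dataset overview

Data features

  • image: image data
  • filepath: string, file path
  • sentids: list of integers
  • filename: string, file name
  • imgid: integer, image ID
  • split: string, dataset split (e.g. train, validation, test)
  • sentences: structured data with the following fields:
    • tokens: list of strings
    • raw: string, raw text
    • imgid: integer, image ID
    • sentid: integer, sentence ID
  • cocoid: integer, COCO image ID
  • hf_hub_laion/CLIP_ViT_g_14_laion2B_s12B_b42K_short_features: sequence of float32 values
  • FacebookAI/roberta_base_short_features: sequence of float32 values
  • FacebookAI/roberta_base_long_features: sequence of float32 values
  • FacebookAI/roberta_base_normalized_long_features: sequence of float32 values
  • facebook/dinov2_large_short_features: sequence of float32 values
  • facebook/dinov2_large_long_features: sequence of float32 values
  • facebook/dinov2_large_normalized_long_features: sequence of float32 values
  • hf_hub_timm/ViT_B_16_SigLIP_short_features: sequence of float32 values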
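
The eight embedding columns follow a visible naming pattern: a model identifier followed by `_short_features`, `_long_features`, or `_normalized_long_features`. The source does not explain what "short", "long", or "normalized" mean, so the Python sketch below only groups the columns by name; the helper `parse_embedding_column` and the `EmbeddingColumn` type are hypothetical names introduced for illustration.

```python
from collections import defaultdict
from typing import NamedTuple, Optional

# Suffixes seen in the feature list. The most specific suffix comes first,
# because "_normalized_long_features" also ends with "_long_features".
SUFFIXES = (
    ("_normalized_long_features", "normalized_long"),
    ("_short_features", "short"),
    ("_long_features", "long"),
)

# Embedding column names copied verbatim from the feature list.
EMBEDDING_COLUMNS = [
    "hf_hub_laion/CLIP_ViT_g_14_laion2B_s12B_b42K_short_features",
    "FacebookAI/roberta_base_short_features",
    "FacebookAI/roberta_base_long_features",
    "FacebookAI/roberta_base_normalized_long_features",
    "facebook/dinov2_large_short_features",
    "facebook/dinov2_large_long_features",
    "facebook/dinov2_large_normalized_long_features",
    "hf_hub_timm/ViT_B_16_SigLIP_short_features",
]


class EmbeddingColumn(NamedTuple):
    model: str  # text before the suffix, e.g. "facebook/dinov2_large"
    kind: str   # "short", "long", or "normalized_long"


def parse_embedding_column(name: str) -> Optional[EmbeddingColumn]:
    """Split a column name into (model, kind); return None for other columns."""
    for suffix, kind in SUFFIXES:
        if name.endswith(suffix):
            return EmbeddingColumn(model=name[: -len(suffix)], kind=kind)
    return None


# Group the available feature kinds by model prefix.
kinds_by_model = defaultdict(list)
for column in EMBEDDING_COLUMNS:
    parsed = parse_embedding_column(column)
    if parsed is not None:
        kinds_by_model[parsed.model].append(parsed.kind)

for model, kinds in kinds_by_model.items():
    print(f"{model}: {', '.join(kinds)}")
```

Grouping this way makes it easy to see that only the RoBERTa and DINOv2 prefixes carry all three variants, while the CLIP and SigLIP prefixes have a short variant only.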

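Dataset splits

  • train: training set, 566747 examples, 102385311908.625 bytes (about 102.4 GB)
  • validation: validation set, 25010 examples, 4517702812.75 bytes (about 4.5 GB)
  • test: test set, 25010 examples, 4516243134.75 bytes (about 4.5 GB)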
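
As a quick worked example of the split figures, the sketch below computes each split's share of the examples and its average size per example. It uses only the counts and byte sizes listed above; the function name `split_summary` is a hypothetical helper, not part of any library.

```python
# Per-split example counts and byte sizes, copied from the split list.
SPLITS = {
    "train": {"num_examples": 566747, "num_bytes": 102385311908.625},
    "validation": {"num_examples": 25010, "num_bytes": 4517702812.75},
    "test": {"num_examples": 25010, "num_bytes": 4516243134.75},
}

total_examples = sum(s["num_examples"] for s in SPLITS.values())


def split_summary(splits):
    """Return {split: (share_of_examples, average_bytes_per_example)}."""
    total = sum(s["num_examples"] for s in splits.values())
    return {
        name: (s["num_examples"] / total, s["num_bytes"] / s["num_examples"])
        for name, s in splits.items()
    }


summary = split_summary(SPLITS)
for name, (share, avg_bytes) in summary.items():
    print(f"{name:<10} {share:6.1%} of examples, ~{avg_bytes / 1e3:.0f} kB per example")
```

The average size per example is a rough planning figure only: it mixes the image bytes with the stored embedding vectors, and the source gives no breakdown between the two.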

Dataset size

  • Download size: 80421808435 bytes (about 80.4 GB)
  • Dataset size: 111419257856.125 bytes (about 111.4 GB)
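
The size figures can be cross-checked against the per-split byte counts. The sketch below sums the three splits, compares the total with the reported dataset size, and computes the ratio of download size to dataset size. All numbers come from the listing; `to_gb` is a hypothetical helper that assumes decimal gigabytes (1 GB = 10^9 bytes).

```python
import math

# Byte counts copied from the listing (fractional values kept as given).
SPLIT_BYTES = {
    "train": 102385311908.625,
    "validation": 4517702812.75,
    "test": 4516243134.75,
}
DATASET_SIZE = 111419257856.125
DOWNLOAD_SIZE = 80421808435


def to_gb(num_bytes: float) -> float:
    """Convert bytes to decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_bytes / 1e9


# The three splits should account for the whole reported dataset size.
splits_total = sum(SPLIT_BYTES.values())
sizes_consistent = math.isclose(splits_total, DATASET_SIZE, rel_tol=1e-12)

# Fraction of the dataset size that actually has to be downloaded.
download_ratio = DOWNLOAD_SIZE / DATASET_SIZE

print(f"sum of splits : {to_gb(splits_total):.2f} GB (consistent: {sizes_consistent})")
print(f"download size : {to_gb(DOWNLOAD_SIZE):.2f} GB")
print(f"download ratio: {download_ratio:.2f}")
```

A ratio below 1 is what one would expect if the downloaded files are stored in a compressed format, though the listing itself does not state why the two sizes differ.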

Configuration

  • config_name: default
    • data_files:
      • train: path data/train-*
      • validation: path data/validation-*
      • test: path data/test-*
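
The configuration maps each split to a glob pattern over files under `data/`. The sketch below shows how such patterns assign a file to a split, and how one split might be opened without fetching the full download. It rests on several assumptions: the shard file name in the usage note is hypothetical, `split_for_path` and `stream_split` are illustrative helpers, the Hub repository id is taken to match the page title, and `stream_split` needs the third-party `datasets` package plus network access.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Split-to-pattern mapping copied from the default configuration.
DATA_FILES = {
    "train": "data/train-*",
    "validation": "data/validation-*",
    "test": "data/test-*",
}


def split_for_path(path: str) -> Optional[str]:
    """Return the split whose glob pattern matches `path`, or None."""
    for split, pattern in DATA_FILES.items():
        if fnmatchcase(path, pattern):
            return split
    return None


def stream_split(split: str = "validation"):
    """Open one split lazily (assumes `pip install datasets` and network access).

    The repository id is assumed to match the page title.
    """
    from datasets import load_dataset  # imported lazily; third-party package

    return load_dataset("ServiceNow/coco_encoded", split=split, streaming=True)
```

For example, `split_for_path("data/train-00000-of-00100.parquet")` returns `"train"` for that hypothetical shard name, and a file outside `data/` returns `None`. Streaming is a reasonable default here because the full download is about 80 GB; with `streaming=True`, records are read on demand, so `next(iter(stream_split()))` should yield a single example as a dictionary keyed by the feature names.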