LukeEuser/docvqa_50_unanswerable_questions

Name: LukeEuser/docvqa_50_unanswerable_questions
Creator: LukeEuser
Published: 2024-02-18 11:44:07
License: not specified

Hugging Face · updated 2024-02-18 · indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/LukeEuser/docvqa_50_unanswerable_questions

Resource summary:

---
dataset_info:
  features:
  - name: id
    dtype: string
  - name: image
    dtype: image
  - name: query
    struct:
    - name: de
      dtype: string
    - name: en
      dtype: string
    - name: es
      dtype: string
    - name: fr
      dtype: string
    - name: it
      dtype: string
  - name: answers
    sequence: string
  - name: words
    sequence: string
  - name: bounding_boxes
    sequence:
      sequence: float32
      length: 4
  - name: answer
    struct:
    - name: match_score
      dtype: float64
    - name: matched_text
      dtype: string
    - name: start
      dtype: int64
    - name: text
      dtype: string
  - name: ground_truth
    dtype: string
  splits:
  - name: train
    num_bytes: 33130077.0
    num_examples: 100
  - name: test
    num_bytes: 6102508.0
    num_examples: 20
  download_size: 13284640
  dataset_size: 39232585.0
---
# Dataset Card for "docvqa_50_unanswerable_questions"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
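Each record's `query` struct carries the same question in five languages (de, en, es, fr, it). A minimal sketch for choosing one, assuming records are plain Python dicts as the Hugging Face `datasets` library returns for struct features. `pick_query` and its fallback order are hypothetical, and skipping empty entries is only a precaution, since the card does not say whether every language is filled in for every record.

```python
from collections.abc import Mapping, Sequence
from typing import Optional

# Languages declared in the `query` struct of the dataset card, in card order.
QUERY_LANGUAGES = ("de", "en", "es", "fr", "it")


def pick_query(query: Mapping[str, Optional[str]],
               preferred: Sequence[str] = ("en",)) -> tuple[str, str]:
    """Return (language, text) for the first non-empty query.

    Hypothetical helper: tries the `preferred` languages first, then the
    remaining declared languages in card order. Missing keys, None and ""
    are all skipped.
    """
    order = list(preferred) + [lang for lang in QUERY_LANGUAGES if lang not in preferred]
    for lang in order:
        text = query.get(lang)
        if text:  # skips None and empty strings
            return lang, text
    raise ValueError("record has no non-empty query in any declared language")
```

For example, with an English-first preference a record whose `en` entry is filled in yields `("en", ...)`, while passing `preferred=("fr",)` falls back through the other languages if the French query is empty.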

Provider:

LukeEuser

Summary of original information

Dataset overview

Dataset information

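Features:
- id: string.
- image: image.
- query: struct with the following fields:
  - de: German query, string.
  - en: English query, string.
  - es: Spanish query, string.
  - fr: French query, string.
  - it: Italian query, string.
- answers: sequence of strings.
- words: sequence of strings.
- bounding_boxes: sequence of float sequences, each of length 4.
- answer: struct with the following fields:
  - match_score: match score, float (float64).
  - matched_text: matched text, string.
  - start: start position, integer (int64).
  - text: text, string.
- ground_truth: string.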
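The feature list above can be turned into a quick per-record sanity check. This is a sketch under stated assumptions, not part of the dataset: `check_record` is a hypothetical helper, the `image` column is not inspected, `None` is tolerated inside `answer` because the card does not say how unanswerable questions are encoded there, and the equal-length check on `words` and `bounding_boxes` assumes one box per OCR word, which the card does not state.

```python
from collections.abc import Mapping
from numbers import Real
from typing import Any

QUERY_LANGUAGES = ("de", "en", "es", "fr", "it")
# Expected Python types for the `answer` struct (float64 -> Real, int64 -> int).
ANSWER_FIELDS = {"match_score": Real, "matched_text": str, "start": int, "text": str}


def check_record(record: Mapping[str, Any]) -> list[str]:
    """Return a list of schema problems found in one record (empty if none).

    Hypothetical helper. Assumptions: `image` is not checked; None values
    inside `answer` are allowed; one bounding box per word.
    """
    problems = []
    if not isinstance(record.get("id"), str):
        problems.append("id must be a string")

    query = record.get("query")
    if not isinstance(query, Mapping) or set(query) != set(QUERY_LANGUAGES):
        problems.append("query must have exactly the keys de, en, es, fr, it")

    for field in ("answers", "words"):
        value = record.get(field)
        if not isinstance(value, list) or not all(isinstance(s, str) for s in value):
            problems.append(f"{field} must be a list of strings")

    boxes = record.get("bounding_boxes")
    if not isinstance(boxes, list) or not all(
            isinstance(b, (list, tuple)) and len(b) == 4
            and all(isinstance(c, Real) for c in b) for b in boxes):
        problems.append("bounding_boxes must be a list of 4-number boxes")
    elif isinstance(record.get("words"), list) and len(boxes) != len(record["words"]):
        # Assumption, not stated on the card: one box per OCR word.
        problems.append("assumed one bounding box per word, lengths differ")

    answer = record.get("answer")
    if not isinstance(answer, Mapping):
        problems.append("answer must be a struct")
    else:
        for key, typ in ANSWER_FIELDS.items():
            if key not in answer:
                problems.append(f"answer.{key} is missing")
            elif answer[key] is not None and not isinstance(answer[key], typ):
                problems.append(f"answer.{key} should be {typ.__name__}")

    if not isinstance(record.get("ground_truth"), str):
        problems.append("ground_truth must be a string")
    return problems
```

Returning a list of messages rather than raising on the first problem makes it easy to tally issues across all 120 records in one pass.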

Dataset splits

Train:
- Bytes: 33,130,077
- Examples: 100
Test:
- Bytes: 6,102,508
- Examples: 20

Dataset size

Download size: 13,284,640 bytes
Dataset size: 39,232,585 bytes (train + test)
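The split and size figures can be cross-checked with a few lines of arithmetic. The numbers are copied from the card, and the `summarize` helper is illustrative only.

```python
# Figures copied from the dataset card: (bytes, examples) per split.
SPLITS = {"train": (33_130_077, 100), "test": (6_102_508, 20)}
DOWNLOAD_SIZE = 13_284_640
DATASET_SIZE = 39_232_585


def summarize(splits):
    """Return total bytes, total examples, and mean bytes per example by split."""
    total_bytes = sum(nbytes for nbytes, _ in splits.values())
    total_examples = sum(count for _, count in splits.values())
    per_example = {name: nbytes / count for name, (nbytes, count) in splits.items()}
    return total_bytes, total_examples, per_example


total_bytes, total_examples, per_example = summarize(SPLITS)
print(total_bytes == DATASET_SIZE)             # → True (train + test = dataset_size)
print(total_examples)                          # → 120
print(round(DOWNLOAD_SIZE / DATASET_SIZE, 2))  # → 0.34
```

The check confirms that `dataset_size` is exactly the sum of the two splits. On the Hugging Face Hub, `download_size` generally counts the files fetched from the Hub while `dataset_size` counts the prepared dataset, which would explain a ratio well below 1. The card itself does not say how either figure was measured.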
