ucsahin/pubtables-detection-1500-samples
Source: Hugging Face | Updated: 2024-05-25 | Indexed: 2024-06-12
Download link:
https://hf-mirror.com/datasets/ucsahin/pubtables-detection-1500-samples
Resource description:
---
language:
- en
license: cdla-permissive-2.0
size_categories:
- 1K<n<10K
task_categories:
- object-detection
pretty_name: PubTables Dataset for Table Detection
dataset_info:
  features:
  - name: image
    dtype: image
  - name: objects
    struct:
    - name: bbox
      sequence:
        sequence: float64
    - name: categories
      dtype: string
  splits:
  - name: train
    num_bytes: 228796393.0
    num_examples: 1500
  download_size: 224693659
  dataset_size: 228796393.0
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
tags:
- Documents
- Tables
---
## Dataset Details
### Dataset Description
This dataset is a 1,500-image sample of [bsmock/pubtables-1m](https://huggingface.co/datasets/bsmock/pubtables-1m). Each image may contain multiple tables, and each table is annotated with a bounding box in PASCAL VOC format. It is intended for demonstrating how to fine-tune DETR and multimodal large language models.
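As a minimal sketch of how the sample could be loaded, the snippet below uses the Hugging Face `datasets` library with the repository ID shown at the top of this page. The field names (`image`, `objects`, `bbox`, `categories`) come from the dataset card's metadata; the helper names `load_train_split`, `count_tables`, and `main` are hypothetical and exist only for illustration.

```python
# Sketch only: assumes the `datasets` package is installed and the Hub is reachable.
DATASET_ID = "ucsahin/pubtables-detection-1500-samples"


def load_train_split():
    """Download (or reuse the cached copy of) the only split the card lists: train."""
    from datasets import load_dataset  # imported lazily so the helpers below work offline

    return load_dataset(DATASET_ID, split="train")


def count_tables(example):
    """Return the number of annotated table boxes in one example.

    `example["objects"]["bbox"]` is a list of boxes, one per table,
    matching the nested float sequence declared in the card's metadata.
    """
    return len(example["objects"]["bbox"])


def main():
    dataset = load_train_split()
    print(dataset)  # shows the feature schema and the number of rows
    first = dataset[0]
    print("tables in first image:", count_tables(first))
    print("first box:", first["objects"]["bbox"][0])
```

Calling `main()` downloads the data on first use; the download size listed in the metadata is roughly 225 MB.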
### Dataset Sources
- **Repository:** [bsmock/pubtables-1m](https://huggingface.co/datasets/bsmock/pubtables-1m)
## Uses
The dataset can be used to fine-tune DETR and TATR (Table Transformer) style models. Please check the original license in [microsoft/table-transformer](https://github.com/microsoft/table-transformer) before use.
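DETR-style training pipelines usually do not consume PASCAL VOC boxes directly: COCO-style annotations use `[x, y, width, height]`, and the DETR loss works on normalized `[center_x, center_y, width, height]`. The sketch below shows both conversions. It assumes the boxes in this dataset are absolute pixel coordinates in `[xmin, ymin, xmax, ymax]` order, as the "PASCAL VOC formatted" description suggests; that assumption is worth checking on a few samples. The function names are hypothetical.

```python
from typing import List, Sequence


def voc_to_coco(box: Sequence[float]) -> List[float]:
    """Convert [xmin, ymin, xmax, ymax] to COCO-style [x, y, width, height]."""
    xmin, ymin, xmax, ymax = box
    return [xmin, ymin, xmax - xmin, ymax - ymin]


def voc_to_normalized_cxcywh(
    box: Sequence[float], image_width: float, image_height: float
) -> List[float]:
    """Convert [xmin, ymin, xmax, ymax] in pixels to normalized
    [center_x, center_y, width, height], each value in the range 0 to 1."""
    xmin, ymin, xmax, ymax = box
    return [
        (xmin + xmax) / 2 / image_width,
        (ymin + ymax) / 2 / image_height,
        (xmax - xmin) / image_width,
        (ymax - ymin) / image_height,
    ]
```

For example, a box `[10, 20, 110, 70]` on a 200 by 100 pixel image becomes `[10, 20, 100, 50]` in COCO form and `[0.3, 0.45, 0.5, 0.5]` in normalized center form.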
Provider:
ucsahin
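Summary of original information
Dataset overview
Basic information
- Name: PubTables Dataset for Table Detection
- Language: English
- License: CDLA-Permissive-2.0
- Size category: 1K<n<10K
- Task category: object detection
Dataset features
- Features:
  - image: image type
  - objects: structured data
    - bbox: bounding boxes, stored as sequences of floats
    - categories: category, stored as a string
Dataset splits
- Train split:
  - Number of examples: 1500
  - Dataset size: 228796393.0 bytes
  - Download size: 224693659 bytes
Configuration
- Default configuration:
  - Data files:
    - Split: train
    - Path: data/train-*
Dataset uses
- Fine-tuning DETR and TATR style models.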



