ReadingTimeMachine/historical_dla

Source: Hugging Face. Updated 2024-03-25; indexed 2024-06-11.
Download link:
https://hf-mirror.com/datasets/ReadingTimeMachine/historical_dla
Resource description:
---
license: apache-2.0
language:
- en
---

## Dataset Introduction

This dataset provides bounding boxes for figures, figure captions, tables, and math formulas on ~6000 hand-annotated pages. Coverage is better for figures and captions; some pages might not have all tables and math formulas annotated.

The format is JSON and includes lists (so it looks like the Hugging Face viewer doesn't necessarily like this format for display), with rows that look like:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6488a2de10af27fd305cb3dd/XdbA_YXAqNmurh-OBhvJO.png)

## How to use this data

To plot an example, check out the [trial_example_from_data.ipynb](https://huggingface.co/datasets/ReadingTimeMachine/historical_dla/blob/main/trial_example_from_data.ipynb) notebook. This assumes you have the data and the [data_utils.py](https://huggingface.co/datasets/ReadingTimeMachine/historical_dla/blob/main/data_utils.py) file in the same location as your notebook.

The following packages will have to be installed:

```python
matplotlib
numpy
pandas
wand
PIL
wget
cv2 # OpenCV
```

On Google Colab, to install `wand` we found we had to do the following (this is not in the linked notebook):

```python
!apt install imagemagick
!apt-get install libmagickwand-dev
!pip install Wand
!rm /etc/ImageMagick-6/policy.xml
!pip install wget
```
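
Before opening the notebook, it can help to confirm that the packages listed above are importable. The following is a minimal sketch using only the standard library; the helper name `find_missing_modules` is hypothetical and not part of `data_utils.py` or the linked notebook.

```python
import importlib.util

# Import names taken from the package list above
# (PIL is provided by Pillow, cv2 by OpenCV, wand by Wand).
REQUIRED_MODULES = ["matplotlib", "numpy", "pandas", "wand", "PIL", "wget", "cv2"]


def find_missing_modules(modules=REQUIRED_MODULES):
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

Note that `wand` can be found by this check and still fail when used if the ImageMagick system libraries are absent, which is what the Colab commands above address.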
Provided by:
ReadingTimeMachine
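Summary of original information

Dataset overview

Dataset contents

  • Contains hand-annotated bounding boxes for about 6000 pages, covering figures, figure captions, tables, and math formulas.
  • Annotation coverage is broader for figures and their captions; some pages may not have all tables and math formulas annotated.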

Data format

  • The data format is JSON and includes lists.
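
Because the records contain lists, it may be useful to inspect which fields are list-valued before plotting. The sketch below uses only the standard `json` module. The helper names are hypothetical, and the keys in the toy record are placeholders: the dataset's actual field names are shown only in the screenshot on the dataset card and are not assumed here.

```python
import json


def load_records(path):
    """Load a JSON file and return its contents as a list of records."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    # Assumption: the file holds either a list of row dicts or a single dict.
    return data if isinstance(data, list) else [data]


def list_valued_keys(records):
    """Return the sorted keys whose value is a list in at least one record."""
    keys = set()
    for row in records:
        if not isinstance(row, dict):
            continue  # skip anything that is not a key/value record
        for key, value in row.items():
            if isinstance(value, list):
                keys.add(key)
    return sorted(keys)


if __name__ == "__main__":
    # Toy record with placeholder keys, NOT the dataset's real schema.
    toy = [{"page_id": "p1", "boxes": [[0, 0, 10, 10]], "labels": ["figure"]}]
    print(list_valued_keys(toy))
```

To run this on the real data, pass the path of the downloaded JSON file to `load_records` and feed the result to `list_valued_keys`.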

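Usage example

  • An example of how to use the data is provided, including installing the required Python packages and the specific installation steps for Google Colab.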