Name: cmarkea/table-vqa
Creator: cmarkea
Published: 2024-09-26 11:53:12
License: apache-2.0

Download link:

https://hf-mirror.com/datasets/cmarkea/table-vqa

Resource description:

---
language:
- fr
- en
license: apache-2.0
size_categories:
- 10K<n<100K
task_categories:
- text-generation
- text-to-image
- image-to-text
- table-question-answering
- visual-question-answering
dataset_info:
  features:
  - name: id
    dtype: string
  - name: paper_id
    dtype: string
  - name: latex
    dtype: string
  - name: newcommands
    sequence: string
  - name: image
    dtype: image
  - name: model
    dtype: string
  - name: qa
    struct:
    - name: en
      list:
      - name: answer
        dtype: string
      - name: question
        dtype: string
    - name: fr
      list:
      - name: answer
        dtype: string
      - name: question
        dtype: string
  splits:
  - name: train
    num_bytes: 1277095951.0
    num_examples: 16415
  - name: test
    num_bytes: 30261292.0
    num_examples: 395
  download_size: 3634328121
  dataset_size: 1307357243.0
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
tags:
- arXiv
- multimodal
- document-type objects
- table
---

## Dataset description

The table-vqa dataset draws its table images from [AFTdb](https://huggingface.co/datasets/cmarkea/aftdb) (Arxiv Figure Table Database), curated by cmarkea. It consists of pairs of table images and their LaTeX source code, with each image linked to ten questions and answers on average. Half of the Q&A pairs are in English and half are in French. The questions and answers were generated with Gemini 1.5 Pro and Claude 3.5 Sonnet, which makes the dataset well suited to multimodal tasks involving image-text pairing and multilingual question answering.
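The `dataset_info` features declared in the metadata can be mirrored as Python types, which may help when writing code against individual records. The following is only a sketch: the class names `QAPair`, `QA`, and `TableVQARecord` are hypothetical and are not part of the dataset or of the `datasets` library, and the `image` field is typed loosely because its decoded form depends on how the dataset is loaded.

```python
from typing import Any, List, TypedDict


class QAPair(TypedDict):
    """One question/answer pair (hypothetical name, mirrors the `qa` items)."""
    answer: str
    question: str


class QA(TypedDict):
    """Q/A pairs grouped by language, as declared in the `qa` struct."""
    en: List[QAPair]
    fr: List[QAPair]


class TableVQARecord(TypedDict):
    """One record, following the feature names in the metadata header."""
    id: str
    paper_id: str
    latex: str
    newcommands: List[str]
    image: Any  # assumption: typically a PIL image once decoded
    model: str
    qa: QA
```

These types carry no runtime validation; they only document the expected shape for type checkers and readers.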
## Loading the dataset

To load the dataset, first install the `datasets` library:

```bash
pip install datasets
```

Then run the following code:

```python
from datasets import load_dataset

ds = load_dataset("cmarkea/table-vqa")
```

## Data sample

A data sample is structured as follows:

```
{
    'id': '786cc06c71854b088ca098fdf2cf20fa',
    'latex': '\\begin{tabular}{|r|r|r|r|}\n\\hline\n$\\sqrt{s}$ (GeV) & $\\phi$ (rad) & $\\theta_{C}$ & $\\theta_{AMH}$ \\\\ \\hline\n250 & $0.444 \\pm 0.070$ & $0.0497 \\pm 0.0051$ & $0.36 \\pm 0.10$ \\\\ \\hline\n\\end{tabular}',
    'newcommands': [
        '\\newcommand{\\toprule}{\\hline}',
        '\\newcommand{\\midrule}{\\hline}',
        '\\newcommand{\\bottomrule}{\\hline}'
    ],
    'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=735x70 at 0x7F2420F56550>,
    'model': 'claude3.5-sonnet',
    'qa': {
        'en': [
            {
                'answer': '250 GeV',
                'question': 'What is the center-of-mass energy in GeV for the data presented in the table?'
            },
            {
                'answer': '0.444 ± 0.070 rad',
                'question': 'What is the value of φ (phi) in radians according to the table?'
            },
            {
                'answer': '4 significant figures',
                'question': 'How many significant figures are reported for the θC (theta C) value?'
            },
            {
                'answer': 'θAMH (theta AMH) with a relative uncertainty of about 28%',
                'question': 'Which parameter has the largest relative uncertainty in the table?'
            },
            {
                'answer': '0.4097 (0.0497 + 0.36)',
                'question': 'What is the sum of the central values of θC and θAMH?'
            }
        ],
        'fr': [
            {
                'answer': 'GeV (Giga-électronvolt)',
                'question': "Quelle est l'unité de mesure utilisée pour √s dans le tableau?"
            },
            {
                'answer': '0,36 ± 0,10',
                'question': 'Quelle est la valeur de θAMH (theta AMH) indiquée dans le tableau?'
            },
            {
                'answer': '4 paramètres',
                'question': 'Combien de paramètres sont présentés dans ce tableau?'
            },
            {
                'answer': '± 0,070 rad',
                'question': 'Quelle est la précision de la mesure de φ (phi) en radians?'
            },
            {
                'answer': 'θC (theta C) avec une incertitude de ± 0,0051',
                'question': 'Quel paramètre a la plus petite incertitude absolue dans le tableau?'
            }
        ]
    }
}
```

## Statistical Description

### Distribution by Language (English and French)

| Split | Language | # images | # Q/A pairs | # Words |
|--|:--------------------:|:----------:|:-----------:|:---------:|
| *train* | en | 16,415 | 82,342 | 1,679,891 |
| | fr | 16,415 | 82,154 | 1,939,728 |
| | Total | 16,415 | 164,496 | 3,619,619 |
| *test* | en | 395 | 1,975 | 40,882 |
| | fr | 395 | 1,975 | 47,297 |
| | Total | 395 | 3,950 | 76,181 |

### Distribution by Generation Model

| Split | Model | # images | # en Q/A pairs | # fr Q/A pairs | # total Q/A pairs |
|-|----------------:|:----------:|:--------------:|:--------------:|:-----------------:|
| *train* | Claude | 8,247 | 41,235 | 41,235 | 82,470 |
| | Gemini | 8,168 | 41,107 | 40,919 | 82,026 |
| *test* | Claude | 187 | 935 | 935 | 1,870 |
| | Gemini | 208 | 1,040 | 1,040 | 2,080 |

## Field Descriptions

- **id:** Unique identifier for each observation.
- **image:** Pillow image of the table.
- **latex:** LaTeX source code of the table.
- **model:** Model used to generate the question-answer pairs (`'claude3.5-sonnet'` or `'gemini-1.5-pro'`).
- **paper_id:** Unique arXiv identifier of the article from which the table was taken.
- **newcommands:** List of the LaTeX `\newcommand` definitions used in the article.
- **qa:** Dictionary containing the question-answer pairs in English and French.

Citation
--------

```bibtex
@online{AgDeTQA,
  AUTHOR = {Tom Agonnoude and Cyrile Delestre},
  URL = {https://huggingface.co/datasets/cmarkea/table-vqa},
  YEAR = {2024},
  KEYWORDS = {NLP ; Multimodal}
}
```
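As an illustration of how the `qa` and `model` fields described above can be consumed, here is a minimal sketch that flattens the nested question-answer structure and tallies pairs per model and language. It assumes each record has the shape shown in the data sample (a `qa` dictionary mapping a language code to a list of `question`/`answer` dictionaries). The helper names `iter_qa_pairs` and `count_pairs_by_model` are hypothetical, and the toy record is abridged from the documented sample so that the code runs without downloading anything.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Iterator, Tuple


def iter_qa_pairs(sample: Dict[str, Any]) -> Iterator[Tuple[str, str, str]]:
    """Yield (language, question, answer) for every Q/A pair in one record."""
    for lang, pairs in sample["qa"].items():
        for pair in pairs:
            yield lang, pair["question"], pair["answer"]


def count_pairs_by_model(samples: Iterable[Dict[str, Any]]) -> Counter:
    """Count Q/A pairs per (model, language) across an iterable of records."""
    counts: Counter = Counter()
    for sample in samples:
        for lang, _question, _answer in iter_qa_pairs(sample):
            counts[(sample["model"], lang)] += 1
    return counts


# Toy record abridged from the documented sample (assumption: real records
# follow the same layout, with more pairs per language and extra fields).
toy_sample = {
    "id": "786cc06c71854b088ca098fdf2cf20fa",
    "model": "claude3.5-sonnet",
    "qa": {
        "en": [
            {
                "answer": "250 GeV",
                "question": "What is the center-of-mass energy in GeV "
                            "for the data presented in the table?",
            },
        ],
        "fr": [
            {
                "answer": "4 paramètres",
                "question": "Combien de paramètres sont présentés dans ce tableau?",
            },
        ],
    },
}

for lang, question, answer in iter_qa_pairs(toy_sample):
    print(f"[{lang}] {question} -> {answer}")
```

With the real data, the same helpers could be applied to a split such as `ds["train"]`. Iterating over full records decodes every image, so dropping that column first (for example with `ds["train"].remove_columns("image")`) is likely to make such a pass faster.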

Use cases: