
raoanmol/ViTaB-A

Source: Hugging Face. Updated 2026-04-20; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/raoanmol/ViTaB-A
Resource description:
---
language:
- en
license: apache-2.0
pretty_name: ViTaB-A
task_categories:
- question-answering
tags:
- table-question-answering
configs:
- config_name: hitab
  data_files:
  - split: train
    path: hitab/train.jsonl
  - split: validation
    path: hitab/validation.jsonl
  - split: test
    path: hitab/test.jsonl
- config_name: fetaqa
  data_files:
  - split: train
    path: fetaqa/train.jsonl
  - split: validation
    path: fetaqa/validation.jsonl
  - split: test
    path: fetaqa/test.jsonl
---

# ViTaB-A Dataset

A normalized table question answering dataset for the ViTaB-A research project.

## Configs

- **hitab**: Derived from [HiTab](https://huggingface.co/datasets/kasnerz/hitab) (10,670 samples)
- **fetaqa**: Derived from [FeTaQA](https://huggingface.co/datasets/DongfuJiang/FeTaQA) (10,330 samples)

## Usage

```python
from datasets import load_dataset

hitab = load_dataset("raoanmol/ViTaB-A", "hitab")
fetaqa = load_dataset("raoanmol/ViTaB-A", "fetaqa")
```

## Schema

Each sample contains:

| Field | Type | Description |
|---|---|---|
| `id` | string | Unique identifier (e.g. `vitaba_000001_hitab`) |
| `split` | string | Dataset split (train/validation/test) |
| `question` | string | Natural language question about the table |
| `answer` | list or string | Answer (list for HiTab, string for FeTaQA) |
| `citation` | list[str] | Excel-style cell references (e.g. `["=E7"]`) |
| `table_json` | dict | Simplified table with keys: `title` (string), `header` (list of header rows), `rows` (list of data rows) |
| `table_md` | string | Markdown representation of the table with Excel-style row/column labels |
| `table_images` | dict | Table images as base64 PNGs. Keys: `arial`, `times_new_roman`, `red`, `blue`, `green`. Unrendered variants are empty strings. |
| `source` | string | Source dataset and split (e.g. `hitab_train`) |
| `source_id` | string | Original ID from source dataset |
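
The `citation` and `table_images` fields described in the schema can be unpacked with the Python standard library alone. The sketch below is a minimal illustration, not part of the dataset's own tooling. The helper names `parse_cell_ref` and `decode_table_image` are hypothetical. It assumes each citation is a single-cell reference of the form `=E7` (ranges, if they occur, are not handled) and that each image string is plain base64 without a `data:` URI prefix. How Excel row numbers map onto `table_json` rows is not stated in the card, so that step is left to the caller.

```python
import base64
import re
from typing import Optional, Tuple

# Hypothetical helpers; the names and the single-cell assumption are illustrative only.
_CELL_REF = re.compile(r"^=?([A-Za-z]+)(\d+)$")


def parse_cell_ref(ref: str) -> Tuple[int, int]:
    """Convert an Excel-style reference such as "=E7" to 1-based (row, column)."""
    match = _CELL_REF.match(ref.strip())
    if match is None:
        raise ValueError(f"unsupported cell reference: {ref!r}")
    letters, digits = match.groups()
    column = 0
    for ch in letters.upper():
        # Column letters form a base-26 number with digits A=1 .. Z=26.
        column = column * 26 + (ord(ch) - ord("A") + 1)
    return int(digits), column


def decode_table_image(table_images: dict, variant: str = "arial") -> Optional[bytes]:
    """Return raw PNG bytes for one variant, or None if it was not rendered."""
    encoded = table_images.get(variant, "")
    if not encoded:
        # The card states that unrendered variants are empty strings.
        return None
    return base64.b64decode(encoded)
```

For example, `parse_cell_ref("=E7")` yields row 7, column 5. The bytes returned by `decode_table_image` can be written to a `.png` file or passed to an image library of your choice.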
Provider:
raoanmol