ibm-research/AITQARetrieval

Name: ibm-research/AITQARetrieval
Creator: ibm-research
Published: 2026-03-10 11:10:49
License: No description provided

Hugging Face · Updated 2026-03-10 · Indexed 2026-04-05

Download link:

https://hf-mirror.com/datasets/ibm-research/AITQARetrieval

Description:

---
annotations_creators:
- derived
language:
- eng
license: other
license_name: aitqa-license
license_link: >-
  https://github.com/IBM/AITQA/blob/master/LICENSE
multilinguality: monolingual
task_categories:
- text-retrieval
task_ids:
- document-retrieval
tags:
- table-retrieval
- text
pretty_name: AIT-QA
config_names:
- default
- queries
- corpus
dataset_info:
- config_name: default
  features:
  - name: qid
    dtype: string
  - name: did
    dtype: string
  - name: score
    dtype: int32
  splits:
  - name: test
    num_bytes: 91137
    num_examples: 1533
- config_name: queries
  features:
  - name: _id
    dtype: string
  - name: text
    dtype: string
  - name: answers
    sequence: string
  - name: type
    dtype: string
  - name: row_hierarchy_needed
    dtype: string
  - name: paraphrase_group
    dtype: string
  splits:
  - name: test_queries
    num_bytes: 52874
    num_examples: 515
- config_name: corpus
  features:
  - name: _id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: corpus
    num_bytes: 13506199
    num_examples: 1937
configs:
- config_name: default
  data_files:
  - split: test
    path: test_qrels.jsonl
- config_name: queries
  data_files:
  - split: test_queries
    path: test_queries.jsonl
- config_name: corpus
  data_files:
  - split: corpus
    path: corpus.jsonl
---

# AIT-QA Retrieval

This dataset is part of a Table + Text retrieval benchmark. It includes queries and relevance judgments for the test split, with the corpus provided in one format: `corpus`.
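As a rough illustration of how the relevance judgments could be used, the sketch below reads qrels rows in the `default` config's shape (`qid`, `did`, `score`) and scores a retriever's ranked output with recall@k. It assumes, as in BEIR-style layouts, that `qid` refers to a query `_id` and `did` to a corpus `_id`, and that any `score > 0` marks a relevant document. The function names `load_qrels` and `recall_at_k` and the sample IDs are hypothetical, chosen only for this example; this is not an official evaluation script.

```python
import json
from collections import defaultdict


def load_qrels(lines):
    """Build {qid: {did: score}} from qrels JSONL lines (fields: qid, did, score)."""
    qrels = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)
        qrels[row["qid"]][row["did"]] = int(row["score"])
    return dict(qrels)


def recall_at_k(qrels, run, k):
    """Mean fraction of relevant docs (assumed: score > 0) found in each query's top-k.

    `run` maps qid -> ranked list of corpus `_id`s produced by some retriever.
    Queries with no relevant documents are skipped.
    """
    per_query = []
    for qid, judged in qrels.items():
        relevant = {did for did, score in judged.items() if score > 0}
        if not relevant:
            continue
        retrieved = set(run.get(qid, [])[:k])  # missing queries count as empty rankings
        per_query.append(len(relevant & retrieved) / len(relevant))
    return sum(per_query) / len(per_query) if per_query else 0.0


if __name__ == "__main__":
    # Hypothetical IDs for illustration only; real IDs come from test_qrels.jsonl.
    sample_qrels = [
        '{"qid": "q1", "did": "d1", "score": 1}',
        '{"qid": "q2", "did": "d3", "score": 1}',
    ]
    qrels = load_qrels(sample_qrels)
    run = {"q1": ["d1", "d2"], "q2": ["d2", "d4", "d3"]}
    print(recall_at_k(qrels, run, k=2))  # 0.5
```

Because `load_qrels` accepts any iterable of lines, a local copy of the file could be read with `with open("test_qrels.jsonl", encoding="utf-8") as f: qrels = load_qrels(f)`.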
## Configs

| Config | Description | Split(s) |
|---|---|---|
| `default` | Relevance judgments (qrels): `qid`, `did`, `score` | `test` |
| `queries` | Query IDs, text, answers, type, row hierarchy flag, and paraphrase group | `test_queries` |
| `corpus` | Plain text corpus: `_id`, `title`, `text` | `corpus` |

## TableIR Benchmark Statistics

| Dataset | Structured | #Train | #Dev | #Test | #Corpus |
|---|:---:|---:|---:|---:|---:|
| OpenWikiTables | ✓ | 53.8k | 6.6k | 6.6k | 24.7k |
| NQTables | ✓ | 9.6k | 1.1k | 1k | 170k |
| FeTaQA | ✓ | 7.3k | 1k | 2k | 10.3k |
| OTT-QA (small) | ✓ | 41.5k | 2.2k | -- | 8.8k |
| MultiHierTT | ✗ | -- | 929 | -- | 9.9k |
| AIT-QA | ✗ | -- | -- | 515 | 1.9k |
| StatcanRetrieval | ✗ | -- | -- | 870 | 5.9k |
| watsonxDocsQA | ✗ | -- | -- | 30 | 1.1k |

## Citation

If you use **TableIR Eval: Table-Text IR Evaluation Collection**, please cite:

```bibtex
@misc{doshi2026tableir,
  title        = {TableIR Eval: Table-Text IR Evaluation Collection},
  author       = {Doshi, Meet and Boni, Odellia and Kumar, Vishwajeet and Sen, Jaydeep and Joshi, Sachindra},
  year         = {2026},
  institution  = {IBM Research},
  howpublished = {https://huggingface.co/collections/ibm-research/table-text-ir-evaluation},
  note         = {Hugging Face dataset collection}
}
```

All credit goes to the original authors. Please cite their work:

```bibtex
@misc{katsis2021aitqa,
  title         = {AIT-QA: Question Answering Dataset over Complex Tables in the Airline Industry},
  author        = {Yannis Katsis and Saneem Chemmengath and Vishwajeet Kumar and Samarth Bharadwaj and Mustafa Canim and Michael Glass and Alfio Gliozzo and Feifei Pan and Jaydeep Sen and Karthik Sankaranarayanan and Soumen Chakrabarti},
  year          = {2021},
  eprint        = {2106.12944},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL}
}
```

Provider:

ibm-research