KorWikiTabular, KorWikiTQ

Name: KorWikiTabular, KorWikiTQ
Creator: LG AI Research ISC
Published: 2022-05-01 20:35:19
License: No description available

Source: arXiv; updated 2022-05-01; indexed 2024-06-21

Download link:

https://github.com/LG-NLP/KorWikiTableQuestions

Resource overview:

This study constructs two Korean-specific datasets: KorWikiTabular, which comprises 1.4 million tables for unsupervised pre-training of language models, and KorWikiTQ, which contains 70,000 question-answer pairs created by crowd workers. Both are designed to support table question answering: table structures are converted into linear text so that models can learn structural information effectively. The datasets are intended to improve the accuracy and efficiency of table question answering in Korean, particularly when handling complex table structures and diverse natural-language queries.
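
The conversion of a table into linear text can be illustrated with a minimal sketch. The exact linearization format used for KorWikiTabular is not stated here, so the function name `linearize_table`, the `header: value` pairing, and the ` | ` and ` [ROW] ` separators below are all assumptions chosen for illustration, not the authors' method.

```python
from typing import List


def linearize_table(
    header: List[str],
    rows: List[List[str]],
    cell_sep: str = " | ",      # hypothetical separator between cells
    row_sep: str = " [ROW] ",   # hypothetical separator between rows
) -> str:
    """Flatten a table into one string, pairing each cell with its column header."""
    if any(len(row) != len(header) for row in rows):
        raise ValueError("every row must have one cell per header column")
    parts = []
    for row in rows:
        # Keep the column name next to each value so structure survives flattening.
        cells = [f"{col}: {val}" for col, val in zip(header, row)]
        parts.append(cell_sep.join(cells))
    return row_sep.join(parts)


# Toy table (illustrative only, not taken from the dataset).
header = ["도시", "국가"]
rows = [["서울", "대한민국"], ["부산", "대한민국"]]
linear = linearize_table(header, rows)
print(linear)
# 도시: 서울 | 국가: 대한민국 [ROW] 도시: 부산 | 국가: 대한민국
```

Repeating the column name beside every cell is one simple way to preserve row and column relationships in a flat sequence; other schemes, such as special structure tokens, are equally plausible.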

Provider:

LG AI Research ISC

Created:

2022-01-17
