KorWikiTabular, KorWikiTQ
arXiv · Updated 2022-05-01 · Indexed 2024-06-21
Download link:
https://github.com/LG-NLP/KorWikiTableQuestions
Description:
This study constructs two Korean-specific datasets. KorWikiTabular comprises 1.4 million tables for unsupervised pre-training of language models. KorWikiTQ contains 70,000 question-answer pairs written by crowd workers. Both support table question answering: tables are converted into linear text so that a model can learn table structure from a text sequence. The datasets are intended to improve the accuracy and efficiency of table question answering in Korean, particularly for complex table structures and diverse natural-language queries.
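The conversion of a table into linear text can be illustrated with a minimal sketch. The listing does not specify the serialization format the authors used, so the function name `linearize_table`, the `row N :` prefix, and the `|` and `;` separators below are assumptions chosen only for illustration, and the example table is toy data.

```python
from typing import Sequence


def linearize_table(header: Sequence[str], rows: Sequence[Sequence[str]]) -> str:
    """Flatten a table into one string, row by row.

    Hypothetical format (not taken from the dataset): each cell becomes
    "column : value", cells are joined with " | ", rows with " ; ".
    """
    parts = []
    for i, row in enumerate(rows, start=1):
        if len(row) != len(header):
            # Guard against ragged rows so columns and values stay aligned.
            raise ValueError(f"row {i} has {len(row)} cells, expected {len(header)}")
        cells = " | ".join(f"{col} : {val}" for col, val in zip(header, row))
        parts.append(f"row {i} : {cells}")
    return " ; ".join(parts)


# Toy table, for illustration only.
text = linearize_table(["name", "year"], [["A", "2020"], ["B", "2021"]])
print(text)
```

Repeating the column name next to every value is one simple way to keep structural information in the flat sequence; a real pipeline would also need to decide how to handle merged cells and nested headers.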
Provided by:
LG AI Research ISC
Created:
2022-01-17



