LIE
Source: arXiv | Updated: 2022-07-14 | Indexed: 2024-08-06
Download link:
http://arxiv.org/abs/2207.06717v1
Description:
The LIE dataset was jointly created by the Institute of Information Engineering, Chinese Academy of Sciences, and Alibaba DAMO Academy. It targets the extraction of structural and semantic knowledge from visually rich documents (VRDs) so that document-grounded dialogue systems can generate accurate responses. The dataset contains 62k annotations across 4,061 document pages, which its creators describe as the largest VRD-based information extraction dataset at the time of release. Beyond extracting specific text spans, systems must also identify the relations between those spans. The dataset supports three tasks: hierarchical structure extraction, section extraction, and relation extraction. Its main application is improving how dialogue systems understand and respond to complex documents, so that users can quickly find the information they need during an interactive conversation.
Provided by:
Institute of Information Engineering, Chinese Academy of Sciences
Created:
2022-07-14



