LAGRANGE
Source: arXiv | Updated: 2023-09-21 | Indexed: 2024-08-06
Download link:
http://arxiv.org/abs/2309.11669v1
Description:
LAGRANGE is a large-scale graph-text alignment dataset created by Apple Inc. It contains approximately 3 million pairs of knowledge graphs and text, extracted from the Wikidata knowledge graph and Wikipedia articles. The pairs were first aligned with string matching, and low-quality matches were then filtered out with a textual entailment model to improve the semantic equivalence between each knowledge graph and its paired text. The dataset is primarily intended for training sequence-to-sequence models that generate text from knowledge graphs or knowledge graphs from text, and it is particularly suited to natural language processing (NLP) tasks such as question answering.
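To make the two-stage construction (string-matching alignment, then entailment filtering) more concrete, here is a minimal sketch under stated assumptions. It is not Apple's pipeline. The names `Triple`, `string_match`, `verbalize`, `linearize`, `align_pair` and `toy_entails`, the 0.5 threshold, the " ; " / " | " linearization format, and the choice to filter individual triples (sentence as premise, verbalized triple as hypothesis) are all hypothetical. `toy_entails` is a lexical placeholder standing in for a trained entailment (NLI) model.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Triple:
    subject: str   # e.g. a Wikidata entity label
    relation: str  # e.g. a Wikidata property label
    obj: str


# Hypothetical scorer type: (premise, hypothesis) -> entailment probability in [0, 1].
EntailmentScorer = Callable[[str, str], float]


def string_match(sentence: str, triples: Iterable[Triple]) -> list[Triple]:
    """Preliminary alignment: keep triples whose subject and object labels
    both occur in the sentence (case-insensitive substring match)."""
    text = sentence.lower()
    return [t for t in triples if t.subject.lower() in text and t.obj.lower() in text]


def verbalize(t: Triple) -> str:
    """Naive template that turns one triple into a hypothesis sentence."""
    return f"{t.subject} {t.relation} {t.obj}."


def linearize(triples: list[Triple]) -> str:
    """Flatten a small graph into one string usable as seq2seq input or target."""
    return " | ".join(f"{t.subject} ; {t.relation} ; {t.obj}" for t in triples)


def align_pair(sentence: str, candidates: Iterable[Triple],
               entails: EntailmentScorer, threshold: float = 0.5
               ) -> Optional[tuple[str, str]]:
    """Return a (linearized graph, sentence) pair, or None if no triple survives."""
    matched = string_match(sentence, candidates)
    # Entailment filter (assumed design): drop triples the sentence does not support.
    kept = [t for t in matched if entails(sentence, verbalize(t)) >= threshold]
    if not kept:
        return None
    return linearize(kept), sentence


def toy_entails(premise: str, hypothesis: str) -> float:
    """PLACEHOLDER ONLY: fraction of hypothesis words that appear as substrings
    of the premise. A real pipeline would call a trained NLI model here."""
    words = hypothesis.lower().rstrip(".").split()
    premise_l = premise.lower()
    return sum(w in premise_l for w in words) / len(words)


if __name__ == "__main__":
    candidates = [
        Triple("Marie Curie", "born in", "Warsaw"),
        Triple("Marie Curie", "died in", "Passy"),  # object absent: fails string match
        Triple("Warsaw", "capital of", "Poland"),    # object absent: fails string match
    ]
    print(align_pair("Marie Curie was born in Warsaw.", candidates, toy_entails))
    # ('Marie Curie ; born in ; Warsaw', 'Marie Curie was born in Warsaw.')
```

Keeping the scorer as a plain callable means the placeholder can be swapped for a real entailment model without touching the matching or linearization steps; the resulting (graph, text) pairs can be fed to a seq2seq model in either direction.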
Provided by:
Apple Inc.
Created:
2023-09-21



