SherLIiC
arXiv | Updated 2019-06-04 | Indexed 2024-06-21
Download link:
https://github.com/mnschmit/SherLIiC
Description:
SherLIiC is a testbed for lexical inference in context (LIiC), created by the Center for Information and Language Processing at Ludwig-Maximilians-Universität München (LMU Munich). It contains 3,985 manually annotated inference rule candidates (InfCands), accompanied by approximately 960,000 unannotated InfCands and roughly 190,000 typed textual relations between Freebase entities, extracted from the large entity-linked corpus ClueWeb09. Because candidates are selected on the basis of strong distributional evidence, that evidence is of little use for classifying them, which makes SherLIiC harder than existing testbeds. Many of the correct InfCands are also novel and missing from existing rule bases. The dataset is suited to evaluating a range of strong baselines, from semantic vector space models to state-of-the-art neural models for natural language inference (NLI).
Provider:
Center for Information and Language Processing, LMU Munich
Created:
2019-06-04



