iszhaoxin/MyriadLAMA
Hugging Face | Updated 2024-10-08 | Indexed 2024-12-14
Download link:
https://hf-mirror.com/datasets/iszhaoxin/MyriadLAMA
Description:
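MyriadLAMA is a multi-prompt factual probing dataset that provides multiple prompts for each fact. It extends the existing single-prompt probing dataset LAMA-UHN in a semi-automatic way, producing multiple relation templates and varied subject expressions. The dataset covers 41 relations with 100 distinct templates each, of which 5 were written manually and 95 were generated automatically with GPT-4. Each record also provides the knowledge triple's unique identifier, the relation identifier, the template identifier, the template text, a flag for whether the template was written manually, the original template's identifier and text, the subject identifier, the subject entity expression, the subject aliases, and the lists of object identifiers, object expressions, and object aliases.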
MyriadLAMA is a multi-prompt factual probing dataset that provides a large number of prompts for each fact. It is built by semi-automatically extending the existing single-prompt probing dataset LAMA-UHN. MyriadLAMA produces multiple prompts per fact by supplying multiple equivalent relational templates for each relation and by varying the linguistic expression of the subject. It also offers multiple expressions for each object, so that a fact the model predicts correctly but with different tokens is not counted as a miss. The relational templates come from a three-step semi-automatic process: five distinct templates are first written manually for each relation, each manual template is then paraphrased 19 times using the GPT-4 API, and finally human reviewers filter all templates to remove low-quality ones. The result is 4100 templates covering 41 relations (100 per relation). Each record includes the unique identifier of the knowledge triple, the unique identifier of the relation, the unique identifier of the relational template, the template text, whether the template was created manually, the original manual template from which the template was paraphrased, the unique identifier of the subject, the original expression of the subject entity, the list of aliases of the subject entity, the list of unique identifiers of the objects, the list of object expressions, and the list of object aliases.
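The description above can be illustrated with a minimal Python sketch. It first reproduces the template count from the stated numbers, then shows how one fact might be expanded into several prompts and how object aliases could be used when scoring a prediction. Everything beyond the counts is an assumption made for illustration: the `Fact` class and its field names, the `[X]`/`[Y]` placeholder convention, the `[MASK]` token, the case-insensitive matching, and the example templates and entity are hypothetical and are not taken from the dataset itself.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product

# Counts stated in the dataset description.
NUM_RELATIONS = 41
MANUAL_PER_RELATION = 5
PARAPHRASES_PER_MANUAL = 19

# Each manual template plus its 19 paraphrases: 5 * (1 + 19) = 100 per relation.
templates_per_relation = MANUAL_PER_RELATION * (1 + PARAPHRASES_PER_MANUAL)
total_templates = NUM_RELATIONS * templates_per_relation


@dataclass
class Fact:
    """Hypothetical record layout; the real field names may differ."""
    subject: str
    subject_aliases: list[str]
    objects: list[str]
    object_aliases: list[str]


def expand_prompts(fact: Fact, templates: list[str], mask: str = "[MASK]") -> list[str]:
    """Pair every template with every subject expression (name plus aliases)."""
    subjects = [fact.subject, *fact.subject_aliases]
    return [
        template.replace("[X]", subj).replace("[Y]", mask)
        for template, subj in product(templates, subjects)
    ]


def is_correct(prediction: str, fact: Fact) -> bool:
    """Accept a prediction that matches any object expression or alias.

    Case-insensitive exact matching is an assumption of this sketch.
    """
    accepted = {s.lower() for s in [*fact.objects, *fact.object_aliases]}
    return prediction.strip().lower() in accepted


# Illustrative data only, not drawn from MyriadLAMA.
fact = Fact(
    subject="Barack Obama",
    subject_aliases=["Obama"],
    objects=["Honolulu"],
    object_aliases=["Honolulu, Hawaii"],
)
templates = ["[X] was born in [Y].", "The birthplace of [X] is [Y]."]
prompts = expand_prompts(fact, templates)
```

With two templates and two subject expressions this yields four prompts; with the full 100 templates per relation the same expansion grows linearly in the number of subject aliases, which is how a single fact ends up with many prompts.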
Provider:
iszhaoxin



