Automatically Generated NLI Dataset
Source: arXiv (2024-02-23). Updated: 2024-06-21.
Download link:
https://github.com/lamsoma/Auto_NLI
Resource description:
In this study, a research team from the Department of Informatics, Nagoya University, developed an automatically generated NLI (Natural Language Inference) dataset to improve sentence embedding learning in unsupervised settings. The dataset is generated by large language models and consists of premise-hypothesis sentence pairs labeled "entailment", "neutral", or "contradiction". To build it, specific prompt templates instruct a language model to generate sentences that are logically entailed by, or that contradict, a given premise. The dataset is mainly used to train and fine-tune models such as PromptEOL, improving their performance on Semantic Textual Similarity (STS) tasks while addressing the dependence on large-scale manually annotated datasets.
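The generation step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the prompt wording in `TEMPLATES` is hypothetical (the real templates are in the paper and the Auto_NLI repository), the language model is abstracted as a caller-supplied `generate` function, and only entailment and contradiction hypotheses are produced, since those are the relations the description says the prompts target. The `to_triplets` helper assumes the common supervised-SimCSE convention of using the entailed sentence as a positive and the contradiction as a hard negative; the source does not confirm this is how PromptEOL consumes the data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical prompt templates (assumption): the actual wording used by the
# authors is defined in their paper/repository, not here.
TEMPLATES: Dict[str, str] = {
    "entailment": 'Write one sentence that is logically entailed by: "{premise}"\nSentence:',
    "contradiction": 'Write one sentence that logically contradicts: "{premise}"\nSentence:',
}


@dataclass
class NLIPair:
    premise: str
    hypothesis: str
    label: str  # "entailment", "neutral" or "contradiction"


def generate_pairs(premises: List[str], generate: Callable[[str], str]) -> List[NLIPair]:
    """Prompt a language model (passed in as `generate`) once per premise and template."""
    pairs: List[NLIPair] = []
    for premise in premises:
        for label, template in TEMPLATES.items():
            hypothesis = generate(template.format(premise=premise)).strip()
            pairs.append(NLIPair(premise, hypothesis, label))
    return pairs


def to_triplets(pairs: List[NLIPair]) -> List[Tuple[str, str, str]]:
    """Group pairs into (anchor, positive, hard negative) triplets, SimCSE-style (assumed usage)."""
    by_premise: Dict[str, Dict[str, str]] = {}
    for p in pairs:
        by_premise.setdefault(p.premise, {})[p.label] = p.hypothesis
    # Keep only premises that have both an entailed and a contradicting hypothesis.
    return [
        (premise, hyps["entailment"], hyps["contradiction"])
        for premise, hyps in by_premise.items()
        if "entailment" in hyps and "contradiction" in hyps
    ]


if __name__ == "__main__":
    # Stub standing in for a real LLM call (hypothetical, for illustration only).
    def fake_llm(prompt: str) -> str:
        return " A cat is resting." if "entailed" in prompt else " No animal is in the room."

    pairs = generate_pairs(["A cat sleeps on the sofa."], fake_llm)
    for p in pairs:
        print(p.label, "|", p.hypothesis)
    print(to_triplets(pairs))
```

Passing the model in as a plain function keeps the sketch independent of any particular LLM library; in practice `generate` would wrap whatever model and decoding settings the dataset builder actually uses.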
Provided by:
Department of Informatics, Nagoya University
Created:
2024-02-23



