This is not a Dataset
Source: arXiv | Updated: 2023-10-24 | Indexed: 2024-06-21
Download link:
https://github.com/hitz-zentroa/This-is-not-a-Dataset
Description:
'This is not a Dataset' was created by the HiTZ Center - Ixa, University of the Basque Country UPV/EHU. It contains approximately 400,000 sentences describing commonsense knowledge, about two-thirds of which include some form of negation. The sentences are generated semi-automatically from WordNet relations and are intended to test the generalization and reasoning abilities of large language models in zero-shot settings. The dataset targets the challenge of understanding negation in natural language processing, with the aim of improving how models handle negated sentences, particularly where commonsense, causality, entailment, and world knowledge are involved.
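To make the idea of building affirmative and negated sentences from lexical relations more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the triples, the templates, and the names `Example`, `TRIPLES`, `TEMPLATES`, `generate`, and `negation_ratio` are hypothetical and are not taken from the dataset or its generation code, and the triples are hand-written stand-ins rather than real WordNet lookups.

```python
# Toy illustration only: triples and templates are hypothetical,
# not taken from the dataset or its generation pipeline.
from typing import NamedTuple


class Example(NamedTuple):
    sentence: str
    label: bool  # True if the sentence states something true


# Hand-written (relation, subject, object) triples standing in for
# relations that would normally come from WordNet.
TRIPLES = [
    ("hypernym", "dog", "animal"),
    ("part_of", "wheel", "car"),
]

# Hypothetical templates: one affirmative and one negated form per relation.
TEMPLATES = {
    "hypernym": ("A {s} is a kind of {o}.", "A {s} is not a kind of {o}."),
    "part_of": ("A {s} is part of a {o}.", "A {s} is not part of a {o}."),
}


def generate(triples):
    """Turn each true triple into a true sentence and a false, negated one."""
    examples = []
    for relation, s, o in triples:
        affirmative, negated = TEMPLATES[relation]
        examples.append(Example(affirmative.format(s=s, o=o), True))
        examples.append(Example(negated.format(s=s, o=o), False))
    return examples


def negation_ratio(examples):
    """Fraction of sentences containing the explicit negation marker 'not'."""
    negated = sum(" not " in e.sentence for e in examples)
    return negated / len(examples)


examples = generate(TRIPLES)
for e in examples:
    print(e.label, e.sentence)
```

In this toy version exactly half of the sentences are negated. The dataset described above reports roughly two-thirds, which would call for a wider range of negation forms than the single "not" template used here.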
Provided by:
HiTZ Center - Ixa, University of the Basque Country UPV/EHU
Created:
2023-10-24



