SpaDeLeF
Source: arXiv | Updated: 2023-11-08 | Indexed: 2024-08-06
Download link:
http://arxiv.org/abs/2311.04189v1
Description:
SpaDeLeF is a dataset created by researchers at the National Polytechnic Institute for the hierarchical classification of lexical functions in Spanish. It contains the 957 most frequent Spanish verb-noun collocations together with their occurrences in sentences. Each collocation is assigned one of 37 lexical functions, and the labels are framed as a hierarchical classification task. The dataset was built by dependency-tree parsing and phrase matching over Spanish news text, and it provides benchmarks and data splits for each classification target. Intended applications include natural language processing tasks such as machine translation and sentiment analysis, with the aim of improving language models' text understanding and performance.
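The description mentions dependency-tree parsing as the way verb-noun collocations are located in sentences. A minimal sketch of that idea follows. It is not the dataset authors' pipeline: the `Token` type, the hand-written toy parses, and the choice to treat only direct-object (`obj`) relations as candidate collocations are all assumptions made for illustration, using Universal Dependencies style labels.

```python
from collections import Counter
from typing import Iterator, List, NamedTuple, Tuple


class Token(NamedTuple):
    """Hypothetical token record; a real parser would supply these fields."""
    idx: int      # position in the sentence
    lemma: str    # dictionary form
    upos: str     # coarse part of speech (UD style)
    head: int     # index of the syntactic head, -1 for the root
    deprel: str   # dependency relation to the head


def verb_noun_pairs(sentence: List[Token]) -> Iterator[Tuple[str, str]]:
    """Yield (verb lemma, noun lemma) for every noun that is the direct object of a verb."""
    by_idx = {t.idx: t for t in sentence}
    for tok in sentence:
        head = by_idx.get(tok.head)
        if (tok.upos == "NOUN" and tok.deprel == "obj"
                and head is not None and head.upos == "VERB"):
            yield (head.lemma, tok.lemma)


# Toy, hand-annotated parses (assumed, not taken from the dataset).
corpus = [
    [  # "El gobierno tomó una decisión."
        Token(0, "el", "DET", 1, "det"),
        Token(1, "gobierno", "NOUN", 2, "nsubj"),
        Token(2, "tomar", "VERB", -1, "root"),
        Token(3, "uno", "DET", 4, "det"),
        Token(4, "decisión", "NOUN", 2, "obj"),
    ],
    [  # "Ella tomó una decisión difícil."
        Token(0, "ella", "PRON", 1, "nsubj"),
        Token(1, "tomar", "VERB", -1, "root"),
        Token(2, "uno", "DET", 3, "det"),
        Token(3, "decisión", "NOUN", 1, "obj"),
        Token(4, "difícil", "ADJ", 3, "amod"),
    ],
    [  # "Juan dio un paseo."
        Token(0, "Juan", "PROPN", 1, "nsubj"),
        Token(1, "dar", "VERB", -1, "root"),
        Token(2, "uno", "DET", 3, "det"),
        Token(3, "paseo", "NOUN", 1, "obj"),
    ],
]

# Count how often each verb-noun pair occurs across the corpus.
counts = Counter(pair for sent in corpus for pair in verb_noun_pairs(sent))
most_common_pair = counts.most_common(1)[0][0]
```

Ranking pairs by frequency, as `counts.most_common` does here, is one plausible way to arrive at a list of "most common" collocations; the actual selection criteria used for the 957 entries are not stated in this listing.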
Provider:
National Polytechnic Institute
Created:
2023-11-08



