unstpb-nlp/SeLeRoSa-proc

Name: unstpb-nlp/SeLeRoSa-proc
Creator: unstpb-nlp
Published: 2025-06-18 13:55:18
License: No description available

Source: Hugging Face. Updated: 2025-06-18. Indexed: 2025-11-01.

Download link:

https://hf-mirror.com/datasets/unstpb-nlp/SeLeRoSa-proc


Description:

SeLeRoSa is a sentence-level satire detection dataset built from Romanian news articles. It contains 13,873 manually annotated sentences covering domains such as social issues, IT, science, and movies. The dataset has been anonymized and preprocessed: named entities are replaced using spaCy, the text is lowercased, special characters are corrected, and the text is then tokenized, stripped of stop words, punctuation, and short words, and lemmatized. It is split into training, validation, and test sets, which can be used to train and evaluate natural language processing models on the sentence-level satire detection task.
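The preprocessing steps listed in the description can be illustrated with a minimal Python sketch. It is not the dataset authors' pipeline: it uses only the standard library, so named entity replacement and lemmatization (which the description attributes to spaCy) are left out. The function name `preprocess`, the stop-word list, the minimum token length, and the interpretation of "special character correction" as mapping legacy cedilla letters to the standard Romanian comma-below letters are all assumptions made for illustration.

```python
import re

# Assumption: "correction of special characters" is illustrated by replacing
# the legacy cedilla forms with the standard Romanian comma-below letters.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015f": "\u0219",  # s with cedilla -> s with comma below
    "\u0163": "\u021b",  # t with cedilla -> t with comma below
})

# Hypothetical, deliberately tiny stop-word list; the real list is not given.
STOP_WORDS = {"care", "este", "pentru", "sau", "dar"}

# Hypothetical threshold for "short words"; the real value is not given.
MIN_TOKEN_LENGTH = 3


def preprocess(sentence):
    """Lowercase, fix diacritics, tokenize, and drop stop/short words."""
    text = sentence.lower().translate(CEDILLA_TO_COMMA)
    # \w+ keeps runs of letters/digits, so punctuation is dropped here.
    tokens = re.findall(r"\w+", text)
    return [
        token
        for token in tokens
        if token not in STOP_WORDS and len(token) >= MIN_TOKEN_LENGTH
    ]
```

Calling `preprocess` on a raw sentence returns a list of cleaned tokens. A fuller version would insert the entity replacement step before lowercasing and a lemmatization step at the end.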

Provider:

unstpb-nlp
