
unstpb-nlp/SeLeRoSa-proc

Source: Hugging Face. Updated 2025-06-18; indexed 2025-11-01.
Download link:
https://hf-mirror.com/datasets/unstpb-nlp/SeLeRoSa-proc
Description:
SeLeRoSa is a sentence-level satire detection dataset built from Romanian news articles. It contains 13,873 manually annotated sentences spanning several domains, including social issues, IT, science, and movies. The sentences have been anonymized and preprocessed: named entities were replaced using spaCy, the text was lowercased, special characters were corrected, the text was tokenized, stop words, punctuation marks, and short words were removed, and the remaining tokens were lemmatized. The dataset is split into training, validation, and test sets for training and evaluating NLP models on sentence-level satire detection.
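The text-cleaning steps described above can be approximated in plain Python. This is a minimal sketch, not the authors' pipeline: it omits the spaCy-based named-entity replacement and lemmatization (both need a Romanian spaCy model), it assumes that "correction of special characters" means normalizing the legacy cedilla letters ş/ţ to the standard Romanian comma-below letters ș/ț, and it uses a hypothetical tiny stop-word list (`STOP_WORDS`) and a hypothetical minimum word length (`MIN_LEN = 3`).

```python
import re

# Hypothetical, tiny Romanian stop-word list for illustration only;
# the real pipeline's stop-word list is not specified in the dataset card.
STOP_WORDS = {"și", "de", "la", "în", "cu", "pe", "un", "o", "a", "că", "care", "este", "se"}

# Assumption: "correction of special characters" = mapping legacy cedilla
# letters (ş U+015F, ţ U+0163, Ş U+015E, Ţ U+0162) to the standard
# comma-below letters (ș U+0219, ț U+021B, Ș U+0218, Ț U+021A).
DIACRITIC_FIXES = str.maketrans({
    "\u015f": "\u0219",
    "\u0163": "\u021b",
    "\u015e": "\u0218",
    "\u0162": "\u021a",
})

MIN_LEN = 3  # hypothetical threshold for what counts as a "short word"


def preprocess(sentence: str) -> list[str]:
    """Approximate the card's cleaning steps, minus NER replacement and lemmatization."""
    # Fix diacritics, then lowercase.
    text = sentence.translate(DIACRITIC_FIXES).lower()
    # Naive tokenization: runs of Unicode word characters; punctuation is dropped here.
    tokens = re.findall(r"\w+", text)
    # Remove stop words and short words.
    return [t for t in tokens if t not in STOP_WORDS and len(t) >= MIN_LEN]
```

For example, `preprocess("Guvernul a anunţat, în sfârşit, o nouă reformă!")` returns the lowercased content tokens "guvernul", "anunțat", "sfârșit", "nouă", and "reformă", with the cedilla letters normalized. A regex tokenizer is used only to keep the sketch dependency-free; it splits hyphenated forms such as "într-un", which a spaCy tokenizer would handle differently.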
Provided by:
unstpb-nlp