Dataset and baseline model of moderated content FRENK-MMC-RTV 1.0

Source: SSH Open MarketPlace. Updated: 2025-07-04. Indexed: 2025-07-05.
Download link:
https://marketplace.sshopencloud.eu/dataset/adnEPL
Description:
FRENK-MMC-RTV is a dataset of moderated news comments from the website [rtvslo.si](https://www.rtvslo.si/), with metadata on the time of publishing, the user identifier, the thread identifier, and whether or not the comment was deleted by the moderators. The full text of each comment is encrypted with a character-replacement method so that the comments are not readable by humans. Basic punctuation is left unencrypted so that the text can still be tokenized. The dataset is mainly intended for experiments on automating comment moderation. For real-world use, a fastText classification model trained on the non-encrypted data is also provided; it can be downloaded from the CLARIN.SI repository.
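
To illustrate what a character-replacement scheme that leaves punctuation intact might look like, here is a minimal Python sketch. It is not the method used to produce FRENK-MMC-RTV, whose exact scheme is not described here. The alphabet, the seed, and the function names `build_substitution_table` and `encrypt_comment` are assumptions made only for this example.

```python
import random
import string
from typing import Dict

# Assumed alphabet for illustration: ASCII letters plus the Slovenian č, š, ž.
ALPHABET = string.ascii_lowercase + "čšž"


def build_substitution_table(seed: int, alphabet: str = ALPHABET) -> Dict[str, str]:
    """Build a one-to-one letter mapping from a seeded shuffle (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(alphabet)
    rng.shuffle(shuffled)
    table = dict(zip(alphabet, shuffled))
    # Mirror the mapping for uppercase letters so that case is preserved.
    table.update({src.upper(): dst.upper() for src, dst in list(table.items())})
    return table


def encrypt_comment(text: str, table: Dict[str, str]) -> str:
    """Replace every mapped letter; punctuation, digits and spaces pass through."""
    return "".join(table.get(ch, ch) for ch in text)


if __name__ == "__main__":
    table = build_substitution_table(seed=13)
    print(encrypt_comment("Živjo, svet! Kako si?", table))
```

Because the mapping is one-to-one and applied per character, token boundaries, token lengths and the identity of repeated words survive the replacement. A bag-of-words or character n-gram classifier can therefore still be trained on the encrypted text even though a human cannot read it.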

Created:
2025-07-04