Vikhrmodels/2ch-24-09-2024
Source: Hugging Face. Updated: 2024-09-24. Indexed: 2025-04-08.
Download link:
https://hf-mirror.com/datasets/Vikhrmodels/2ch-24-09-2024
Description:
The 2ch unSafety Dataset is a collection of text scraped from the Russian anonymous imageboard 2ch. It captures raw, unfiltered discussion across many boards and threads, ranging from casual banter to potentially offensive material. Source boards cover technology, politics, relationships, hobbies, gaming, memes, and more, for example /random, /politics, and /video games. Because 2ch is an anonymous forum, the dataset does not contain any personally identifiable information, which makes it suitable for studying how anonymity affects online behavior and communication patterns. The text is kept in its raw form, including slang, profanity, hate speech, and other explicit or harmful language, but has been converted into chat format. The dataset covers posts up to August 2024.
Provider:
Vikhrmodels



