nyuuzyou/bordaru-posts
Source: Hugging Face. Updated: 2024-08-03. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/nyuuzyou/bordaru-posts
Description:
This dataset contains posts scraped from Borda.ru, a Russian platform that hosts discussion forums on a wide range of topics. The posts are primarily in Russian. The dataset holds 5,251,346 unique messages; it was deduplicated, and only high-quality posts were kept. Each record has three fields: the post URL, the author's username, and the post content. All data is in the train split; there is no validation split. The dataset is released under CC0, which permits use for any purpose.
Provider:
nyuuzyou
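Example:
The description above says that every record sits in the train split and carries a URL, an author username, and the post content. A minimal sketch of reading a small sample is shown below, assuming the Hugging Face `datasets` library is installed and the repository can be loaded by its id. The listing does not give the column names, so the field name `poster` is a hypothetical placeholder; inspect the keys of the first row before relying on it. The helpers `stream_posts` and `top_authors` are illustrative names, not part of the dataset.

```python
from collections import Counter
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple


def stream_posts(repo_id: str = "nyuuzyou/bordaru-posts",
                 limit: int = 1000) -> Iterator[Dict]:
    """Yield up to `limit` rows from the train split without a full download."""
    # Imported here so the rest of the module works without the library.
    from datasets import load_dataset  # assumption: `pip install datasets`

    # The listing states that all data is in the train split.
    ds = load_dataset(repo_id, split="train", streaming=True)
    yield from islice(ds, limit)


def top_authors(records: Iterable[Dict],
                author_field: str = "poster",  # hypothetical column name
                n: int = 3) -> List[Tuple[str, int]]:
    """Count posts per author and return the `n` most frequent authors."""
    counts = Counter(row[author_field] for row in records)
    return counts.most_common(n)


if __name__ == "__main__":
    sample = list(stream_posts(limit=1000))
    # Print the real column names first; adjust `author_field` to match.
    print(sorted(sample[0].keys()))
```

Streaming is used here because the dataset has over five million rows; for repeated passes over the data, a regular (non-streaming) load that caches to disk may be more practical.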



