nyuuzyou/rutube-channels
Source: Hugging Face. Updated 2024-02-18; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/nyuuzyou/rutube-channels
Resource summary:
---
annotations_creators:
- crowdsourced
language:
- ru
language_creators:
- crowdsourced
license:
- cc0-1.0
multilinguality:
- monolingual
pretty_name: Rutube channels
size_categories:
- 10M<n<100M
source_datasets:
- original
task_categories:
- text-generation
task_ids:
- language-modeling
---
# Dataset Card for Rutube channels
### Dataset Summary
This dataset was scraped from channel pages on the Russian video-sharing platform [Rutube](https://rutube.ru). It includes all information from each channel card. The dataset was collected by processing 36 million channels, starting from the first one; these are assumed to have been all the channels available on the platform at the time of collection. Some fields may be empty, but every record is expected to contain some data, since empty responses have been filtered out.
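The collection procedure described above (walking channels in order from the first one and dropping empty responses) can be sketched roughly as follows. This is not the curator's scraper: `fetch_channel` is a hypothetical callable standing in for whatever request returned a channel card, and the ID range is passed in as a parameter rather than hard-coding the actual upper bound.

```python
from typing import Callable, Iterator, Optional

# Hypothetical type: a function that returns a channel card as a dict,
# or None / an empty dict when no channel exists at that ID.
FetchFn = Callable[[int], Optional[dict]]


def collect_channels(fetch_channel: FetchFn, first_id: int, last_id: int) -> Iterator[dict]:
    """Walk channel IDs in order and yield only non-empty responses."""
    for channel_id in range(first_id, last_id + 1):
        card = fetch_channel(channel_id)
        if not card:
            # Empty response: nothing to record for this ID, so it is dropped.
            continue
        yield card


if __name__ == "__main__":
    # Stand-in fetcher for illustration only: pretend only even IDs exist.
    def fake_fetch(channel_id: int) -> Optional[dict]:
        return {"id": channel_id} if channel_id % 2 == 0 else None

    print([c["id"] for c in collect_channels(fake_fetch, 1, 6)])  # [2, 4, 6]
```

Using a generator keeps memory flat over a long ID range, which matters when the range runs into the tens of millions.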
### Languages
The dataset is mostly in Russian, but other languages may also be present.
## Dataset Structure
### Data Fields
This dataset includes the following fields:
- `id`: Identifier for the channel (integer)
- `name`: Name of the channel (string)
- `description`: Short description of the channel (string)
- `is_official`: Indicates if the channel is official (boolean)
- `video_count`: Number of videos in the channel (integer)
- `hits`: Number of hits or views for the channel (integer)
- `subscribers_count`: Number of subscribers to the channel (integer)
- `date_joined`: Date and time when the channel was created (string in ISO 8601 format)
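A typed view of one record might look like the sketch below. The field names and types come from the list above; the class name `ChannelRecord`, treating an empty `description` as missing, defaulting absent counts to 0, and accepting a trailing `Z` in `date_joined` are all assumptions, since the card does not show a sample row.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ChannelRecord:  # hypothetical name, not part of the dataset
    id: int
    name: str
    description: Optional[str]
    is_official: bool
    video_count: int
    hits: int
    subscribers_count: int
    date_joined: Optional[datetime]

    @classmethod
    def from_dict(cls, row: dict) -> "ChannelRecord":
        # Some fields may be empty, so blank or missing strings become None.
        description = row.get("description") or None
        raw_date = row.get("date_joined") or None
        joined = None
        if raw_date:
            # datetime.fromisoformat() only accepts a "Z" suffix from Python 3.11,
            # so normalise it to an explicit UTC offset first (assumed format).
            joined = datetime.fromisoformat(raw_date.replace("Z", "+00:00"))
        return cls(
            id=int(row["id"]),
            name=row.get("name") or "",
            description=description,
            is_official=bool(row.get("is_official", False)),
            # Assumption: a missing count is treated as 0.
            video_count=int(row.get("video_count") or 0),
            hits=int(row.get("hits") or 0),
            subscribers_count=int(row.get("subscribers_count") or 0),
            date_joined=joined,
        )
```

For example, `ChannelRecord.from_dict({"id": 1, "name": "Example", "description": "", "date_joined": "2015-06-01T12:00:00Z"})` yields a record whose `description` is `None` and whose `date_joined.year` is `2015`; the row itself is made up for illustration.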
### Data Splits
All examples are in the `train` split; there is no validation split.
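Because only a `train` split is provided, anyone who needs a held-out set has to carve one out. One option, sketched below under the assumption that `id` is unique per channel, is a deterministic hash-based split: the same channel always lands on the same side regardless of row order. The function names and the 1% default are illustrative choices, not part of the dataset. With the Hugging Face `datasets` library, `Dataset.train_test_split(test_size=..., seed=...)` is a random (seeded) alternative.

```python
import hashlib
from typing import Iterable, List, Tuple


def in_validation(channel_id: int, validation_percent: float = 1.0) -> bool:
    """Deterministically assign a channel to validation from a hash of its id."""
    digest = hashlib.sha256(str(channel_id).encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a bucket in [0, 10000).
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    # validation_percent=1.0 selects buckets 0..99, i.e. about 1% of ids.
    return bucket < validation_percent * 100


def split_records(
    records: Iterable[dict], validation_percent: float = 1.0
) -> Tuple[List[dict], List[dict]]:
    """Return (train, validation) lists; hypothetical helper for illustration."""
    train: List[dict] = []
    validation: List[dict] = []
    for row in records:
        target = validation if in_validation(row["id"], validation_percent) else train
        target.append(row)
    return train, validation
```

Hashing the id rather than shuffling makes the split reproducible across machines and dataset versions; for the full dataset you would likely stream rows through `in_validation` instead of collecting everything into lists.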
## Additional Information
### License
This dataset is dedicated to the public domain under the Creative Commons Zero (CC0) license. This means you can:
* Use it for any purpose, including commercial projects.
* Modify it however you like.
* Distribute it without asking permission.
No attribution is required, but it's always appreciated!
CC0 license: https://creativecommons.org/publicdomain/zero/1.0/deed.en
To learn more about CC0, visit the Creative Commons website: https://creativecommons.org/publicdomain/zero/1.0/
### Dataset Curators
- [nyuuzyou](https://ducks.party)
Provided by:
nyuuzyou