
nyuuzyou/rutube-channels

Source: Hugging Face. Updated 2024-02-18; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/nyuuzyou/rutube-channels
Resource description:
---
annotations_creators:
- crowdsourced
language:
- ru
language_creators:
- crowdsourced
license:
- cc0-1.0
multilinguality:
- monolingual
pretty_name: Rutube channels
size_categories:
- 10M<n<100M
source_datasets:
- original
task_categories:
- text-generation
task_ids:
- language-modeling
---

# Dataset Card for Rutube channels

### Dataset Summary

This dataset was scraped from channel pages on the Russian video-sharing platform [Rutube](https://rutube.ru). It includes all information from the channel card. The dataset was collected by processing 36 million channels, starting from the first one. These are assumed to be all the channels available on the platform at the time of collection. Some fields may be empty, but every row is expected to contain some data; empty responses have been filtered out.

### Languages

The dataset is mostly in Russian, but other languages may be present.

## Dataset Structure

### Data Fields

This dataset includes the following fields:

- `id`: Identifier for the channel (integer)
- `name`: Name of the channel (string)
- `description`: Short description of the channel (string)
- `is_official`: Indicates if the channel is official (boolean)
- `video_count`: Number of videos in the channel (integer)
- `hits`: Number of hits or views for the channel (integer)
- `subscribers_count`: Number of subscribers to the channel (integer)
- `date_joined`: Date and time when the channel was created (string in ISO 8601 format)

### Data Splits

All examples are in the train split; there is no validation split.

## Additional Information

### License

This dataset is dedicated to the public domain under the Creative Commons Zero (CC0) license. This means you can:

* Use it for any purpose, including commercial projects.
* Modify it however you like.
* Distribute it without asking permission.

No attribution is required, but it's always appreciated!

CC0 license: https://creativecommons.org/publicdomain/zero/1.0/deed.en

To learn more about CC0, visit the Creative Commons website: https://creativecommons.org/publicdomain/zero/1.0/

### Dataset Curators

- [nyuuzyou](https://ducks.party)
Provider:
nyuuzyou
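Summary of original information

Dataset card for Rutube channels

Dataset overview

This dataset was scraped from channel pages on the Russian video-sharing platform Rutube. It includes all information from the channel card. The dataset was collected by processing 36 million channels, starting from the first one. These are assumed to be all the channels available on the platform at the time of collection. Some fields may be empty, but every row is expected to contain some data; empty responses have been filtered out.

Languages

The dataset is mostly in Russian, but other languages may be present.

Dataset structure

Data fields

This dataset includes the following fields:

  • id: Identifier for the channel (integer)
  • name: Name of the channel (string)
  • description: Short description of the channel (string)
  • is_official: Indicates whether the channel is official (boolean)
  • video_count: Number of videos in the channel (integer)
  • hits: Number of hits or views for the channel (integer)
  • subscribers_count: Number of subscribers to the channel (integer)
  • date_joined: Date and time when the channel was created (string in ISO 8601 format)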
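The field list above maps naturally onto a typed record. The following is a minimal sketch, assuming each record arrives as a JSON object keyed by the field names listed; the `Channel` class, the `parse_channel` helper, and the sample values are hypothetical and invented for illustration, not taken from the dataset.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Channel:
    """One channel record, mirroring the documented fields."""
    id: int
    name: str
    description: str
    is_official: bool
    video_count: int
    hits: int
    subscribers_count: int
    date_joined: datetime


def parse_channel(line: str) -> Channel:
    """Parse one JSON-encoded record (assumed layout) into a Channel."""
    raw = json.loads(line)
    # Normalise a trailing "Z" so fromisoformat accepts it on Python < 3.11.
    joined = raw["date_joined"].replace("Z", "+00:00")
    return Channel(
        id=int(raw["id"]),
        # Text fields may be empty, so fall back to an empty string.
        name=raw.get("name") or "",
        description=raw.get("description") or "",
        is_official=bool(raw["is_official"]),
        video_count=int(raw["video_count"]),
        hits=int(raw["hits"]),
        subscribers_count=int(raw["subscribers_count"]),
        date_joined=datetime.fromisoformat(joined),
    )


# Hypothetical sample record, invented for illustration only.
SAMPLE = json.dumps({
    "id": 1,
    "name": "Example channel",
    "description": None,
    "is_official": False,
    "video_count": 12,
    "hits": 3400,
    "subscribers_count": 56,
    "date_joined": "2012-05-14T10:23:45Z",
})

channel = parse_channel(SAMPLE)
```

Converting `date_joined` to a `datetime` up front makes later filtering by creation date straightforward; the actual timestamp format in the data may differ from the sample and should be checked against a few real rows.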

Data splits

All examples are in the train split; there is no validation split.
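Because everything lives in a single train split, reading the data only requires naming that split. The sketch below assumes the third-party Hugging Face `datasets` library is installed and that the repository `nyuuzyou/rutube-channels` can be read by `load_dataset` in streaming mode; neither is confirmed by the card. The helper names `stream_train_split`, `first_n`, and `main` are hypothetical.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def stream_train_split(
    repo_id: str = "nyuuzyou/rutube-channels",
) -> Iterator[Dict[str, Any]]:
    """Lazily yield rows from the train split (assumes `datasets` is installed)."""
    # Imported lazily so this module loads even without `datasets` available.
    from datasets import load_dataset

    # streaming=True avoids downloading all rows before iteration starts.
    yield from load_dataset(repo_id, split="train", streaming=True)


def first_n(rows: Iterable[Dict[str, Any]], n: int) -> List[Dict[str, Any]]:
    """Materialise at most n rows from any iterable of rows."""
    return list(islice(rows, n))


def main() -> None:
    """Print a few rows; requires network access to the Hugging Face Hub."""
    for row in first_n(stream_train_split(), 3):
        print(row["id"], row["name"], row["subscribers_count"])
```

Calling `main()` prints three rows if the assumptions hold. Streaming is chosen here because the card places the dataset in the tens of millions of rows, so inspecting a small prefix is cheaper than a full download.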

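Additional information

License

This dataset is dedicated to the public domain under the Creative Commons Zero (CC0) license. This means you can:

  • Use it for any purpose, including commercial projects.
  • Modify it however you like.
  • Distribute it without asking permission.

No attribution is required, but it is always appreciated!

CC0 license: https://creativecommons.org/publicdomain/zero/1.0/deed.en

To learn more about CC0, visit the Creative Commons website: https://creativecommons.org/publicdomain/zero/1.0/

Dataset curators

nyuuzyou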
