NyanNyanovich/nyan_clusters
Source: Hugging Face · updated 2023-12-28 · indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/NyanNyanovich/nyan_clusters
Resource description:
---
dataset_info:
  features:
  - name: docs
    sequence: string
  - name: message_id
    dtype: float64
  - name: create_time
    dtype: float64
  - name: annotation_doc
    dtype: string
  - name: first_doc
    dtype: string
  - name: hash
    dtype: string
  - name: is_important
    dtype: bool
  - name: clid
    dtype: float64
  - name: message
    struct:
    - name: from_discussion
      dtype: bool
    - name: issue
      dtype: string
    - name: message_id
      dtype: int64
  - name: messages
    list:
    - name: from_discussion
      dtype: bool
    - name: issue
      dtype: string
    - name: message_id
      dtype: int64
  splits:
  - name: train
    num_bytes: 21796882
    num_examples: 41296
  download_size: 6836100
  dataset_size: 21796882
license: cc-by-4.0
language:
- ru
pretty_name: Nyan Clusters
size_categories:
- 10K<n<100K
---
# Nyan clusters
Clusters of documents formed in the [НЯН](https://t.me/nyannews) Telegram channel from March 2022 to December 2023. To get the document texts, use the documents dataset (see Other datasets below).
## Usage
```bash
pip3 install datasets
```
```python
from datasets import load_dataset

# Stream the train split and print only the first cluster
for row in load_dataset("NyanNyanovich/nyan_clusters", split="train", streaming=True):
    print(row)
    break
```
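The schema in the card types `message_id`, `create_time`, and `clid` as `float64`, although they look like an identifier, a timestamp, and a cluster id. Below is a minimal sketch of tidying one streamed row. It assumes (the card does not say so) that the float type exists to allow missing values, which may arrive as `None` or `NaN`, and that `create_time` is a Unix timestamp in seconds. The helper name `normalize_cluster_row` is hypothetical, not part of the dataset or of the `datasets` library.

```python
import math
from datetime import datetime, timezone
from typing import Any, Optional


def _as_int(value: Any) -> Optional[int]:
    """Convert a float64 id to int; treat None/NaN as missing (assumption)."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    return int(value)


def normalize_cluster_row(row: dict) -> dict:
    """Hypothetical helper: tidy one row of nyan_clusters.

    Assumes create_time is a Unix timestamp in seconds (not stated in the card).
    """
    ts = row.get("create_time")
    created = None
    if ts is not None and not math.isnan(ts):
        created = datetime.fromtimestamp(ts, tz=timezone.utc)

    messages = row.get("messages") or []
    return {
        "clid": _as_int(row.get("clid")),
        "message_id": _as_int(row.get("message_id")),
        "created_at": created,
        "num_docs": len(row.get("docs") or []),
        "is_important": bool(row.get("is_important")),
        # Distinct non-empty `issue` values across the cluster's messages
        "issues": sorted({m["issue"] for m in messages if m.get("issue")}),
    }
```

Inside the streaming loop from the Usage section, call `print(normalize_cluster_row(row))` instead of `print(row)`. Converting to `int` only after the missing-value check avoids a `ValueError` on `NaN`.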
## Other datasets
* Documents: https://huggingface.co/datasets/NyanNyanovich/nyan_documents
* Clusters (this dataset): https://huggingface.co/datasets/NyanNyanovich/nyan_clusters
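To get texts for a cluster, rows of this dataset have to be matched against the documents dataset listed above. The card does not describe either side of that join, so the following is only a sketch under stated assumptions: each entry of a cluster's `docs` list is assumed to be a key (for example a URL) that also appears in some field of the documents dataset, and that field name and the text field name are passed in by the caller rather than guessed. The functions `build_text_index` and `cluster_texts` are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def build_text_index(
    documents: Iterable[dict], key_field: str, text_field: str
) -> Dict[str, str]:
    """Map each document key to its text.

    key_field and text_field are caller-supplied because the
    nyan_documents schema is not described in this card.
    """
    return {
        doc[key_field]: doc[text_field]
        for doc in documents
        if doc.get(key_field)  # skip documents without a usable key
    }


def cluster_texts(cluster_row: dict, text_index: Dict[str, str]) -> List[Optional[str]]:
    """Look up the text for every entry of a cluster's `docs` list.

    Assumes `docs` entries are keys into the documents dataset;
    returns None for keys that are not found in the index.
    """
    return [text_index.get(key) for key in cluster_row.get("docs") or []]
```

Building a dictionary once keeps each lookup constant-time, which matters when the index covers every document and is reused across all 41296 cluster rows.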
Provided by:
NyanNyanovich
Summary of original information
Dataset overview
Dataset information
- Features:
  - docs: sequence of strings
  - message_id: float (float64)
  - create_time: float (float64)
  - annotation_doc: string
  - first_doc: string
  - hash: string
  - is_important: boolean (bool)
  - clid: float (float64)
  - message: struct
    - from_discussion: boolean (bool)
    - issue: string
    - message_id: integer (int64)
  - messages: list of structs
    - from_discussion: boolean (bool)
    - issue: string
    - message_id: integer (int64)
- Splits:
  - train: 21796882 bytes, 41296 examples
- Download size: 6836100 bytes
- Dataset size: 21796882 bytes
- License: cc-by-4.0
- Language: Russian (ru)
- Name: Nyan Clusters
- Size category: 10K<n<100K
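The size figures above can be sanity-checked with a little arithmetic: the average row size and how much smaller the download is than the dataset size. The sketch only divides the published numbers. Reading the ratio as compression of the downloaded files is an assumption; the card does not state the file format.

```python
# Figures copied from the train split metadata above
num_bytes = 21_796_882      # train num_bytes (equal to dataset_size)
num_examples = 41_296       # train num_examples
download_size = 6_836_100   # download_size in bytes

avg_bytes_per_row = num_bytes / num_examples
size_ratio = num_bytes / download_size

# The example count should fall inside the declared 10K<n<100K category
in_size_category = 10_000 < num_examples < 100_000

print(f"average row size: {avg_bytes_per_row:.1f} bytes")
print(f"dataset/download size ratio: {size_ratio:.2f}")
print(f"within 10K<n<100K: {in_size_category}")
```

This works out to roughly 528 bytes per cluster row and a download about 3.2 times smaller than the dataset size.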



