
psytechlab/RuSentiTweet

Source: Hugging Face. Updated 2025-12-06; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/psytechlab/RuSentiTweet
Resource description:
---
dataset_info:
  features:
  - name: text
    dtype: string
  - name: label
    dtype: string
  - name: id
    dtype: int64
  splits:
  - name: train
    num_bytes: 1148348
    num_examples: 9641
  - name: test
    num_bytes: 317153
    num_examples: 2679
  - name: val
    num_bytes: 125692
    num_examples: 1072
  download_size: 1048892
  dataset_size: 1591193
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
  - split: val
    path: data/val-*
task_categories:
- text-classification
language:
- ru
tags:
- russian
- sentiment_analysis
- social_media
size_categories:
- 1K<n<10K
---

## Disclaimer

This is a reupload of the dataset originally hosted on [GitHub](https://github.com/sismetanin/rusentitweet). We only added a `val` split, which is 10% of the original `train` split, for convenience. The original description follows.

# RuSentiTweet: A Sentiment Analysis Dataset of General Domain Tweets in Russian

This repository contains RuSentiTweet, a sentiment analysis dataset of 13,392 general domain tweets in Russian, which was created as part of the paper ["RuSentiTweet: A Sentiment Analysis Dataset of General Domain Tweets in Russian"](https://doi.org/10.7717/peerj-cs.1039). RuSentiTweet was manually annotated (moderate inter-rater agreement) using the [RuSentiment](https://aclanthology.org/C18-1064/) guidelines into 5 classes: Positive, Neutral, Negative, Speech Act, and Skip. As a source of data, we used [Twitter Stream Grab](https://archive.org/details/twitterstream), a historical collection of tweets obtained from the general Twitter API stream.

Citation:

```
@article{smetanin2022rusetitweet,
  title = {RuSentiTweet: A Sentiment Analysis Dataset of General Domain Tweets in Russian},
  author = {Sergey Smetanin},
  journal = {PeerJ Computer Science},
  volume = {8},
  pages = {e1039},
  year = {2022},
  doi = {10.7717/peerj-cs.1039},
  publisher = {PeerJ Inc.}
}
```
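The split sizes in the card metadata can be cross-checked against the stated total of 13,392 tweets and the stated 10% `val` share. The sketch below is a minimal illustration, not part of the original card. It assumes the Hugging Face `datasets` library is installed and that the repository id `psytechlab/RuSentiTweet` resolves on the Hub. The helper names `split_fractions`, `label_counts`, `load_rusentitweet`, and `summarize` are hypothetical. The exact strings stored in the `label` column are not documented in the card, so the code counts whatever values it finds rather than assuming them.

```python
from collections import Counter

# Split sizes copied from the `dataset_info` block of the dataset card.
SPLIT_SIZES = {"train": 9641, "test": 2679, "val": 1072}


def split_fractions(sizes):
    """Return each split's share of the total number of examples."""
    total = sum(sizes.values())
    return {name: n / total for name, n in sizes.items()}


def label_counts(split):
    """Count examples per distinct value of the `label` column in one split."""
    return Counter(split["label"])


def load_rusentitweet():
    """Load all splits from the Hugging Face Hub (requires network access).

    Assumption: the third-party `datasets` package is installed; it is
    imported lazily so the helpers above work without it.
    """
    from datasets import load_dataset

    return load_dataset("psytechlab/RuSentiTweet")


def summarize():
    """Print the actual size and label distribution of every split."""
    ds = load_rusentitweet()
    for name, expected in SPLIT_SIZES.items():
        print(name, len(ds[name]), "expected:", expected)
        print(label_counts(ds[name]).most_common())
```

Calling `summarize()` downloads the data and prints the per-split sizes next to the values from the card. The reupload's `val` split was carved out of the original `train` split, so `train` plus `val` gives the size of the original training set.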
Provided by:
psytechlab