
avramandrei/RoIt-XMASA

Hugging Face · Updated 2026-03-21 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/avramandrei/RoIt-XMASA
Resource summary:
---
language:
- ro
- it
license: cc-by-4.0
task_categories:
- text-classification
task_ids:
- sentiment-classification
pretty_name: RoIt-XMASA
size_categories:
- 100K<n<1M
tags:
- sentiment-analysis
- cross-lingual
- cross-domain
- romanian
- italian
- reviews
---

# RoIt-XMASA

**Romanian–Italian Cross-domain Multi-domain Sentiment Analysis**

RoIt-XMASA is a multilingual, cross-domain sentiment analysis dataset of user reviews in Romanian (RO) and Italian (IT) across three domains: Books, Movies, and Music. Each labeled review carries a star rating of 1, 2, 4, or 5; 3-star reviews are excluded.

## Dataset Summary

| Split      | Rows    | Labeled |
|------------|---------|---------|
| train      | 12,000  | yes     |
| validation | 12,000  | yes     |
| test       | 12,000  | yes     |
| unlabeled  | 202,141 | no      |

### Splits breakdown (labeled)

Each labeled split is balanced across domains and languages:

- **Domains**: 4,000 reviews per domain (Books / Movies / Music)
- **Languages**: 6,000 reviews per language (RO / IT)

## Data Fields

| Field      | Type   | Description                                                 |
|------------|--------|-------------------------------------------------------------|
| `id`       | int64  | Unique review identifier                                    |
| `title`    | string | Review title (may be empty)                                 |
| `text`     | string | Review body                                                 |
| `domain`   | string | Domain: `Books`, `Movies`, or `Music`                       |
| `language` | string | Language code: `RO` (Romanian) or `IT` (Italian)            |
| `rating`   | int64  | Star rating: 1, 2, 4, or 5 (null in the `unlabeled` split) |

## Usage

```python
from datasets import load_dataset

ds = load_dataset("avramandrei/RoIt-XMASA")

# Labeled splits
train = ds["train"]
val = ds["validation"]
test = ds["test"]

# Unlabeled split (rating is null for every row)
unlabeled = ds["unlabeled"]

# Filter by language and domain, e.g. Romanian book reviews
ro_books = train.filter(lambda x: x["language"] == "RO" and x["domain"] == "Books")
```
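The per-domain and per-language balance described under "Splits breakdown" can be checked directly on the loaded data. The following is a minimal sketch using only the standard library; `split_counts`, `is_balanced`, and `FIELDS` are hypothetical helper names, and the sketch assumes each row is a dict with the fields listed under "Data Fields" (iterating a `datasets` split yields such dicts).

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Mapping

# Hypothetical helpers (not part of the dataset or the `datasets` library).
FIELDS = ("domain", "language", "rating")


def split_counts(rows: Iterable[Mapping]) -> dict[str, Counter]:
    """Count rows per domain, per language, and per rating in a single pass."""
    counts = {field: Counter() for field in FIELDS}
    for row in rows:
        for field in FIELDS:
            counts[field][row[field]] += 1
    return counts


def is_balanced(counter: Counter) -> bool:
    """Return True when every observed category occurs equally often."""
    return len(set(counter.values())) <= 1


if __name__ == "__main__":
    # Toy rows standing in for a real split; real rows also carry id/title/text.
    rows = [
        {"domain": "Books", "language": "RO", "rating": 1},
        {"domain": "Movies", "language": "IT", "rating": 5},
        {"domain": "Music", "language": "RO", "rating": 4},
        {"domain": "Books", "language": "IT", "rating": 2},
        {"domain": "Movies", "language": "RO", "rating": 5},
        {"domain": "Music", "language": "IT", "rating": 1},
    ]
    counts = split_counts(rows)
    for field, counter in counts.items():
        status = "balanced" if is_balanced(counter) else "unbalanced"
        print(field, dict(counter), status)
```

Applied to a real split, e.g. `split_counts(ds["train"])`, the card's figures imply 4,000 rows per domain and 6,000 per language. The card does not say how ratings are distributed within a split, so the `rating` counter is informational only.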
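The card lists `sentiment-classification` as the task and tags the data as cross-lingual and cross-domain, but it does not say how star ratings map to classes or how source and target subsets should be chosen. A common convention when 3-star reviews are dropped is to treat 1–2 stars as negative and 4–5 stars as positive; the sketch below assumes that mapping, which is not stated in the card. `rating_to_label` and `select` are hypothetical helpers, not part of the dataset or the `datasets` library.

```python
from __future__ import annotations

from typing import Iterable, Mapping

# ASSUMPTION: the card does not define classes; this sketch maps
# 1-2 stars to negative (0) and 4-5 stars to positive (1).
NEGATIVE, POSITIVE = 0, 1


def rating_to_label(rating: int | None) -> int | None:
    """Map a star rating to a binary label; None (unlabeled split) stays None."""
    if rating is None:
        return None
    if rating in (1, 2):
        return NEGATIVE
    if rating in (4, 5):
        return POSITIVE
    # 3-star reviews are excluded from the dataset, so anything else is unexpected.
    raise ValueError(f"unexpected rating: {rating!r}")


def select(
    rows: Iterable[Mapping],
    *,
    language: str | None = None,
    domain: str | None = None,
) -> list[dict]:
    """Keep rows matching the optional language/domain and add a binary 'label'."""
    selected = []
    for row in rows:
        if language is not None and row["language"] != language:
            continue
        if domain is not None and row["domain"] != domain:
            continue
        selected.append({**row, "label": rating_to_label(row["rating"])})
    return selected
```

A cross-lingual setup could train on `select(ds["train"], language="RO")` and evaluate on `select(ds["test"], language="IT")`; a cross-domain setup could pair `domain="Books"` with `domain="Music"` in the same way. Plain Python keeps the helpers independent of any library; with `datasets` installed, the same logic can instead be expressed with `Dataset.filter` and `Dataset.map`, which keeps the data in Arrow format.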
Provided by:
avramandrei