scaredmeow/shopee-reviews-tl-stars
Hugging Face · Updated 2023-05-15 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/scaredmeow/shopee-reviews-tl-stars
Description:
---
license: mpl-2.0
task_categories:
- text-classification
language:
- tl
size_categories:
- 1K<n<10K
dataset_info:
  features:
  - name: label
    dtype:
      class_label:
        names:
          '0': 1 star
          '1': 2 star
          '2': 3 stars
          '3': 4 stars
          '4': 5 stars
  - name: text
    dtype: string
tags:
- reviews
- shopee
---
# Dataset Card for shopee-reviews-tl-stars
## Dataset Description
- **Homepage:**
- **Repository:**
- **Paper:** [Enhancement to Low Resource Text Classification via Sequential Transfer Learning](#)
- **Leaderboard:**
- **Point of Contact:** [Neil Riego](mailto:neilchristianriego3@gmail.com)
### Dataset Summary
This dataset contains Tagalog-language reviews from Shopee, each labeled with a star rating from 1 to 5, for use in text classification.
### Supported Tasks and Leaderboards
[More Information Needed]
### Languages
Tagalog (TL)
## Dataset Structure
### Data Instances
A typical data point comprises a review text and its corresponding label.
An example looks as follows:
```
{
'label': 2,
'text': 'Madaling masira yung sa may sinisintasan nya. Wala rin syang box. Sana mas ginawa pa na matibay para sana sulit yung pagkakabili'
}
```
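The integer `label` is a class index rather than the star count itself. A minimal sketch of decoding it, assuming the label names listed in the card's metadata; `LABEL_NAMES` and `label_to_stars` are illustrative names and not part of the dataset:

```python
# Label names copied from the `class_label` entry in the card's metadata.
LABEL_NAMES = ["1 star", "2 star", "3 stars", "4 stars", "5 stars"]


def label_to_stars(label: int) -> int:
    """Convert a class index (0-4) to the star rating (1-5)."""
    if not 0 <= label < len(LABEL_NAMES):
        raise ValueError(f"label must be in 0..{len(LABEL_NAMES) - 1}, got {label}")
    return label + 1


# The example instance above has label 2, which corresponds to 3 stars.
example_label = 2
print(label_to_stars(example_label), "->", LABEL_NAMES[example_label])
```

Under this mapping, the example review above is a 3-star review.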
### Data Fields
- 'text': The review text. Each text is wrapped in double quotes ("), and any internal double quote is escaped as two double quotes ("").
- 'label': The star rating of the review, stored as a class index from 0 to 4 that corresponds to 1 to 5 stars.
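The quoting convention described for 'text' matches standard CSV quoting, which Python's `csv` module applies by default. A small sketch, assuming the raw files are CSV rows of label followed by text; the card does not state the file format or column order, and the sample row below is invented purely for illustration:

```python
import csv
import io

# Hypothetical raw row (not taken from the dataset): label, then quoted text.
raw = '4,"Sabi ng seller ""original"" daw ito."\n'

# csv.reader reads "" inside a quoted field as one literal double quote.
rows = list(csv.reader(io.StringIO(raw)))
label, text = int(rows[0][0]), rows[0][1]

# Writing the row back re-applies the same escaping.
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
writer.writerow([label, text])
round_trip = buffer.getvalue()

print(text)
print(round_trip == raw)
```

If the data is loaded through a library that already parses the files, this unescaping has usually been done and the text can be used as-is.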
### Data Splits
The Shopee reviews tl 15 dataset is constructed by randomly taking, for each star rating from 1 to 5, 2100 samples for training and 450 samples each for validation and testing.
In total there are 10500 training samples and 2250 samples each for validation and testing.
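The split totals follow directly from the per-class counts. A short arithmetic check, using only the numbers stated above; the variable names are illustrative:

```python
STAR_CLASSES = 5  # star ratings 1 to 5
PER_CLASS = {"train": 2100, "validation": 450, "test": 450}

# Each split holds the same number of samples for every star rating.
split_sizes = {name: count * STAR_CLASSES for name, count in PER_CLASS.items()}
total = sum(split_sizes.values())

print(split_sizes)  # {'train': 10500, 'validation': 2250, 'test': 2250}
print(total)        # 15000
```

This works out to a 70/15/15 split with balanced classes. The total of 15000 samples is larger than the `1K<n<10K` size category declared in the card's metadata.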
## Dataset Creation
### Curation Rationale
[More Information Needed]
### Source Data
#### Initial Data Collection and Normalization
[More Information Needed]
#### Who are the source language producers?
[More Information Needed]
### Annotations
#### Annotation process
[More Information Needed]
#### Who are the annotators?
[More Information Needed]
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[More Information Needed]
### Licensing Information
[More Information Needed]
### Citation Information
[More Information Needed]
### Contributions
[More Information Needed]
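Provider:
scaredmeow
Summary of original information
Dataset overview
Dataset description
- Language: Tagalog (TL)
- Task category: text classification
- Dataset size: 1K<n<10K
- License: mpl-2.0
Dataset structure
Data instances
- Fields:
  - label: the star rating of the review, from 1 star to 5 stars.
  - text: the review text, wrapped in double quotes, with internal double quotes written as two double quotes.
Data splits
- Training set: 10500 samples
- Validation set: 2250 samples
- Test set: 2250 samples
Data field details
- label: class label
  - Names:
    - 0: 1 star
    - 1: 2 stars
    - 2: 3 stars
    - 3: 4 stars
    - 4: 5 stars
- text: string
Label details
- Label range: 1 star to 5 stars
- Label value for each star rating:
  - 1 star: 0
  - 2 stars: 1
  - 3 stars: 2
  - 4 stars: 3
  - 5 stars: 4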



