jcblaise/dengue_filipino
Source: Hugging Face (updated 2024-02-01, indexed 2024-05-25)
Download link: https://hf-mirror.com/datasets/jcblaise/dengue_filipino
---
annotations_creators:
- crowdsourced
- machine-generated
language_creators:
- crowdsourced
language:
- tl
license:
- unknown
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- text-classification
task_ids:
- multi-class-classification
paperswithcode_id: dengue
pretty_name: Dengue Dataset in Filipino
dataset_info:
  features:
  - name: text
    dtype: string
  - name: absent
    dtype:
      class_label:
        names:
          '0': '0'
          '1': '1'
  - name: dengue
    dtype:
      class_label:
        names:
          '0': '0'
          '1': '1'
  - name: health
    dtype:
      class_label:
        names:
          '0': '0'
          '1': '1'
  - name: mosquito
    dtype:
      class_label:
        names:
          '0': '0'
          '1': '1'
  - name: sick
    dtype:
      class_label:
        names:
          '0': '0'
          '1': '1'
  splits:
  - name: train
    num_bytes: 428549
    num_examples: 4015
  - name: test
    num_bytes: 57364
    num_examples: 500
  - name: validation
    num_bytes: 54380
    num_examples: 500
  download_size: 156014
  dataset_size: 540293
---
# Dataset Card for Dengue Dataset in Filipino
## Table of Contents
- [Dataset Description](#dataset-description)
- [Dataset Summary](#dataset-summary)
- [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
- [Languages](#languages)
- [Dataset Structure](#dataset-structure)
- [Data Instances](#data-instances)
- [Data Fields](#data-fields)
- [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
- [Curation Rationale](#curation-rationale)
- [Source Data](#source-data)
- [Annotations](#annotations)
- [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
- [Social Impact of Dataset](#social-impact-of-dataset)
- [Discussion of Biases](#discussion-of-biases)
- [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
- [Dataset Curators](#dataset-curators)
- [Licensing Information](#licensing-information)
- [Citation Information](#citation-information)
- [Contributions](#contributions)
## Dataset Description
- **Homepage:** [Dengue Dataset in Filipino homepage](https://github.com/jcblaisecruz02/Filipino-Text-Benchmarks)
- **Repository:** [Dengue Dataset in Filipino repository](https://github.com/jcblaisecruz02/Filipino-Text-Benchmarks)
- **Paper:** [IEEE paper](https://ieeexplore.ieee.org/document/8459963)
- **Leaderboard:**
- **Point of Contact:** [Jan Christian Cruz](mailto:jan_christian_cruz@dlsu.edu.ph)
### Dataset Summary
A benchmark dataset for low-resource multiclass text classification, with 4,015 training, 500 test, and 500 validation examples. Each example is a tweet annotated with five binary labels (`absent`, `dengue`, `health`, `mosquito`, `sick`), and a single example can belong to more than one class.
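The split sizes stated above can be checked against a downloaded copy. The following is a minimal sketch, assuming the Hugging Face `datasets` package is installed and that the Hub identifier `jcblaise/dengue_filipino` (taken from this page's title) still resolves. The helper names `load_dengue` and `split_size_mismatches` are hypothetical and not part of any library.

```python
# Split sizes as stated in this card's metadata.
EXPECTED_SPLIT_SIZES = {"train": 4015, "test": 500, "validation": 500}


def load_dengue(name="jcblaise/dengue_filipino"):
    """Load the dataset from the Hugging Face Hub (needs network access).

    Assumption: the `datasets` package is installed and `name` is still a
    valid Hub identifier; this card does not guarantee either.
    """
    # Imported lazily so the checker below also works without `datasets`.
    from datasets import load_dataset

    return load_dataset(name)


def split_size_mismatches(actual_sizes, expected=EXPECTED_SPLIT_SIZES):
    """Return {split: (expected, actual)} for every split whose size differs.

    A split missing from `actual_sizes` is reported with an actual value of None.
    """
    return {
        split: (count, actual_sizes.get(split))
        for split, count in expected.items()
        if actual_sizes.get(split) != count
    }
```

With network access, `split_size_mismatches({name: split.num_rows for name, split in load_dengue().items()})` should return an empty dict if the Hub copy matches the counts given in this card.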
### Supported Tasks and Leaderboards
[More Information Needed]
### Languages
The dataset is primarily in Filipino, with the addition of some English words commonly used in Filipino vernacular.
## Dataset Structure
### Data Instances
Sample data:
```
{
  "text": "Tapos ang dami pang lamok.",
  "absent": "0",
  "dengue": "0",
  "health": "0",
  "mosquito": "1",
  "sick": "0"
}
```
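Because every record carries five independent 0/1 labels, a common way to prepare it for a classifier is to collapse them into one multi-hot vector. The sketch below is one possible approach, not the dataset authors' own preprocessing. The label order follows the feature order in the metadata, and the helper names `to_multi_hot` and `active_labels` are hypothetical.

```python
# Label columns, in the order they appear in the dataset metadata.
LABEL_NAMES = ("absent", "dengue", "health", "mosquito", "sick")


def to_multi_hot(record, label_names=LABEL_NAMES):
    """Convert one record into a list of 0/1 ints, ordered as in label_names.

    int() accepts both the string form shown in the sample ("1") and the
    integer form that a class-label feature typically yields (1).
    """
    return [int(record[name]) for name in label_names]


def active_labels(record, label_names=LABEL_NAMES):
    """Return the names of the labels that are set to 1 for this record."""
    return [name for name in label_names if int(record[name]) == 1]


# The sample record from this card.
sample = {
    "text": "Tapos ang dami pang lamok.",
    "absent": "0",
    "dengue": "0",
    "health": "0",
    "mosquito": "1",
    "sick": "0",
}

print(to_multi_hot(sample))
print(active_labels(sample))
```

Keeping the label order in a single constant means the vector positions stay consistent between training and evaluation.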
### Data Fields
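- `text` (string): the tweet text.
- `absent`, `dengue`, `health`, `mosquito`, `sick` (class labels): one binary label per class, with value `0` or `1`.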
### Data Splits
- `train`: 4,015 examples
- `test`: 500 examples
- `validation`: 500 examples
## Dataset Creation
### Curation Rationale
[More Information Needed]
### Source Data
#### Initial Data Collection and Normalization
[More Information Needed]
#### Who are the source language producers?
[More Information Needed]
### Annotations
#### Annotation process
[More Information Needed]
#### Who are the annotators?
[More Information Needed]
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[Jan Christian Cruz](mailto:jan_christian_cruz@dlsu.edu.ph)
### Licensing Information
[More Information Needed]
### Citation Information
@INPROCEEDINGS{8459963,
author={E. D. {Livelo} and C. {Cheng}},
booktitle={2018 IEEE International Conference on Agents (ICA)},
title={Intelligent Dengue Infoveillance Using Gated Recurrent Neural Learning and Cross-Label Frequencies},
year={2018},
volume={},
number={},
pages={2-7},
doi={10.1109/AGENTS.2018.8459963}
}
### Contributions
Thanks to [@anaerobeth](https://github.com/anaerobeth) for adding this dataset.
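**Provider:** jcblaise
## Summary of Original Information
### Dataset Overview
#### Dataset Name
- Name: Dengue Dataset in Filipino
- Alias (Chinese): 登革热数据集(菲律宾语)
#### Basic Information
- Language: Filipino (`tl`), with some commonly used English words
- License: unknown
- Multilinguality: monolingual
- Size: 1K<n<10K
- Source: original dataset
- Task category: text classification
- Task ID: multi-class classification
#### Features
- `text`: the text, of type string
- `absent`: class label, with label names 0 and 1
- `dengue`: class label, with label names 0 and 1
- `health`: class label, with label names 0 and 1
- `mosquito`: class label, with label names 0 and 1
- `sick`: class label, with label names 0 and 1
#### Splits
- Training set: 4,015 examples, 428,549 bytes
- Test set: 500 examples, 57,364 bytes
- Validation set: 500 examples, 54,380 bytes
- Total download size: 156,014 bytes
- Total dataset size: 540,293 bytes
#### Dataset Creation
- Annotation creators: crowdsourced and machine-generated
- Language creators: crowdsourced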



