marmarg2/toxic-teenage-relationships
Source: Hugging Face; updated 2023-08-09; indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/marmarg2/toxic-teenage-relationships
---
task_categories:
- text-classification
language:
- es
size_categories:
- n<1K
pretty_name: Detecting toxic and healthy adolescent relationships
---
# Dataset Card for toxic-teenage-relationships
## Dataset Description
- **Homepage:**
- **Repository:**
- **Paper:**
- **Leaderboard:**
- **Point of Contact:** mmartinevqh@alumnos.unex.es
### Dataset Summary
This dataset consists of prototype phrases collected by eight Spanish adolescents (4 girls and 4 boys) aged 15-19 who had received prior training on toxic relationships. Over two weeks, the group analyzed phrases that had occurred in their environment or that they had produced themselves. They classified each phrase as toxic or healthy and submitted it through a form.
### Supported Tasks and Leaderboards
This dataset supports binary text classification: toxic vs. healthy (non-toxic).
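The following is a minimal baseline sketch for this task. It assumes the `label`/`text` fields described under Data Fields. It is a from-scratch multinomial Naive Bayes classifier chosen only for illustration, not a method used or endorsed by the dataset author. The names `tokenize` and `NaiveBayesBaseline` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase, then keep runs of Latin letters including Spanish accented vowels and ñ.
    return re.findall(r"[a-záéíóúüñ]+", text.lower())


class NaiveBayesBaseline:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing over word counts."""

    def fit(self, texts: list[str], labels: list[int]) -> "NaiveBayesBaseline":
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        label_counts = Counter(labels)
        # Log prior P(class), estimated from label frequencies.
        self.log_prior = {c: math.log(label_counts[c] / len(labels)) for c in self.classes}
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text: str) -> int:
        """Return the label (0 = non-toxic, 1 = toxic) with the highest log-posterior score."""
        vocab_size = len(self.vocab)

        def log_score(c: int) -> float:
            score = self.log_prior[c]
            for token in tokenize(text):
                if token in self.vocab:  # words never seen in training are ignored
                    score += math.log(
                        (self.word_counts[c][token] + 1) / (self.totals[c] + vocab_size)
                    )
            return score

        return max(self.classes, key=log_score)
```

To use it, fit the model on the training split and call `predict` on each test comment. This yields a simple reference accuracy for comparing more elaborate models on a dataset of this size (fewer than 1K examples).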
### Languages
The sentences are in Spanish.
## Dataset Structure
### Data Instances
A data point consists of a comment and the label associated with it, e.g. `{'label': 0, 'text': 'Sample comment text'}`.
### Data Fields
- `label`: 0 (non-toxic) or 1 (toxic), the class assigned to the comment
- `text`: the text of the comment
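The sketch below checks that a record matches the two fields described above and renders it with a readable label name. It assumes records arrive as plain Python dicts. The names `LABEL_NAMES`, `Record`, `validate_record` and `describe` are hypothetical helpers, not part of the dataset.

```python
from typing import TypedDict

# Label meanings as given on this card.
LABEL_NAMES = {0: "non-toxic", 1: "toxic"}


class Record(TypedDict):
    label: int
    text: str


def validate_record(record: dict) -> Record:
    """Raise ValueError unless the record has the fields described on this card."""
    missing = {"label", "text"} - set(record)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    label = record["label"]
    # Require an exact int; this also rejects bool, which is a subclass of int.
    if type(label) is not int or label not in LABEL_NAMES:
        raise ValueError(f"label must be 0 or 1, got {label!r}")
    if not isinstance(record["text"], str) or not record["text"].strip():
        raise ValueError("text must be a non-empty string")
    return Record(label=label, text=record["text"])


def describe(record: dict) -> str:
    """Render a validated record as '[label name] text'."""
    checked = validate_record(record)
    return f"[{LABEL_NAMES[checked['label']]}] {checked['text']}"
```

For example, `describe({'label': 0, 'text': 'Sample comment text'})` returns `"[non-toxic] Sample comment text"`.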
### Data Splits
The data is split into a training set and a test set.
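The card does not state the split sizes or their class balance. The sketch below is one way to inspect them. It assumes the `datasets` package is installed and that the repository is reachable on the Hugging Face Hub (the hf-mirror link above mirrors the Hub). It reads split names from the returned `DatasetDict` instead of guessing them. The helper names are hypothetical.

```python
from collections import Counter


def label_distribution(records) -> dict[int, int]:
    """Count how many records in a split carry each label (0 = non-toxic, 1 = toxic)."""
    return dict(sorted(Counter(r["label"] for r in records).items()))


def load_splits(repo_id: str = "marmarg2/toxic-teenage-relationships") -> dict[str, list[dict]]:
    """Download every split from the Hub as lists of {'label', 'text'} dicts.

    Requires the `datasets` package and network access.
    """
    from datasets import load_dataset  # imported lazily so label_distribution works offline

    dataset_dict = load_dataset(repo_id)
    return {name: list(split) for name, split in dataset_dict.items()}
```

With network access, `{name: label_distribution(rows) for name, rows in load_splits().items()}` reports the class balance of each split. This is worth checking before choosing an evaluation metric.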
## Dataset Creation
### Curation Rationale
The dataset was created to help in efforts to identify and curb instances of toxicity between teenagers.
### Source Data
#### Initial Data Collection and Normalization
I collected these prototype phrases with the help of my group of students (4 girls and 4 boys) aged 15-19, who had received prior training on toxic relationships. Over two weeks, the teenagers analysed phrases that had occurred in their environment (social media, direct communication) or that they had produced themselves. They classified each phrase as toxic or healthy and submitted it through a form.
Afterwards, the examples submitted by each student were discussed and evaluated by the other students through peer evaluation. The resulting labels were also ratified by two specialists in the field.
### Personal and Sensitive Information
No personal or sensitive information has been stored in this dataset.
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
A comment containing swearing, insults, or profanity is likely to be labelled toxic regardless of the author's tone or intent, e.g. humorous or self-critical use. This could introduce bias against already vulnerable minority groups.
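One rough way to probe this bias is to compare the share of toxic labels among comments that contain words from an insult/profanity lexicon with the share among comments that do not. The sketch below assumes records shaped as described under Data Fields. The three-word `INSULT_LEXICON` is a hypothetical placeholder, not a list used by the dataset author; a real audit would need a curated Spanish lexicon. A gap between the two rates does not prove bias on its own: humorous or self-critical uses still have to be reviewed by hand.

```python
import re

# Hypothetical placeholder lexicon for illustration only; replace with a curated Spanish list.
INSULT_LEXICON = {"idiota", "tonto", "imbécil"}


def contains_lexicon_word(text: str, lexicon: set[str] = INSULT_LEXICON) -> bool:
    # \w matches Unicode letters in Python 3, so accented words are tokenized whole.
    return any(token in lexicon for token in re.findall(r"\w+", text.lower()))


def toxic_rate_by_lexicon(records, lexicon: set[str] = INSULT_LEXICON) -> dict:
    """Share of records labelled 1 (toxic) among comments with and without lexicon words.

    A group with no records gets None instead of a rate.
    """
    counts = {True: [0, 0], False: [0, 0]}  # has lexicon word -> [toxic count, total]
    for record in records:
        group = counts[contains_lexicon_word(record["text"], lexicon)]
        group[0] += record["label"]
        group[1] += 1
    return {
        "with_lexicon_word": counts[True][0] / counts[True][1] if counts[True][1] else None,
        "without_lexicon_word": counts[False][0] / counts[False][1] if counts[False][1] else None,
    }
```

Comments that contain lexicon words but are labelled toxic are the natural candidates for the manual review described above.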
### Licensing Information
Creative Commons Attribution-NonCommercial-ShareAlike (CC-BY-NC-SA)
Provided by: marmarg2