Balanced Voice Dataset for Toxigen Content Moderation
Source: Databricks. Indexed: 2024-12-05
Download link:
https://marketplace.databricks.com/details/ab776051-3201-47e0-a062-3a95f691ebde/Destined_Balanced-Voice-Dataset-for-Toxigen-Content-Moderation
Resource description:
**Overview**
This dataset is a curated collection of voice recordings from 500 unique speakers, designed to improve content moderation systems and support ethical AI applications. It provides a balanced representation of male and female speakers and diverse texts across the target groups of the Toxigen dataset. Detailed metadata fields support nuanced analysis and model fine-tuning.
**Use Cases**
- Content Moderation Training: Improve AI systems' ability to detect and moderate toxic language with diverse, real-world examples.
- Bias Analysis and Mitigation: Analyze and address potential biases in toxicity detection systems using balanced demographic and target group data.
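
As a rough illustration of the bias-analysis use case, the sketch below averages the gap between the AI and human toxicity ratings for each target group. It assumes both ratings are numeric on a shared scale, which the listing does not state. The function name `mean_rating_gap`, the group labels, and the sample values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_rating_gap(rows):
    """Return the mean of (toxicity_ai - toxicity_human) per target_group."""
    gaps = defaultdict(list)
    for row in rows:
        gaps[row["target_group"]].append(row["toxicity_ai"] - row["toxicity_human"])
    return {group: mean(values) for group, values in gaps.items()}


# Made-up rows for illustration only; not taken from the dataset.
rows = [
    {"target_group": "group_a", "toxicity_ai": 4.0, "toxicity_human": 3.0},
    {"target_group": "group_a", "toxicity_ai": 2.0, "toxicity_human": 2.0},
    {"target_group": "group_b", "toxicity_ai": 1.0, "toxicity_human": 3.0},
]

gap_by_group = mean_rating_gap(rows)
for group, gap in sorted(gap_by_group.items()):
    print(f"{group}: mean AI-minus-human gap = {gap:+.2f}")
```

A consistently positive gap for one group would suggest the AI rater scores that group's texts as more toxic than human reviewers do. A consistently negative gap would suggest the opposite.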
**Product Details**
The dataset contains the following elements:
- Voice recordings (audio files) balanced by gender and target groups.
- Metadata tables for detailed analyses, including toxicity ratings and linguistic attributes.
**Sample Fields**
row_id: Unique identifier for each record.
text: The content of the Toxigen text.
target_group: Demographic or group targeted in the text.
toxicity_ai / toxicity_human: Toxicity ratings from AI and human reviewers.
gender, age_range, native_language, region: Speaker demographic details.
voice_conditions: Any voice-specific conditions affecting the speaker's voice, e.g. COPD.
audio_file_path: Path to the corresponding audio file.
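
The fields above can be modelled as a simple record type. The sketch below is illustrative only. It assumes the metadata is available as a CSV whose columns use these field names, and that the toxicity ratings are numeric. Neither the file format nor the types are confirmed by the listing. `VoiceRecord`, `load_records`, `balance_counts`, and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass


@dataclass
class VoiceRecord:
    row_id: str
    text: str
    target_group: str
    toxicity_ai: float       # assumed numeric; the scale is not stated in the listing
    toxicity_human: float    # assumed numeric
    gender: str
    age_range: str
    native_language: str
    region: str
    voice_conditions: str    # may be empty when no condition is reported
    audio_file_path: str


def load_records(handle):
    """Parse CSV rows into VoiceRecord objects, converting the two ratings to float."""
    records = []
    for row in csv.DictReader(handle):
        row["toxicity_ai"] = float(row["toxicity_ai"])
        row["toxicity_human"] = float(row["toxicity_human"])
        records.append(VoiceRecord(**row))
    return records


def balance_counts(records):
    """Count recordings per (target_group, gender) pair to check balance."""
    return Counter((r.target_group, r.gender) for r in records)


# Made-up placeholder rows; not taken from the dataset.
SAMPLE_CSV = """\
row_id,text,target_group,toxicity_ai,toxicity_human,gender,age_range,native_language,region,voice_conditions,audio_file_path
1,placeholder text 1,group_a,1.0,1.5,female,25-34,English,region_x,,audio/1.wav
2,placeholder text 2,group_a,3.0,2.5,male,35-44,English,region_y,COPD,audio/2.wav
3,placeholder text 3,group_b,4.0,4.0,female,18-24,Spanish,region_x,,audio/3.wav
4,placeholder text 4,group_b,2.0,2.0,male,45-54,English,region_y,,audio/4.wav
"""

records = load_records(io.StringIO(SAMPLE_CSV))
counts = balance_counts(records)
for (group, gender), n in sorted(counts.items()):
    print(group, gender, n)
```

Equal counts across the `(target_group, gender)` pairs would be consistent with the balance the listing describes.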
**Additional Insights**
This dataset is ethically sourced and fully consented: all recordings were made intentionally by diverse speakers, which supports compliance with responsible AI practices. It is aimed at responsible AI teams, corporate AI model training, and content moderation system refinement.
For more details, please feel free to reach out to us at sales.databricks@destined.ai.
Provider:
Destined
Collected Summary
Dataset Introduction

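Background and Challenges
Background Overview
This dataset contains balanced voice recordings from 500 different speakers, covering diverse target groups and detailed metadata, and is designed to improve content moderation systems and promote ethical AI applications. All data meets ethical standards and includes demographic information such as gender and age, as well as key fields such as toxicity ratings.
The content above was collected and summarized by 遇见数据集.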



