DataMuncher-Labs/ToxicMessages

Name: DataMuncher-Labs/ToxicMessages
Creator: DataMuncher-Labs
Published: 2026-01-01 20:48:13
License: No description provided

Hugging Face · Updated 2026-01-01 · Indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/DataMuncher-Labs/ToxicMessages


Resource description:

---
license: cc-by-sa-4.0
task_categories:
- text-classification
language:
- en
tags:
- langauge
- EN
- en
- English
- english
- toxic
pretty_name: TM120
size_categories:
- 100M<n<1B
---

# **Approx. 1.44B tokens**

# Dataset Card for TM120

## Dataset Details

### Dataset Description

A synthetically generated dataset of toxic messages, each ranked with a toxicity score.

- **Curated by:** Roman
- **Funded by:** None (done for free)
- **Shared by:** Roman
- **Language(s) (NLP):** English only
- **License:** Creative Commons Attribution Share Alike 4.0

### Dataset Sources

- **Demo:** TBF (currently in a training loop lol)

## Uses

### Direct Use

The dataset is meant for sentence classification and token classification.

### Out-of-Scope Use

The data is not very good for training text generation models.

Do not use this data to train more toxic models.

## Dataset Structure

Each record pairs a message with a toxicity score, for example:

| Message (string) | toxicity (float) |
| --- | --- |
| I appreciate your help with this project. | 0.2808 |

## Dataset Creation

### Curation Rationale

I wanted to make an AI for detecting toxicity, but there was a clear lack of data.

### Source Data

#### Data Collection and Processing

Synthetically generated via Python scripting.

#### Who are the source data producers?

Scripted in Python.

#### Personal and Sensitive Information

Since the data is synthetically generated, it contains no personal or sensitive information.

## Bias, Risks, and Limitations

- Any model that has been trained on this dataset is not representative of the data it was trained on.
- Risks: since the data is synthetic, it is not 100% representative of real toxicity.
- Limitations: accuracy with respect to real toxicity.

### Recommendations

## Citation

**BibTeX:**

    @dataset{DataMuncherLabs_ToxicMessages,
      author = {{DataMuncher-Labs}},
      title = {ToxicMessages},
      year = {2025},
      publisher = {Hugging Face},
      url = {https://huggingface.co/datasets/DataMuncher-Labs/ToxicMessages},
      note = {Dataset for toxicity regression and classification}
    }

**APA:**

DataMuncher-Labs. (2025). ToxicMessages [Dataset]. Hugging Face. https://huggingface.co/datasets/DataMuncher-Labs/ToxicMessages

## Dataset Card Authors

Roman

## Dataset Card Contact

Email me at Romanfinal@proton.me with any questions you have. Please do not spam my inbox. Thank you in advance.
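
The Dataset Structure section of the card describes each record as a message string paired with a toxicity float. As a rough illustration of how such a record could be turned into a classification target, here is a minimal Python sketch. The `ToxicRecord` class, the `parse_record` and `to_label` helpers, the `message | score` text layout, and the 0.5 threshold are all assumptions made for this example; the card does not specify the on-disk format, the score range, or any cutoff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToxicRecord:
    """One row as described in the card: a message and its toxicity score."""
    message: str
    toxicity: float


def parse_record(line: str) -> ToxicRecord:
    """Parse a 'message | score' line (hypothetical layout, mirroring the card's example)."""
    # Split on the LAST pipe so that pipes inside the message text are preserved.
    message, sep, score = line.rpartition("|")
    if not sep:
        raise ValueError(f"expected 'message | score', got: {line!r}")
    return ToxicRecord(message=message.strip(), toxicity=float(score))


def to_label(record: ToxicRecord, threshold: float = 0.5) -> int:
    """Map the float score to a binary label; the 0.5 threshold is an assumption."""
    return int(record.toxicity >= threshold)


# The example row given in the dataset card.
example = parse_record("I appreciate your help with this project. | 0.2808")
print(example.message, example.toxicity, to_label(example))
```

Keeping the raw float alongside the derived label leaves room for regression-style training as well, which the citation note mentions; the threshold would need tuning against real data, since the card itself cautions that the synthetic scores are not fully representative of real toxicity.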

Provider:

DataMuncher-Labs
