
DataMuncher-Labs/ToxicMessages

Hugging Face, updated 2026-01-01, indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/DataMuncher-Labs/ToxicMessages
Resource description:
---
license: cc-by-sa-4.0
task_categories:
- text-classification
language:
- en
tags:
- langauge
- EN
- en
- English
- english
- toxic
pretty_name: TM120
size_categories:
- 100M<n<1B
---

# **Approx. 1.44B tokens**

# Dataset Card for TM120

## Dataset Details

### Dataset Description

A synthetically generated dataset of messages, each ranked with a toxicity score.

- **Curated by:** Roman
- **Funded by:** Unfunded (done for free)
- **Shared by:** Roman
- **Language(s) (NLP):** English only
- **License:** Creative Commons Attribution Share Alike 4.0

### Dataset Sources [optional]

- **Demo [TBF]:** Currently in a training loop

## Uses

### Direct Use

The dataset is meant for sentence classification and token classification.

### Out-of-Scope Use

- The data is not very good for training text generation models.
- Do not use this data to train more toxic models.

## Dataset Structure

Message (string) | toxicity (float)

e.g. **I appreciate your help with this project. | 0.2808**

## Dataset Creation

### Curation Rationale

I wanted to make an AI for detecting toxicity, but there was a clear lack of data.

### Source Data

#### Data Collection and Processing

Synthetically generated via Python scripting.

#### Who are the source data producers?

Scripted in Python.

#### Personal and Sensitive Information

Since the data is synthetically generated, it contains no personal or sensitive information.

## Bias, Risks, and Limitations

- Any model trained on this dataset is not representative of the data it was trained on.
- Risk: because the data is synthetic, it is not 100% representative of real toxicity.
- Limitation: accuracy with respect to real toxicity.

### Recommendations

## Citation [optional]

**BibTeX:**

    @dataset{DataMuncherLabs_ToxicMessages,
      author = {{DataMuncher-Labs}},
      title = {ToxicMessages},
      year = {2025},
      publisher = {Hugging Face},
      url = {https://huggingface.co/datasets/DataMuncher-Labs/ToxicMessages},
      note = {Dataset for toxicity regression and classification}
    }

**APA:**

DataMuncher-Labs. (2025). ToxicMessages [Dataset]. Hugging Face. https://huggingface.co/datasets/DataMuncher-Labs/ToxicMessages

## Dataset Card Authors [optional]

Roman

## Dataset Card Contact

Email me at Romanfinal@proton.me with any questions you have. Please do not spam my inbox. Thank you in advance.
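
As a rough illustration of the record layout shown under Dataset Structure, the sketch below parses one `message | toxicity` line into a typed record and applies a cut-off to get a binary label. The card does not state the on-disk file format, the score range, or any recommended threshold, so the pipe-separated layout (taken from the card's own example), the names `Record`, `parse_record`, and `is_toxic`, and the `0.5` threshold are all assumptions made here for illustration.

```python
from typing import NamedTuple


class Record(NamedTuple):
    """One row: a message and its toxicity score (hypothetical type)."""
    message: str
    toxicity: float


def parse_record(line: str) -> Record:
    """Parse 'message | score' (layout assumed from the card's example)."""
    # Split on the LAST pipe so a message that itself contains '|' stays intact.
    message, sep, score = line.rpartition("|")
    if not sep:
        raise ValueError(f"no '|' separator in line: {line!r}")
    return Record(message.strip(), float(score))


def is_toxic(record: Record, threshold: float = 0.5) -> bool:
    """Binary label from the score; the 0.5 threshold is an assumption."""
    return record.toxicity >= threshold


example = parse_record("I appreciate your help with this project. | 0.2808")
print(example.message, example.toxicity, is_toxic(example))
```

Splitting on the last pipe rather than the first is a defensive choice: synthetic messages may contain a `|` character, while the score is always the final field in the example given.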
Provider:
DataMuncher-Labs