Reddit_Dirty_Writing_Prompts_ShareGPT

Name: Reddit_Dirty_Writing_Prompts_ShareGPT
Creator: maas
Published: 2025-12-05 16:57:03
License: 暂无描述

魔搭社区2025-12-05 更新2025-12-06 收录

下载链接：

https://modelscope.cn/datasets/SicariusSicariiStuff/Reddit_Dirty_Writing_Prompts_ShareGPT

下载链接

链接失效反馈

官方服务：

资源简介：

<div align="center"> <b style="font-size: 40px;">Reddit_Dirty_Writing_Prompts_ShareGPT</b> </div> # Dataset Details This is the Reddit_Dirty_Writing_Prompts dataset, which I further cleaned and sorted. The original dataset can be found at: https://huggingface.co/datasets/nothingiisreal/Reddit-Dirty-And-WritingPrompts - Additional meticulously cleaning performed - ShareGPT Format - Each entry contains the number of tokens in both LLAMA1 and LLAMA3 tokenizers - Each entry contains the number of characters - Longest entry in tokens: TOKENS_LLAMA1: 7545 \ TOKENS_LLAMA3: 6397 - Shortest entry in tokens: TOKENS_LLAMA1: 98 \ TOKENS_LLAMA3: 83 - Total_TOKENS_LLAMA1: 12874614 (12M), Total_TOKENS_LLAMA3: 10892913 (10M) I hope this helps as many people as possible, let's make AI with less slop, and make AI accessible for everyone 🤗

# Reddit低俗写作提示ShareGPT数据集（Reddit_Dirty_Writing_Prompts_ShareGPT） ## 数据集详情本数据集为经进一步精细化清洗与整理后的Reddit低俗写作提示（Reddit_Dirty_Writing_Prompts）数据集。原始数据集可于以下地址获取：https://huggingface.co/datasets/nothingiisreal/Reddit-Dirty-And-WritingPrompts - 已执行精细化清洗操作 - 采用ShareGPT格式 - 每条数据均包含LLAMA1与LLAMA3分词器下的Token（Token）数量 - 每条数据均包含字符数统计 - 分词最长条目：LLAMA1分词计数：7545，LLAMA3分词计数：6397 - 分词最短条目：LLAMA1分词计数：98，LLAMA3分词计数：83 - LLAMA1总分词数：12874614（约1200万），LLAMA3总分词数：10892913（约1000万）衷心希望本数据集能惠及更多用户，让我们携手打造更优质的人工智能，推动人工智能的全民普惠 🤗

提供机构：

maas

创建时间：

2025-11-20

5,000+

优质数据集

54 个

任务类型

进入经典数据集