Sonnet3.5-Charcard-Roleplay
收藏魔搭社区2025-12-05 更新2025-03-29 收录
下载链接:
https://modelscope.cn/datasets/Gryphe/Sonnet3.5-Charcard-Roleplay
下载链接
链接失效反馈官方服务:
资源简介:
⚠️ **WARNING** ⚠️ Many of these simulated character cards are highly NSFW in nature and may potentially describe disturbing scenes. **Consider yourself very thoroughly warned!**
9736 carefully simulated character card-based roleplay dialogues produced using an unrestrained Sonnet 3.5, now available as a ShareGPT dataset. Enjoy.
## How this dataset was produced
- Each card was enriched with a simulated user, which was either male or female with four distinct personalities. An effort was made to ensure these were distributed equally.
- For the simulation itself Sonnet 3.5 was **very specifically instructed** not to resort to gratuitous lewdness but instead try its best to bring the characters to life in a realistic and engaging manner.
- A complete dialogue was requested with a single call, attempting to cover the scene as described in the card. The advantage to this method is that the model does not dissolve into incoherence due to AI-on-AI messages acting as an echo chamber of sorts.
- The resulting response went through a few dozen validations to check for formatting, dialogue length and godmodding.
- Upon assembly of the final dataset further cleaning was performed and any references to Anon (the original user) were renamed to the placeholder {{user}}.
- A final enrichment phase was applied with the most common phrases (such as "a mix/mixture of") being replaced by alternatives given by GPT-4o. Like before, this process too was carefully curated to keep the potential for "slop" low.
## Credits
All those wonderful roleplay character card creators for making this dataset possible. Some of you made me feel emotional, and some of you scarred me for life. I have no regrets either way.
## Feedback
If you find any strange formatting errors, please let me know! I'll see what I can do to fix them.
⚠️ **警告** ⚠️ 本数据集内含大量NSFW(Not Safe For Work,不适宜公开浏览)级别的模拟角色卡内容,且可能涉及令人不适的场景。请务必充分知悉相关风险!
本数据集包含9736条基于精心打磨的模拟角色扮演角色卡的对话文本,由无限制版本的Sonnet 3.5生成,现已以ShareGPT数据集格式公开。敬请取用。
### 数据集制作流程
1. 每张角色卡均配套一名模拟用户,该用户分为男、女两种性别,且具备四种差异化人格特质,制作过程中力求两类性别与四种人格的分布保持均衡。
2. 在生成对话的过程中,我们对Sonnet 3.5下达了明确指令:禁止使用无端的低俗色情内容,转而以写实且富有沉浸感的方式塑造角色。
3. 我们通过单次模型调用即可生成完整对话,力求覆盖角色卡中描述的全部场景。该方法的优势在于,能够避免因AI间的对话形成回音室效应,导致模型输出出现逻辑断裂与连贯性缺失的问题。
4. 生成的回复需经过数十轮校验,以检查格式合规性、对话长度合理性以及是否存在不当控场(godmodding)行为。
5. 在最终数据集整合阶段,我们还进行了进一步的清洗工作,将所有指向原用户Anon的引用替换为占位符`{{user}}`。
6. 最后一道优化环节中,我们使用GPT-4o将高频短语(如"a mix/mixture of")替换为多样化的同义表达,且该流程同样经过严格筛选,以尽可能降低生成低质量冗余内容(slop)的可能性。
### 致谢
感谢所有出色的角色扮演角色卡创作者,是你们的作品让本数据集得以问世。部分创作令我深受触动,也有一些让我印象深刻到难以忘怀。无论如何,我从未后悔制作这个数据集。
### 反馈
若您发现任何格式异常问题,请随时告知!我会尽力排查修复。
提供机构:
maas
创建时间:
2025-03-27



