Japanese-Roleplay
Source: ModelScope community. Updated: 2025-07-24. Indexed: 2025-07-12.
Download link:
https://modelscope.cn/datasets/OmniAICreator/Japanese-Roleplay
Resource overview:
# Japanese-Roleplay
This is a dialogue corpus collected from a Japanese role-playing forum (a format commonly known as "なりきりチャット", or narikiri chat). Each record corresponds to a single thread.
The following filtering and cleaning conditions have been applied:
- Response anchors are removed from the `post_content` of every post in each record's `posts`.
- Posts whose `post_content` is 10 characters or shorter are deleted.
- A record is deleted entirely if its `posts` contain one or fewer unique `poster` values.
- Consecutive posts by the same `poster` are merged into a single post, with their `post_content` joined by line breaks.
- A record is deleted entirely if, after the steps above, its `posts` contain 10 or fewer unique `post_content` values.
- A record is deleted entirely if its `first_poster` does not appear as a `poster` in any subsequent post.
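
The rules above can be sketched as a single function. This is only an illustration, not the dataset author's actual pipeline. The field names (`posts`, `poster`, `post_content`, `first_poster`) come from the description, but several details are assumptions: that response anchors look like `>>123` (half- or full-width, optionally a range), that post length is measured after anchor removal and whitespace trimming, and that "subsequent posts" means every merged post after the first. The name `clean_record` is hypothetical.

```python
import re
from typing import Optional

# Assumption: anchors look like ">>123", "＞＞123", or a range such as ">>12-15".
ANCHOR_RE = re.compile(r"(?:>>|＞＞)\d+(?:-\d+)?")


def clean_record(record: dict) -> Optional[dict]:
    """Return a cleaned copy of one thread record, or None if it is dropped."""
    posts = []
    for post in record["posts"]:
        # Rule 1: strip response anchors (trimming whitespace is an assumption).
        text = ANCHOR_RE.sub("", post["post_content"]).strip()
        # Rule 2: drop posts of 10 characters or fewer.
        if len(text) > 10:
            posts.append({"poster": post["poster"], "post_content": text})

    # Rule 3: a thread needs at least two distinct posters.
    if len({p["poster"] for p in posts}) <= 1:
        return None

    # Rule 4: merge consecutive posts by the same poster with line breaks.
    merged = []
    for post in posts:
        if merged and merged[-1]["poster"] == post["poster"]:
            merged[-1]["post_content"] += "\n" + post["post_content"]
        else:
            merged.append(dict(post))

    # Rule 5: require more than 10 unique post contents after merging.
    if len({p["post_content"] for p in merged}) <= 10:
        return None

    # Rule 6: the first poster must reappear after the opening post.
    if record["first_poster"] not in {p["poster"] for p in merged[1:]}:
        return None

    return {**record, "posts": merged}
```

Applying the function to each raw thread and discarding the `None` results would yield records of the shape described here. The rules run in the order they are listed, so the uniqueness check in rule 5 sees the merged posts rather than the raw ones.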
Not all dialogues are purely role-playing. Some records include initial discussions about the settings, or they may continue from other threads.
Provider:
maas
Created:
2025-07-07



