Japanese-Roleplay

ModelScope community; updated 2025-07-24; indexed 2025-07-12
Download link:
https://modelscope.cn/datasets/OmniAICreator/Japanese-Roleplay
Description:
# Japanese-Roleplay

This is a dialogue corpus collected from Japanese role-playing forums (commonly known as "なりきりチャット (narikiri chat)"). Each record corresponds to a single thread.

The following filtering and cleaning conditions have been applied:

- For every `post_content` in the `posts` of each record, response anchors are removed.
- For every `post_content` in the `posts` of each record, posts whose `post_content` is 10 characters or fewer are deleted.
- If the number of unique `poster` values in the `posts` of a record is 1 or less, the entire record is deleted.
- Within the `posts` of each record, if the same `poster` appears consecutively, their `post_content` values are concatenated with line breaks into a single post.
- If the number of unique `post_content` values in the `posts` after the above processing is 10 or less, the entire record is deleted.
- If the `first_poster` does not appear among the `poster` values of the subsequent posts, the entire record is deleted.

Not all dialogues are purely role-playing. Some records include initial discussions about the settings, or continue from other threads.
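The filtering and cleaning conditions listed above can be sketched in Python as follows. This is only an illustrative reconstruction, not the dataset author's actual script. It assumes each record is a dict with a `first_poster` string and a `posts` list of `{"poster", "post_content"}` dicts, that the rules are applied in the order listed, and that response anchors look like `>>12` or `>>12-15` (the anchor pattern is a guess, since the description does not define it). The function name `clean_record` is hypothetical, and the last rule is interpreted here as checking `first_poster` against the posters remaining in `posts`.

```python
import re
from typing import Optional

# Assumption: response anchors look like ">>12" or ">>12-15",
# written with half-width or full-width ">" characters.
ANCHOR_RE = re.compile(r"(?:>>|＞＞)\d+(?:-\d+)?[ \t]*")


def clean_record(record: dict) -> Optional[dict]:
    """Return the cleaned record, or None if the record should be dropped."""
    # 1. Remove response anchors from every post.
    posts = [
        {"poster": p["poster"],
         "post_content": ANCHOR_RE.sub("", p["post_content"])}
        for p in record["posts"]
    ]

    # 2. Delete posts whose content is 10 characters or fewer.
    posts = [p for p in posts if len(p["post_content"]) > 10]

    # 3. Drop the record unless at least two distinct posters remain.
    if len({p["poster"] for p in posts}) <= 1:
        return None

    # 4. Merge consecutive posts by the same poster, joined by line breaks.
    merged: list = []
    for p in posts:
        if merged and merged[-1]["poster"] == p["poster"]:
            merged[-1]["post_content"] += "\n" + p["post_content"]
        else:
            merged.append(dict(p))

    # 5. Drop the record if 10 or fewer unique post contents remain.
    if len({p["post_content"] for p in merged}) <= 10:
        return None

    # 6. Drop the record if first_poster never posts in the remaining thread.
    if record["first_poster"] not in {p["poster"] for p in merged}:
        return None

    return {**record, "posts": merged}
```

Applying `clean_record` to each thread and keeping only the non-`None` results would give a filtered corpus under these assumptions; the real pipeline may differ in anchor format and edge-case handling.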

Provider:
maas
Created:
2025-07-07