lemonilia/Roleplay-Forums_2023-04
Source: Hugging Face | Updated: 2025-01-06 | Indexed: 2024-12-14
Download link:
https://hf-mirror.com/datasets/lemonilia/Roleplay-Forums_2023-04
Description:
Roleplay Forums 2023-04 contains data scraped around April 2023 from several of the largest or most popular English-language roleplaying forums, totaling approximately 47 GB (uncompressed, including HTML tags and metadata). Its primary purpose is to provide useful data for fine-tuning roleplay models. Most of the data is stored as raw HTML and is not ready for LLM training or fine-tuning as-is: it needs further processing, such as handling HTML tags and/or BBCode, and removing personal data and other unwanted content (e.g., usernames, out-of-character (OOC) chatter). The dataset contains 35,212,830 rows in total.
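The listing does not describe the column layout or a cleaning pipeline. As a minimal sketch, assuming each post is available as an HTML string (the actual field name is not given here), the following standard-library Python strips HTML tags, removes a hypothetical list of common BBCode tags while keeping their inner text, and replaces a caller-supplied list of usernames with a placeholder. `clean_post`, `_BBCODE_TAGS`, and the `{{user}}` placeholder are illustrative names, not part of the dataset.

```python
import re
from html.parser import HTMLParser

# Hypothetical tag list: common BBCode tags; real forums may use others.
_BBCODE_TAGS = ("b", "i", "u", "s", "quote", "url", "img", "color",
                "size", "spoiler", "center", "list", "*")
# Matches opening/closing tags such as [b], [/b], [quote=Name], [url=...].
_BBCODE_TAG = re.compile(
    r"\[/?(?:" + "|".join(map(re.escape, _BBCODE_TAGS)) + r")(?:=[^\]]*)?\]",
    re.IGNORECASE,
)


class _TextExtractor(HTMLParser):
    """Collect text nodes; emit newlines around <br> and block-level tags."""

    _BREAK_TAGS = {"br", "p", "div", "li", "blockquote"}

    def __init__(self) -> None:
        # convert_charrefs=True decodes entities such as &amp; in text.
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self._BREAK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self._BREAK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def clean_post(raw_html: str, usernames: tuple[str, ...] = (),
               placeholder: str = "{{user}}") -> str:
    """Strip HTML and known BBCode tags, then mask the given usernames."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()  # flush any buffered text
    text = _BBCODE_TAG.sub("", "".join(parser.parts))
    for name in usernames:
        # Whole-token match; the lambda keeps the placeholder literal.
        text = re.sub(rf"(?<!\w){re.escape(name)}(?!\w)",
                      lambda _m: placeholder, text)
    # Collapse runs of spaces/tabs per line and limit blank lines to one.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()
```

For example, `clean_post("<div>[b]Hi[/b], Alice!</div>", usernames=("Alice",))` returns `"Hi, {{user}}!"`. The sketch keeps quoted text and does not attempt to detect OOC passages, which would need forum-specific rules.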
Provided by:
lemonilia



