marcuscedricridia/UImerge-ShareGPT-deepclean-sharegpt
Source: Hugging Face. Updated 2025-04-02. Indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/marcuscedricridia/UImerge-ShareGPT-deepclean-sharegpt
Description:
marcuscedricridia/UImerge-ShareGPT is a single-column dataset of conversations in which every turn, whether input or output, is attributed to one of two roles: human or GPT. The data has been through a basic cleaning pipeline: input parsing, deduplication, length filtering, language filtering, alignment filtering, boilerplate removal, and near-duplicate removal. The cleaned dataset contains 4312 records. The cleaning run was configured with parameters such as a minimum number of turns, minimum human and assistant message lengths, and settings for the language and alignment filters.
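To make two of the cleaning steps named above concrete, here is a minimal sketch of length filtering followed by exact deduplication. It is not the author's pipeline. The column name `conversations`, the `from`/`value` keys (the usual ShareGPT layout), the threshold values, and all function names are assumptions chosen for illustration, since the listing does not give the actual parameters.

```python
import hashlib

# Hypothetical thresholds; the listing does not state the real values.
MIN_TURNS = 2
MIN_HUMAN_CHARS = 5
MIN_GPT_CHARS = 10


def passes_length_filter(conversation):
    """Return True if the conversation meets the assumed minimum sizes."""
    if len(conversation) < MIN_TURNS:
        return False
    for turn in conversation:
        text = turn["value"].strip()
        if turn["from"] == "human" and len(text) < MIN_HUMAN_CHARS:
            return False
        if turn["from"] == "gpt" and len(text) < MIN_GPT_CHARS:
            return False
    return True


def fingerprint(conversation):
    """Hash lowercased, whitespace-collapsed role/text pairs."""
    normalized = "\n".join(
        f'{turn["from"]}:{" ".join(turn["value"].lower().split())}'
        for turn in conversation
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def clean(records):
    """Apply the length filter, then drop exact duplicates, keeping order."""
    seen = set()
    kept = []
    for record in records:
        conversation = record["conversations"]  # assumed column name
        if not passes_length_filter(conversation):
            continue
        key = fingerprint(conversation)
        if key in seen:
            continue
        seen.add(key)
        kept.append(record)
    return kept
```

Calling `clean` on a list of records returns the surviving records in their original order. Near-duplicate removal, language filtering, and alignment filtering would need additional logic that this sketch does not attempt.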
Provider:
marcuscedricridia



