five

shisa-v2-sharegpt

收藏
魔搭社区2025-12-05 更新2025-06-07 收录
下载链接:
https://modelscope.cn/datasets/shisa-ai/shisa-v2-sharegpt
下载链接
链接失效反馈
官方服务:
资源简介:
# shisa-v2-sharegpt This is an updated version of the original shisa-v1 dataset [augmxnt/ultra-orca-boros-en-ja-v1](https://huggingface.co/datasets/augmxnt/ultra-orca-boros-en-ja-v1) and retains the same `conversations` field and sharegpt formatting to facilitate its use as drop-in replacement for the original dataset. The shisa-v2 revision filters a few entries, but largely retains the exact composition and prompts of the original. - All responses have been entirely regenerated from open weight models ([Athene V2](https://huggingface.co/Nexusflow/Athene-V2-Chat), [Llama 3.3 70B](meta-llama/Llama-3.3-70B-Instruct), and [Tulu 3 405B](https://huggingface.co/allenai/Llama-3.1-Tulu-3-405B)) - Outputs from the models were rated via [UltraFeedback](https://github.com/OpenBMB/UltraFeedback)-inspired scoring and treated as a best-of-n selection - While non-standard system prompts were retained, default system prompts (which should be usually be masked in training anyway) were removed for improved dataset efficiency and general cleanup

# shisa-v2-sharegpt 本数据集为原始shisa-v1数据集[augmxnt/ultra-orca-boros-en-ja-v1](https://huggingface.co/datasets/augmxnt/ultra-orca-boros-en-ja-v1)的更新版本,保留了与原版一致的`conversations`字段与ShareGPT格式,可作为原版数据集的即插即用替代方案,便于直接复用。 shisa-v2版本对少量条目进行了筛选,但整体保留了原始数据集的核心构成与提示词内容。 - 所有回复均由开源权重模型重新生成,包括[Athene V2](https://huggingface.co/Nexusflow/Athene-V2-Chat)、[Llama 3.3 70B](meta-llama/Llama-3.3-70B-Instruct)以及[Tulu 3 405B](https://huggingface.co/allenai/Llama-3.1-Tulu-3-405B) - 模型输出通过基于UltraFeedback的评分机制进行筛选,并采用最佳n选(best-of-n)的方式选取最终结果 - 尽管保留了非标准化的系统提示词,但为提升数据集训练效率并完成整体清理,移除了通常在训练中需进行掩码的默认系统提示词。
提供机构:
maas
创建时间:
2025-06-05
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作