hzb29/Zhoulifeng-QA-SFT-Dataset
收藏Hugging Face2025-12-15 更新2025-12-20 收录
下载链接:
https://hf-mirror.com/datasets/hzb29/Zhoulifeng-QA-SFT-Dataset
下载链接
链接失效反馈官方服务:
资源简介:
Zhoulifeng-QA-SFT-Dataset是一个经过精心构造的中文对话SFT(监督微调)数据集,内容源自知名户外主播峰哥亡命天涯的直播语音转录文本。数据集利用DeepSeek (V3)对原始语音转录文本进行了深度处理,在清洗噪声的同时,最大程度地保留了峰哥本人的语言风格、幽默感、犀利的社会洞察以及独特的口癖。数据集旨在帮助开发者训练具有鲜明“峰哥”人格、能够进行高质量闲聊、观点输出和幽默互动的AI模型。数据来源为2023-2024年峰哥亡命天涯直播录音,经过Whisper转录和文本切片处理,数据格式为标准Instruction/Output格式,可直接用于SFT训练。
The Zhoulifeng-QA-SFT-Dataset is a meticulously constructed Chinese dialogue SFT (Supervised Fine-Tuning) dataset. The content originates from the transcribed texts of live broadcasts by the renowned outdoor streamer 峰哥亡命天涯 (Fengge Wangming Tianya). The dataset utilizes DeepSeek (V3) to deeply process the original ASR (Automatic Speech Recognition) texts: while cleaning noise, it maximally retains Fengges unique language style, humor, sharp social insights, and distinctive speech habits. This dataset aims to assist developers in training AI models with a distinct Fengge personality, capable of high-quality casual conversations, opinion expression, and humorous interactions. The data source is Fengges live broadcast recordings from 2023-2024, transcribed by Whisper and sliced into text segments. The data format follows the standard Instruction/Output format, ready for direct use in SFT training.
提供机构:
hzb29
搜集汇总
数据集介绍

以上内容由遇见数据集搜集并总结生成



