BintangFortuna/Reddit-Writing-SGPT
收藏Hugging Face2024-12-03 更新2024-12-14 收录
下载链接:
https://hf-mirror.com/datasets/BintangFortuna/Reddit-Writing-SGPT
下载链接
链接失效反馈官方服务:
资源简介:
该数据集包含对话和标签信息,特征包括响应词数、标签和对话列表(包含发言者和内容)。数据集经过Lilac工具过滤和分类,使用了jinaai/jina-embeddings-v2-base-en和BAAI/bge-m3两个嵌入模型。数据集已转换为ShareGPT格式,并移除了非故事内容,但可能仍存在一些遗漏。数据集可能包含Reddit Writing Prompts的混合版本,标签可能不完全准确,模糊案例已单独标记。
The dataset contains conversation and label information, with features including response words, labels, and a list of conversations (containing from and value fields). The dataset was filtered and classified using the Lilac tool, with two embedding models: jinaai/jina-embeddings-v2-base-en and BAAI/bge-m3. The dataset has been converted to ShareGPT format, and non-story content has been removed, though some examples may have been missed. The dataset may be a mixed version of Reddit Writing Prompts, and the labeling may not be 100% accurate, with ambiguous cases labeled separately.
提供机构:
BintangFortuna



