five

BintangFortuna/Reddit-Writing-SGPT

收藏
Hugging Face2024-12-03 更新2024-12-14 收录
下载链接:
https://hf-mirror.com/datasets/BintangFortuna/Reddit-Writing-SGPT
下载链接
链接失效反馈
官方服务:
资源简介:
该数据集包含对话和标签信息,特征包括响应词数、标签和对话列表(包含发言者和内容)。数据集经过Lilac工具过滤和分类,使用了jinaai/jina-embeddings-v2-base-en和BAAI/bge-m3两个嵌入模型。数据集已转换为ShareGPT格式,并移除了非故事内容,但可能仍存在一些遗漏。数据集可能包含Reddit Writing Prompts的混合版本,标签可能不完全准确,模糊案例已单独标记。

The dataset contains conversation and label information, with features including response words, labels, and a list of conversations (containing from and value fields). The dataset was filtered and classified using the Lilac tool, with two embedding models: jinaai/jina-embeddings-v2-base-en and BAAI/bge-m3. The dataset has been converted to ShareGPT format, and non-story content has been removed, though some examples may have been missed. The dataset may be a mixed version of Reddit Writing Prompts, and the labeling may not be 100% accurate, with ambiguous cases labeled separately.
提供机构:
BintangFortuna
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作