sh0416/wildchat-1m-tagged
收藏Hugging Face2024-11-05 更新2024-12-14 收录
下载链接:
https://hf-mirror.com/datasets/sh0416/wildchat-1m-tagged
下载链接
链接失效反馈官方服务:
资源简介:
该数据集是对WildChat-1M数据集的扩展,包含了由Mistral-7B-Instruct-v0.3模型标注的三个额外类别:`category`、`complexity`和`length`。数据集旨在复制自教评估器(self-taught evaluator)论文中引入的类别注释过程。由于计算限制,数据集在模型选择和注释过程中存在一些差异。例如,使用了Mistral-7B-Instruct-v0.3模型进行类别标注,而不是Mixtral 22Bx8 Instruct模型。此外,数据集的注释过程还包括从会话中提取最后一条用户消息作为指令,并使用贪婪解码和VLLM进行高吞吐量生成。数据集的统计信息显示了类别、复杂度和长度特征的分布情况。
The WildChat 1M dataset is a dataset for text generation, question answering, and text-to-text generation tasks. The dataset includes features such as conversation details, moderation scores, and annotations for categories, complexity, and length. The dataset is annotated using the Mistral-7B-Instruct-v0.3 model, following the methodology described in a related paper. The README also includes statistics about the distribution of categories, complexity, and length features in the dataset.
提供机构:
sh0416



