five

reddgr/talking-to-chatbots-chats

收藏
Hugging Face2024-12-19 更新2024-12-21 收录
下载链接:
https://hf-mirror.com/datasets/reddgr/talking-to-chatbots-chats
下载链接
链接失效反馈
官方服务:
资源简介:
该数据集是一个正在进行中的项目,包含了与各种LLM工具的对话,数据来源于网站[Talking to Chatbots](https://talkingtochatbots.com)的作者。数据集的格式类似于[lmsys/lmsys-chat-1m](https://huggingface.co/datasets/lmsys/lmsys-chat-1m)。每个对话通过UUID (v4)标识,并以JSON格式封装,其中每条消息包含在content键中。role键标识消息是用户提示(user)还是LLM的响应(assistant)。每个字典中的turn提供了对话中提示-响应对的编号序列,tag可能包含有关消息的简短注释。数据集还包括source(使用的LLM工具或服务)、model_family(如OpenAI-GPT、Google-Gemini、Anthropic-Claude等)、date(如果可用)、turns(对话中的总轮次)和conversation_tag(适用于整个对话的任何注释信息)。此外,还提到了一个解包版本的数据集[reddgr/talking-to-chatbots-unwrapped-chats](https://huggingface.co/datasets/reddgr/talking-to-chatbots-unwrapped-chats),其中每个对话轮次(提示-响应对)作为单独记录呈现,并使用reddgr系列模型计算了额外的指标和分类标签。

This work-in-progress dataset contains conversations with various LLM tools, sourced by the author of the website [Talking to Chatbots](https://talkingtochatbots.com). The format chosen for structuring this dataset is similar to that of [lmsys/lmsys-chat-1m](https://huggingface.co/datasets/lmsys/lmsys-chat-1m). Conversations are identified by a UUID (v4) and wrapped in a JSON format where each message is contained in the content key. The role key identifies whether the message is a prompt (user) or a response by the LLM (assistant). For each dictionary, turn provides a numbered sequence of prompt-response pairs in the conversation, and tag may include a brief annotation about the message. Additionally, the dataset includes the fields source (LLM tool or service used), model_family (i.e. OpenAI-GPT, Google-Gemini, Anthropic-Claude, etc.), date (when available), turns (total number of turns in the conversation), and conversation_tag (for any annotated info applying to the full conversation). The dataset [reddgr/talking-to-chatbots-unwrapped-chats](https://huggingface.co/datasets/reddgr/talking-to-chatbots-unwrapped-chats) is an unwrapped version of this dataset, where each turn (prompt-response pair) in the conversation is presented as an individual record, with additional metrics and classification labels calculated with the reddgr family of models.
提供机构:
reddgr
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作