five

RuizheChen/Penguin

收藏
Hugging Face2025-02-20 更新2025-04-12 收录
下载链接:
https://hf-mirror.com/datasets/RuizheChen/Penguin
下载链接
链接失效反馈
官方服务:
资源简介:
LMSYS-Chat-1M是一个包含一百万个真实世界对话的大型数据集,这些对话涉及25种最先进的语言模型。该数据集从2023年4月至8月在[Vicuna demo and Chatbot Arena网站](https://chat.lmsys.org/)上的21万个独立IP地址中收集。每个样本包括一个对话ID、模型名称、以OpenAI API JSON格式存储的对话文本、检测到的语言标签和OpenAI内容审查API标签。数据集已经尽可能地移除了包含个人身份信息的对话,并包含了每条消息的内容审查API输出,但保留了不安全的对话,以便研究者可以研究LLM在实际场景中的安全问题以及OpenAI的内容审查过程。该数据集未经过脱敏处理,可能包含来自流行基准测试的测试问题。

LMSYS-Chat-1M is a large-scale dataset containing one million real-world conversations involving 25 state-of-the-art language models. The dataset was collected from 210K unique IP addresses on the [Vicuna demo and Chatbot Arena website](https://chat.lmsys.org/) from April to August 2023. Each sample includes a conversation ID, model name, conversation text in OpenAI API JSON format, detected language tag, and OpenAI moderation API tag. The dataset has been made as free from personally identifiable information (PII) as possible, and includes the moderation API output for each message, while keeping unsafe conversations for researchers to study safety issues associated with LLM usage in real-world scenarios and the OpenAI moderation process. The dataset has not undergone decontamination and may contain test questions from popular benchmarks.
提供机构:
RuizheChen
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作