robinsmits/ChatAlpaca-20K
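Dataset Card: ChatAlpaca 20K
Dataset Description
ChatAlpaca is a chat dataset intended to help researchers develop models for instruction following in multi-turn conversations. It extends the Stanford Alpaca data with multi-turn instructions and their corresponding responses.
ChatAlpaca was developed by the Chinese Information Processing Laboratory at the Institute of Software, Chinese Academy of Sciences.
To build the dataset, ChatGPT (GPT-3.5-turbo) was used to generate follow-up utterances, and the conversation was then continued with ChatGPT. The result is a set of multi-turn conversations in which a simulated user gives instructions and ChatGPT responds to them.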
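Dataset Structure
The dataset consists of one JSON file, chatalpaca-20k.json, which holds the multi-turn instruction-following conversations. Each entry represents a distinct conversation between the simulated user and ChatGPT. The dataset currently contains 20,000 conversations in total.
The dataset is divided into three splits:
- original: all of the data from the original chatalpaca-20k.json
- train: 19,600 rows for training
- test: 400 rows for testing or validation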
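The card does not say how the train and test rows were selected. As a rough illustration only, the sketch below shows one way a 19,600 / 400 partition of 20,000 conversations could be produced with a seeded shuffle. The function name `split_train_test`, the seed, and the placeholder conversations are all assumptions made for this example, not the procedure used to build the published splits.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(
    items: Sequence[T], test_size: int = 400, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Partition items into (train, test) using a seeded random selection.

    Hypothetical helper: the seed and selection method are assumptions.
    """
    if not 0 <= test_size <= len(items):
        raise ValueError("test_size must be between 0 and len(items)")
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    test_indices = set(indices[:test_size])
    # Preserve the original ordering inside each split.
    train = [item for i, item in enumerate(items) if i not in test_indices]
    test = [item for i, item in enumerate(items) if i in test_indices]
    return train, test


# Placeholder conversations standing in for the 20,000 real entries.
conversations = [{"id": i, "messages": []} for i in range(20_000)]
train, test = split_train_test(conversations)
print(len(train), len(test))  # 19600 400
```

With these sizes the test split is 2% of the data (400 of 20,000).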
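Each conversation begins with an instruction followed by a response from ChatGPT, both taken from the Stanford Alpaca data. The conversation then continues with the user giving further instructions and ChatGPT responding to each one.
Each entry is a list of dictionaries, and each dictionary contains the following fields:
- id: integer, the identifier
- messages: list, containing multiple dictionaries that represent the conversation
Each dictionary in the messages list contains the following fields:
- role: string, either "user" or "assistant"
- content: string, the text of the message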
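The field descriptions above can be turned into a small structural check. This is a minimal sketch that assumes a single conversation is available as a Python dictionary with `id` and `messages` keys; `validate_entry` and `VALID_ROLES` are names invented for this example and are not part of the dataset or of any library.

```python
from typing import Any, List

VALID_ROLES = {"user", "assistant"}


def validate_entry(entry: Any) -> List[str]:
    """Return a list of problems found in one entry (empty if none)."""
    if not isinstance(entry, dict):
        return ["entry is not a dictionary"]
    problems = []
    entry_id = entry.get("id")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if not isinstance(entry_id, int) or isinstance(entry_id, bool):
        problems.append("id is missing or not an integer")
    messages = entry.get("messages")
    if not isinstance(messages, list):
        problems.append("messages is missing or not a list")
        return problems
    for i, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"messages[{i}] is not a dictionary")
            continue
        role = message.get("role")
        if not isinstance(role, str) or role not in VALID_ROLES:
            problems.append(f"messages[{i}].role is not 'user' or 'assistant'")
        if not isinstance(message.get("content"), str):
            problems.append(f"messages[{i}].content is not a string")
    return problems


good = {"id": 1, "messages": [{"role": "user", "content": "Hello"}]}
bad = {"id": "1", "messages": [{"role": "system", "content": 42}]}
print(validate_entry(good))       # []
print(len(validate_entry(bad)))   # 3
```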
Example Conversation
[
  {
    "role": "user",
    "content": "Identify the odd one out: Twitter, Instagram, Telegram"
  },
  {
    "role": "assistant",
    "content": "Telegram"
  },
  {
    "role": "user",
    "content": "What makes Telegram different from Twitter and Instagram?"
  },
  {
    "role": "assistant",
    "content": "Telegram is a cloud-based instant messaging app that focuses on privacy and security. Unlike Twitter and Instagram which are mainly used for following news, celebrities, and sharing images, Telegram was created as a secure messaging app for private and group communication. Telegram also offers more advanced features than Twitter and Instagram, such as the ability to schedule messages, create bots, and send encrypted messages."
  },
  {
    "role": "user",
    "content": "Can you give me an example of how the scheduling messages feature can be useful on Telegram?"
  },
  {
    "role": "assistant",
    "content": "Sure! The scheduling messages feature on Telegram can be useful in many different situations. Here's an example:\n\nLet's say you have a workgroup on Telegram, and you want to send a message to everyone in the group to remind them of a meeting. However, you know that most of the people in the group are busy at work during the time you want to send the message.\n\nThat's where the scheduling messages feature comes in. You can schedule the message to be sent at a later time when people are more likely to be free, such as during their lunch break or after work. This way, everyone in the group will receive the message at the same time, and you won't have to worry about anyone missing the important reminder.\n\nOverall, the scheduling messages feature can be really helpful for coordinating communication with groups of people on Telegram, whether it's for work, social events, or anything else."
  },
  {
    "role": "user",
    "content": "Goodbye."
  }
]
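The example ends with a user message ("Goodbye.") that has no assistant reply, which matters when turning conversations into (instruction, response) training pairs. The sketch below shows one possible way to pair the messages while keeping such an unanswered final turn visible. `to_turn_pairs` is a hypothetical helper written for this illustration, and the sample messages are shortened versions of the example above.

```python
from typing import Dict, List, Optional, Tuple


def to_turn_pairs(
    messages: List[Dict[str, str]]
) -> List[Tuple[str, Optional[str]]]:
    """Pair each user message with the assistant reply directly after it.

    A user message with no following assistant reply is paired with None.
    Assistant messages that do not follow a user message are skipped.
    """
    pairs: List[Tuple[str, Optional[str]]] = []
    i = 0
    while i < len(messages):
        message = messages[i]
        if message["role"] == "user":
            has_reply = (
                i + 1 < len(messages)
                and messages[i + 1]["role"] == "assistant"
            )
            if has_reply:
                pairs.append((message["content"], messages[i + 1]["content"]))
                i += 2
                continue
            pairs.append((message["content"], None))
        i += 1
    return pairs


sample = [
    {"role": "user", "content": "Identify the odd one out: Twitter, Instagram, Telegram"},
    {"role": "assistant", "content": "Telegram"},
    {"role": "user", "content": "Goodbye."},
]
pairs = to_turn_pairs(sample)
print(len(pairs))   # 2
print(pairs[-1])    # ('Goodbye.', None)
```

Whether to drop pairs whose reply is `None` or keep them depends on the training setup; the card does not prescribe either choice.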
Citation
Please cite the original repository if you use the dataset.
@misc{ChatAlpaca,
  author = {Ning Bian and Hongyu Lin and Yaojie Lu and Xianpei Han and Le Sun and Ben He},
  title = {ChatAlpaca: A Multi-Turn Dialogue Corpus based on Alpaca Instructions},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/cascip/ChatAlpaca}},
}



