pythainlp/han-instruct-dataset-v3.0
收藏Hugging Face2024-07-05 更新2024-07-06 收录
下载链接:
https://hf-mirror.com/datasets/pythainlp/han-instruct-dataset-v3.0
下载链接
链接失效反馈官方服务:
资源简介:
Han Instruct Dataset v3.0是一个泰语指令数据集,由PyThaiNLP创建,包含由人类和旧模型生成的泰语指令数据。该数据集可用于训练指令跟随模型,如ChatGPT。数据来源包括泰国维基百科的参考台、justicechannel.org的法律信息、pythainlp/final_training_set_v1_enth、WangChanGLM的自我指令、Wannaphong.com、Blognone以及人类注释者。数据集支持的任务包括聊天机器人和指令跟随。数据集结构包含messages字段,采用ChatML格式。使用数据时需注意可能存在的人类注释者偏见。数据集采用CC-BY-SA 4.0许可证。
The Han (ห่าน or goose) Instruct Dataset is a Thai instruction dataset collected by PyThaiNLP. This dataset aggregates all Thai instruct datasets made by humans and our old models, which can be used to train Instruction Following models like ChatGPT. Many questions are collected from the Reference desk at Thai Wikipedia. Data sources include the Reference desk at Thai Wikipedia, legal websites, human-checked and edited datasets, self-instruct models, personal websites, and blogs. The dataset supports tasks such as ChatBot and Instruction Following. The language of the dataset is Thai, and caution should be exercised regarding potential human biases. The licensing information is CC-BY-SA 4.0.
提供机构:
pythainlp



