Han Instruct Dataset
收藏NIAID Data Ecosystem2026-05-02 收录
下载链接:
https://zenodo.org/record/10935821
下载链接
链接失效反馈官方服务:
资源简介:
Dataset Card for Han Instruct Dataset v4.0 🪿🪿🪿🪿
🪿 Han (ห่าน or goose) Instruct Dataset is a Thai instruction dataset by PyThaiNLP. This dataset collects all Thai instruct datasets that were made by humans and our old model. The dataset can be used to train Instruction Following models like ChatGPT or others.
Data sources:
Reference desk at Thai wikipedia.
Law from justicechannel.org
pythainlp/final_training_set_v1_enth: Human checked and edited.
Self-instruct from WangChanGLM
Wannaphong.com
Blognone
Synthetic dataset from LLM
Human annotators
Supported Tasks and Leaderboards
ChatBot
Instruction Following
Languages
Thai
Dataset Structure
Data Fields
messages: ChatML
Considerations for Using the Data
The dataset can be biased by human annotators and LLM annotators. We recommend you check the dataset to select or remove an instruction before training the model or using it to at your risk.
Licensing Information
CC-BY-SA 4.0
创建时间:
2024-07-31



