five

Han Instruct Dataset

收藏
NIAID Data Ecosystem2026-05-02 收录
下载链接:
https://zenodo.org/record/10935821
下载链接
链接失效反馈
官方服务:
资源简介:
Dataset Card for Han Instruct Dataset v4.0 🪿🪿🪿🪿 🪿 Han (ห่าน or goose) Instruct Dataset is a Thai instruction dataset by PyThaiNLP. This dataset collects all Thai instruct datasets that were made by humans and our old model. The dataset can be used to train Instruction Following models like ChatGPT or others. Data sources: Reference desk at Thai wikipedia. Law from justicechannel.org pythainlp/final_training_set_v1_enth: Human checked and edited. Self-instruct from WangChanGLM Wannaphong.com Blognone Synthetic dataset from LLM Human annotators Supported Tasks and Leaderboards ChatBot Instruction Following Languages Thai Dataset Structure Data Fields messages: ChatML Considerations for Using the Data The dataset can be biased by human annotators and LLM annotators. We recommend you check the dataset to select or remove an instruction before training the model or using it to at your risk. Licensing Information CC-BY-SA 4.0
创建时间:
2024-07-31
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作