
Highgroundbkk/Pangpuriye-generated_by_typhoon

Source: Hugging Face. Updated 2026-03-22. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/Highgroundbkk/Pangpuriye-generated_by_typhoon
Resource description:
---
license: cc-by-nc-2.0
task_categories:
- table-question-answering
language:
- th
- en
tags:
- code
pretty_name: Thai-SQL_Question_generated_by_Typhoon
size_categories:
- 1K<n<10K
configs:
- config_name: default
  data_files:
  - split: train
    path: "data.json"
---

# 🤖 [Super AI Engineer Development Program Season 4](https://superai.aiat.or.th/) - Pangpuriye House - Generated by Typhoon API

![logo](https://huggingface.co/datasets/AIAT/Pangpuriye-generated_by_typhoon/resolve/main/logo/logo.png)

**Pangpuriye's House Dataset - Generated Dataset from [Typhoon API](https://opentyphoon.ai/)**

This dataset contains output generated by the Typhoon API, structured as SQL instructions for fine-tuning [Pangpuriye's LLM](https://huggingface.co/AIAT/Pangpuriye-openthaigpt-1.0.0-7b-chat). The dataset is released under the cc-by-nc-2.0 license.

## Content

The dataset consists of 16,125 rows of `input`, `instruction`, and `output` packed into a train set.

- Each schema has its own CSV file as an `input`.
- The `instruction` is the command that the Typhoon API receives as input.
- The `output` is SQL code.

## Uses

The dataset is intended to be used as instruction data for fine-tuning a table-based QA LLM. The `instruction` field requires some processing before it can be used in that process. The following code is an example of prompting the Typhoon API to generate a schema.

```python
from openai import OpenAI

# Assumption: Typhoon is reached through an OpenAI-compatible client.
# The API key is a placeholder; verify the base URL against the Typhoon docs.
client = OpenAI(
    api_key="YOUR_TYPHOON_API_KEY",
    base_url="https://api.opentyphoon.ai/v1",
)

stream = client.chat.completions.create(
    model="typhoon-instruct",
    messages=[
        {
            # The original card used "instruction" as the role here;
            # OpenAI-compatible chat APIs generally expect "system".
            "role": "system",
            "content": """
            Your task is to generate SQL plain-text schema
            Format:
            You won't explain or clarify your response.
            """,
        },
        {"role": "user", "content": """Generate 1 random schema"""},
    ],
    max_tokens=120,
    temperature=0.6,
    top_p=1,
    stream=False,  # return the full completion in one response
)
```

## Call our dataset by `datasets` library

The following code is an example of loading our dataset via the `datasets` library.

```python
from datasets import load_dataset

dataset = load_dataset("AIAT/Pangpuriye-generated_by_typhoon")
```

## Acknowledgements

The dataset was collected by the members of Pangpuriye's House during the LLMs hackathon in Super AI Engineer Development Program Season 4.

We thank the organizers of this hackathon, [OpenThaiGPT](https://openthaigpt.aieat.or.th/), [AIAT](https://aiat.or.th/), [NECTEC](https://www.nectec.or.th/en/) and [ThaiSC](https://thaisc.io/), for this challenging task and the opportunity to take part in developing Thai large language models.

## Citation Information

```
@misc{pipatanakul2023typhoon,
    title={Typhoon: Thai Large Language Models},
    author={Kunat Pipatanakul and Phatrasek Jirabovonvisut and Potsawee Manakul and Sittipong Sripaisarnmongkol and Ruangsak Patomwong and Pathomporn Chokchainant and Kasima Tharnpipitchai},
    year={2023},
    eprint={2312.13951},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
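## Example: building a fine-tuning prompt from one row

The card notes that rows need some processing before fine-tuning but does not say what that processing is. The sketch below shows one possible way to combine the `input`, `instruction`, and `output` fields into a prompt/completion pair. The template text, the `build_example` helper, and the sample row are all assumptions made for illustration; none of them come from the dataset or its authors.

```python
# Hypothetical prompt template; the dataset card does not prescribe one.
PROMPT_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)


def build_example(record: dict) -> dict:
    """Turn one row (input, instruction, output) into a prompt/completion pair."""
    prompt = PROMPT_TEMPLATE.format(
        instruction=record["instruction"].strip(),
        input=record["input"].strip(),
    )
    return {"prompt": prompt, "completion": record["output"].strip()}


# Made-up row for illustration only; it is not taken from the dataset.
sample = {
    "input": "CREATE TABLE orders (id INT, total REAL);",
    "instruction": "How many orders are there?",
    "output": "SELECT COUNT(*) FROM orders;",
}

example = build_example(sample)
print(example["prompt"] + example["completion"])
```

Once the dataset is loaded with `load_dataset`, a helper like this could be applied to every row with `dataset["train"].map(build_example)`, assuming the rows expose the three string fields described under Content.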
Provided by:
Highgroundbkk