openthaigpt/lexitron2_prompt_finetune

Name: openthaigpt/lexitron2_prompt_finetune
Creator: openthaigpt
Published: 2024-11-16 14:22:01
License: apache-2.0 (as declared in the dataset card; the listing page gives no license description)

Source: Hugging Face. Updated 2024-11-16; indexed 2025-09-13.

Download link:

https://hf-mirror.com/datasets/openthaigpt/lexitron2_prompt_finetune

Description:

---
license: apache-2.0
task_categories:
- question-answering
language:
- th
- en
tags:
- art
pretty_name: Lexitron 2.0 Prompt Finetuning Dataset
size_categories:
- 100K<n<1M
maintainer: Kobkrit Viriyayudhakorn (kobkrit@iapp.co.th)
---

# Lexitron 2.0 Prompt Finetuning Dataset

This dataset is derived from Lexitron 2.0, a Thai-English dictionary developed by NECTEC. It has been processed and formatted for prompt finetuning tasks.

The original dataset is from: https://opend-portal.nectec.or.th/dataset/lexitron-2-0

## Maintainer

Kobkrit Viriyayudhakorn (kobkrit@iapp.co.th)

## Dataset Description

The dataset consists of two main files:

1. `lexitron2_telex_finetune.qwen2.txt` - Thai to English lexicon entries in Qwen2 conversation format
2. `lexitron2_telex_finetune.jsonl` - Thai to English lexicon entries in JSONL format

Each file contains dictionary entries formatted in their respective formats.

## Syntax and Example Dataset

### Syntax

The dataset files follow specific formats for ease of use in prompt finetuning tasks.

#### `lexitron2_telex_finetune.jsonl`

Each entry in this file follows the generic prompt format:

```json
{
  "instruction": "<instruction in English or Thai>",
  "output": "<translation or definition in Thai>"
}
```

- **instruction**: A string that provides the translation task or query. It can be in English or Thai.
- **output**: A string that provides the translation or definition in Thai.
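As a rough illustration of the JSONL layout described above, the following sketch reads records line by line and checks that both documented keys are present. The helper name `parse_jsonl` and the in-memory `SAMPLE_LINES` list are hypothetical and exist only for this example. The sample record is the one shown in this card.

```python
import json

# The two keys documented for each JSONL entry.
REQUIRED_KEYS = ("instruction", "output")


def parse_jsonl(lines):
    """Yield (instruction, output) pairs from an iterable of JSONL lines."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            raise ValueError(f"line {line_no}: missing keys {missing}")
        yield record["instruction"], record["output"]


# One record, serialized the way a JSONL line would be (hypothetical in-memory sample).
SAMPLE_LINES = [
    json.dumps(
        {
            "instruction": "Please translate 'a' into Thai",
            "output": "คำว่า 'a' เป็นคำประเภท Determiner แปลเป็นภาษาไทยว่า "
                      "'หนึ่ง (คำนำหน้าคำนามเพื่อแสดงว่าคำนามนั้นๆ ไม่ชี้เฉพาะ)'",
        },
        ensure_ascii=False,
    )
]

pairs = list(parse_jsonl(SAMPLE_LINES))
for instruction, output in pairs:
    print(instruction)
    print(output)
```

To read the real file, the same function could be given an open file handle instead of the list, for example `open("lexitron2_telex_finetune.jsonl", encoding="utf-8")`, assuming the file has been downloaded to the working directory.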
#### `lexitron2_telex_finetune.qwen2.txt`

Each entry in this file follows the Qwen2 conversation format:

```text
<|im_start|>system\n
คุณคือผู้ช่วยตอบคำถามที่ฉลาดและซื่อสัตย์<|im_end|>\n
<|im_start|>user\n
คำว่า 'example_word' แปลว่าอะไร<|im_end|>\n
<|im_start|>assistant\n
คำว่า 'example_word' เป็นคำประเภท Noun (คำนาม) แปลเป็นภาษาอังกฤษว่า example\n
ความหมาย: ตัวอย่างความหมาย\n
ตัวอย่างประโยค: ตัวอย่างการใช้ในประโยค<|im_end|>
```

The Qwen2 format consists of three main components:

- **system**: Defines the AI assistant's role and characteristics
- **user**: Contains the translation query or question (in Thai or English)
- **assistant**: Provides the structured response including:
  - Word type/category
  - English translation
  - Definition
  - Example sentences (when available)
  - Synonyms (when available)
  - Antonyms (when available)
  - Related words (when available)
  - Classifiers (when available)

## Example Dataset

Here is an example of how the dataset is structured:

```json
{"instruction": "Please translate 'a' into Thai", "output": "คำว่า 'a' เป็นคำประเภท Determiner แปลเป็นภาษาไทยว่า 'หนึ่ง (คำนำหน้าคำนามเพื่อแสดงว่าคำนามนั้นๆ ไม่ชี้เฉพาะ)'"}
```

```text
<|im_start|>system\nคุณคือผู้ช่วยตอบคำถามที่ฉลาดและซื่อสัตย์<|im_end|>\n<|im_start|>user\nคำว่า 'ดังกล่าวข้างต้น' สามารถใช้ในประโยคอย่างไรได้บ้าง<|im_end|>\n<|im_start|>assistant\nคำว่า 'ดังกล่าวข้างต้น' เป็นคำประเภท Pronoun (คำสรรพนาม) แปลเป็นภาษาอังกฤษว่า abovementioned\nคำที่มีความหมายเหมือนกัน: ดังกล่าว\nตัวอย่างประโยค: หน่วยงานของเราสามารถรับบทบาทได้เป็นอย่างดี ตามสภาพความพร้อมด้านต่างๆ ดังกล่าวข้างต้น<|im_end|>
```
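The Qwen2 layout above can be sketched as a pair of helpers, one that renders a record and one that splits it back into turns. This is a minimal sketch and not the maintainers' own tooling: `render_qwen2` and `parse_qwen2` are hypothetical names. It also assumes that line breaks are stored as the two-character sequence `\n`, as they appear in the sample record, so the parser accepts both that form and real newlines.

```python
import re

# System prompt copied from the examples in this card.
SYSTEM_PROMPT = "คุณคือผู้ช่วยตอบคำถามที่ฉลาดและซื่อสัตย์"

# One turn: <|im_start|>role, a newline, the content, then <|im_end|>.
TURN_RE = re.compile(r"<\|im_start\|>(\w+)\n(.*?)<\|im_end\|>", re.DOTALL)


def render_qwen2(user, assistant, system=SYSTEM_PROMPT):
    """Render one system/user/assistant record on a single line.

    Assumption: separators are written as a literal backslash followed by 'n',
    matching how the sample record is displayed.
    """
    turns = [("system", system), ("user", user), ("assistant", assistant)]
    return "\\n".join(
        f"<|im_start|>{role}\\n{text}<|im_end|>" for role, text in turns
    )


def parse_qwen2(record):
    """Split a record into (role, content) pairs."""
    # Normalize literal "\n" sequences to real newlines before matching.
    text = record.replace("\\n", "\n")
    return [(role, content.strip()) for role, content in TURN_RE.findall(text)]


record = render_qwen2(
    user="คำว่า 'example_word' แปลว่าอะไร",
    assistant="คำว่า 'example_word' เป็นคำประเภท Noun (คำนาม) "
              "แปลเป็นภาษาอังกฤษว่า example\\nความหมาย: ตัวอย่างความหมาย",
)
turns = parse_qwen2(record)
for role, content in turns:
    print(role, "->", content)
```

Keeping the parser tolerant of both newline styles avoids committing to one reading of the file before it has been inspected.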

Provider: openthaigpt
