afrizalha/Tumpeng-1-Indonesian

Name: afrizalha/Tumpeng-1-Indonesian
Creator: afrizalha
Published: 2024-06-08 06:17:29
License: CC-BY-NC-4.0

Source: Hugging Face. Updated 2024-06-08; indexed 2024-06-12.

Download link:

https://hf-mirror.com/datasets/afrizalha/Tumpeng-1-Indonesian

Resource description:

---
license: cc-by-nc-4.0
task_categories:
- text-generation
language:
- id
size_categories:
- 10K<n<100K
---

# Synthetic Indonesian dataset with Llama 3 70B

Tumpeng contains 48.6K Indonesian question-answering input-output pairs totaling 14.8M words. It is intended for fine-tuning Llama 3 8B, which has limited Indonesian language capabilities, so that it responds properly in Indonesian. It is a research preview dataset and is not curated for factual accuracy or safety. Use this dataset at your discretion.

# Out of scope use

- Commercial use
- Fine-tuning non-Llama 3 models

# Supported tasks

- General QA across a variety of domains
- Contextual QA (copy-paste a text and ask a question about its contents)
- Multi-turn conversation (alpha; optimized for personal advice)
- Writing (outlines and full articles)

# Conversational format

This dataset contains experimental multi-turn conversations, with <|user|> and <|assistant|> as the user and LLM headers respectively. For proper formatting, please use the following template:

```
<|user|> {prompt}
<|assistant|> {response}
```

Alternatively, you can modify the strings in the dataset according to your intended format.
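
Adapting the strings to another format, as the card suggests, might look like the minimal sketch below. It assumes each example is a single string in which the `<|user|>` and `<|assistant|>` headers alternate. The sample text, the function names `split_turns` and `retag`, and the replacement tags are hypothetical and are not taken from the dataset.

```python
import re

# The two headers named in the dataset card.
HEADER_RE = re.compile(r"<\|(user|assistant)\|>")


def split_turns(text: str) -> list[dict[str, str]]:
    """Split a header-delimited string into a list of role/content turns."""
    # re.split with one capture group yields:
    # [text_before_first_header, role, content, role, content, ...]
    parts = HEADER_RE.split(text)
    return [
        {"role": role, "content": content.strip()}
        for role, content in zip(parts[1::2], parts[2::2])
    ]


def retag(turns: list[dict[str, str]], user_tag: str, assistant_tag: str) -> str:
    """Re-render turns with different (hypothetical) speaker tags."""
    tags = {"user": user_tag, "assistant": assistant_tag}
    return "\n".join(f"{tags[t['role']]} {t['content']}" for t in turns)


# Hypothetical sample string, written for illustration only.
sample = (
    "<|user|> Apa ibu kota Indonesia? "
    "<|assistant|> Ibu kota Indonesia adalah Jakarta."
)
turns = split_turns(sample)
converted = retag(turns, "User:", "Assistant:")
```

Splitting into role/content turns first keeps the conversion independent of the target format: the same `turns` list can be passed to whatever chat template the fine-tuning framework expects.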

Provider:

afrizalha

Summary of original information

Dataset overview

Basic information

Name: Synthetic Indonesian dataset with Llama 3 70B
Language: Indonesian (id)
License: CC-BY-NC-4.0
Size: 10K<n<100K

Data content

Contains: 14.8M words across 48.6K input-output pairs
Purpose: fine-tuning Llama 3 8B so that it responds properly in Indonesian
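
As a quick sanity check on these figures, the sketch below derives the average length of a pair and confirms that the pair count falls in the declared size category. Both inputs are the rounded values quoted above, so the results are approximate; the variable names are illustrative.

```python
# Rounded figures as stated in the dataset description.
total_words = 14_800_000
num_pairs = 48_600

# Average words per input-output pair (prompt and response combined).
avg_words_per_pair = total_words / num_pairs

# The declared size category is 10K<n<100K.
in_size_category = 10_000 < num_pairs < 100_000

print(f"{avg_words_per_pair:.1f} words per pair; in 10K<n<100K: {in_size_category}")
```

This works out to roughly 300 words per pair, which fits a dataset that includes contextual QA and full-article writing alongside short questions.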

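Data characteristics

Type: research preview dataset, not curated for factual accuracy or safety
Usage restrictions: non-commercial use only; fine-tuning is limited to Llama 3 models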

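Supported tasks

General QA across domains
Contextual QA
Multi-turn conversation (optimized for personal advice)
Writing (outlines and full articles)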

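Format

Conversational format: contains experimental multi-turn conversations, with <|user|> and <|assistant|> as the user and language-model headers respectively
Template:

<|user|> {prompt}

<|assistant|> {response}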
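
Going the other way, new text can be rendered into this template with plain string formatting. The sketch below is one possible way to do it: the placeholder is spelled `prompt`, the newline between the two headers and between consecutive turns is an assumption (the listing does not specify the separator), and the function names and sample pairs are hypothetical.

```python
# Template from the dataset card; the line break between the two
# headers is an assumption, not something the card specifies.
TEMPLATE = "<|user|> {prompt}\n<|assistant|> {response}"


def render_pair(prompt: str, response: str) -> str:
    """Render a single prompt/response pair using the card's headers."""
    return TEMPLATE.format(prompt=prompt.strip(), response=response.strip())


def render_conversation(pairs: list[tuple[str, str]]) -> str:
    """Render a multi-turn conversation as consecutive rendered pairs."""
    return "\n".join(render_pair(p, r) for p, r in pairs)


# Hypothetical two-turn conversation, written for illustration only.
conversation = render_conversation(
    [
        ("Halo, apa kabar?", "Baik, terima kasih. Ada yang bisa saya bantu?"),
        ("Tolong beri saran belajar.", "Buat jadwal harian yang realistis."),
    ]
)
print(conversation)
```

Because only `TEMPLATE` is parsed by `str.format`, braces inside a prompt or response are passed through unchanged.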
