NCIFD Ethnic Culture Fine-Tuning Dataset

HyperAI (超神经) · Updated 2025-01-17 · Indexed 2025-01-18

Download link:

https://hyper.ai/cn/datasets/37206

Resource overview:

NCIFD (National Culture Instruction-Following Dataset) is an ethnic culture fine-tuning dataset for large language models (LLMs), constructed by the National Language Resources Monitoring and Research Center for Minority Languages at Minzu University of China. It contains 151,159 entries in total, of which 10,000 are publicly available. The dataset covers seven domains: architecture, clothing, crafts, food, etiquette, language, and customs.

Created:

2025-01-14

Collected summary

Dataset introduction

Background and challenges

Background overview

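The NCIFD ethnic culture fine-tuning dataset was constructed for large language models by the National Language Resources Monitoring and Research Center for Minority Languages at Minzu University of China. It contains 151,159 entries, of which 10,000 are public, spanning seven domains: architecture, clothing, crafts, food, etiquette, language, and customs. The dataset has two parts: NCSI, whose data is generated with the Self-Instruct framework, and NCQA, whose question-answer pairs are generated with the Self-QA framework. Both parts were screened for clarity, completeness, and accuracy, and the dataset is suited to natural language processing and model training tasks.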

The summary above was collected and generated by 遇见数据集.
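
The description says the NCSI and NCQA parts were screened for clarity, completeness, and accuracy, but it does not say how. The following is a minimal Python sketch of what a completeness and duplicate check over instruction-style records could look like. The field names (`source`, `domain`, `instruction`, `output`), the English domain labels, and the placeholder records are all assumptions made for illustration; they are not the published schema or the authors' screening procedure.

```python
from collections import Counter

# The seven domains named in the description (English labels are an assumption).
DOMAINS = {
    "architecture", "clothing", "crafts", "food",
    "etiquette", "language", "customs",
}

# Hypothetical placeholder records, not real NCIFD data.
SAMPLE_RECORDS = [
    {"source": "NCSI", "domain": "clothing",
     "instruction": "Describe a traditional festival garment.",
     "output": "Placeholder description."},
    {"source": "NCQA", "domain": "food",
     "instruction": "What is served at a harvest celebration?",
     "output": "Placeholder answer."},
    # Exact duplicate of the previous record.
    {"source": "NCQA", "domain": "food",
     "instruction": "What is served at a harvest celebration?",
     "output": "Placeholder answer."},
    # Incomplete: empty instruction.
    {"source": "NCSI", "domain": "architecture",
     "instruction": "", "output": "Placeholder text."},
    # Outside the seven listed domains.
    {"source": "NCQA", "domain": "music",
     "instruction": "Name an instrument.", "output": "Placeholder answer."},
]


def is_complete(record):
    """Return True if the record has non-empty text and a known domain."""
    return (
        bool(record.get("instruction", "").strip())
        and bool(record.get("output", "").strip())
        and record.get("domain") in DOMAINS
    )


def screen(records):
    """Keep complete records and drop exact instruction/output duplicates."""
    seen = set()
    kept = []
    for record in records:
        if not is_complete(record):
            continue
        key = (record["instruction"].strip(), record["output"].strip())
        if key in seen:
            continue
        seen.add(key)
        kept.append(record)
    return kept


def public_share(total=151_159, public=10_000):
    """Fraction of the full dataset that is publicly released."""
    return public / total


if __name__ == "__main__":
    kept = screen(SAMPLE_RECORDS)
    print(Counter(record["domain"] for record in kept))
    print(f"Public share: {public_share():.1%}")
```

A check like this only covers completeness and duplication. Judging clarity and factual accuracy, which the description also mentions, would need human review or a separate model-based step that this sketch does not attempt.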