2imi9/llama2_7B_data_10G

Name: 2imi9/llama2_7B_data_10G
Creator: 2imi9
Published: 2024-09-28 20:46:55
License: 暂无描述

Hugging Face2024-09-28 更新2024-12-14 收录

下载链接：

https://hf-mirror.com/datasets/2imi9/llama2_7B_data_10G

下载链接

链接失效反馈

官方服务：

资源简介：

该数据集由10GB的开源双语数据（中文和英文）组成，由齐子明整理，数据来源于Hugging Face和CSDN等平台。数据集涵盖了广泛的主题，特别强调多轮对话逻辑和推理，包含通用和技术性的问答对，非常适合训练需要处理多轮对话并保持上下文连贯的AI模型。数据集旨在提高AI模型在教育环境中的表现，特别是用于个性化辅导应用，帮助模型进行多轮对话，增强其逻辑性和一致性。数据集的结构包括Sharegpt和Alpaca两种格式，分别用于不同的对话和指令响应场景。数据集的创建过程涉及从开源平台收集数据，并经过精心筛选以确保与深圳大学“大学计算机”课程的教育目标一致。数据集存在潜在的偏见和风险，如数据源偏见、语言偏见、过度泛化、质量不一致等，并且其适用性在特定领域可能受限。

This dataset consists of 10GB of open-source bilingual data (Chinese and English) organized by Ziming Qi, sourced from platforms such as Hugging Face and CSDN. The data covers a wide range of topics, with an emphasis on multi-round conversational logic and reasoning. It includes both general and technical question-answer pairs, making it ideal for training AI models that need to handle extended conversations and maintain context across multiple exchanges. The dataset is designed to improve the performance of AI models in educational settings, particularly for personalized tutoring applications. The content helps models engage in multi-round dialogues, enhancing their ability to respond logically and consistently over time.

提供机构：

2imi9

5,000+

优质数据集

54 个

任务类型

进入经典数据集