manishiitg/aditi-syn-v2
Hugging Face · Updated 2024-03-26 · Indexed 2024-06-11
Download link:
https://hf-mirror.com/datasets/manishiitg/aditi-syn-v2
Resource description:
---
language:
- hi
license: apache-2.0
dataset_info:
  features:
  - name: messages
    list:
    - name: content
      dtype: string
    - name: role
      dtype: string
  - name: type
    dtype: string
  splits:
  - name: train
    num_bytes: 201536052
    num_examples: 55450
  download_size: 86746197
  dataset_size: 201536052
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
This repo contains a synthetic Hindi/Hinglish dataset used to fine-tune the Aditi OSS LLM.
It contains several types of data:
1. TOOLS: teaching function calling in Hindi/Hinglish
2. RAG/RAG-Complex: teaching context-based RAG in Hindi/Hinglish
3. CODE: writing code
4. ORCA: reasoning and math questions
5. COT: chain-of-thought reasoning
6. Prompts: Hindi/Hinglish answers to high-quality curated prompts
7. Instruct: general Q&A on questions set in an Indian context
8. Writing: general writing instructions
9. Instruct-Follow: generated system prompts that train the model to follow instructions
10. Roleplay: role-play based on Indian characters
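If the categories above are stored in the `type` column listed under `dataset_info` (an assumption: the card does not say so, and the exact label strings are not documented), a per-category breakdown or subset can be produced with plain Python. The helpers `count_by_type` and `filter_by_type` are hypothetical names, not part of the dataset or its generation pipeline.

```python
from collections import Counter
from collections.abc import Iterable
from typing import Any


def count_by_type(records: Iterable[dict[str, Any]]) -> Counter:
    """Tally how many examples carry each value of the `type` field."""
    return Counter(r["type"] for r in records)


def filter_by_type(records: Iterable[dict[str, Any]], wanted: set[str]) -> list[dict[str, Any]]:
    """Keep only records whose `type` is in `wanted`.

    The comparison is case-insensitive because the stored label spelling
    (e.g. "TOOLS" vs "tools") is not documented on the card.
    """
    wanted_lower = {w.lower() for w in wanted}
    return [r for r in records if r["type"].lower() in wanted_lower]
```

With the `datasets` library the same idea applies to a loaded split: `count_by_type(ds)` reveals the real label strings, after which `ds.filter(lambda r: r["type"] == label)` selects one category without materializing a Python list.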
The full data generation pipeline is available at https://github.com/manishiitg/aditi_dataset
Provider:
manishiitg
Summary of the original information
Dataset overview
Basic information
- Language: Hindi
- License: Apache-2.0
Dataset features
- Feature name: messages
  - Sub-features:
    - Name: content; data type: string
    - Name: role; data type: string
- Feature name: type
  - Data type: string
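A minimal sketch for checking records against the feature layout above, assuming `type` is a top-level string column beside `messages` (as read from the card's YAML header). `validate_record` and `load_train` are illustrative names, not part of any library; the role values are not documented on the card, so they are left unchecked. Loading requires network access and the `datasets` package.

```python
from typing import Any

# Keys of each message dict, taken from the card's dataset_info block.
MESSAGE_KEYS = {"content", "role"}


def validate_record(record: dict[str, Any]) -> bool:
    """Return True if `record` has a `type` string and a `messages` list
    whose items are dicts with exactly string `content` and `role` fields."""
    if not isinstance(record.get("type"), str):
        return False
    messages = record.get("messages")
    if not isinstance(messages, list):
        return False
    for msg in messages:
        if not isinstance(msg, dict) or set(msg) != MESSAGE_KEYS:
            return False
        if not all(isinstance(msg[k], str) for k in MESSAGE_KEYS):
            return False
    return True


def load_train():
    """Download the train split from the Hugging Face Hub."""
    # Imported inside the function so validate_record works without `datasets`.
    from datasets import load_dataset
    return load_dataset("manishiitg/aditi-syn-v2", split="train")
```

For example, `all(validate_record(r) for r in load_train().select(range(100)))` spot-checks the first 100 examples.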
Dataset splits
- Split name: train
- Data size: 201,536,052 bytes
- Number of examples: 55,450
Download and dataset size
- Download size: 86,746,197 bytes
- Dataset size: 201,536,052 bytes
Configuration
- Config name: default
- Data files:
  - Split: train
  - Path: data/train-*
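The `data/train-*` pattern assigns every file under `data/` whose name starts with `train-` to the train split. The sketch below approximates that matching with Python's standard `fnmatch`; note that `fnmatch`'s `*` also matches `/`, so it is looser than the Hub's own glob resolution. The helper name `train_files` and any shard file names used with it are hypothetical.

```python
from fnmatch import fnmatch

# Pattern copied from the `configs` block of the card.
TRAIN_PATTERN = "data/train-*"


def train_files(repo_paths: list[str]) -> list[str]:
    """Return, sorted, the repository paths matched by the train-split pattern.

    Caveat: fnmatch's `*` can cross `/`, unlike a path-aware glob, so nested
    paths such as "data/train-x/y" would also match here.
    """
    return sorted(p for p in repo_paths if fnmatch(p, TRAIN_PATTERN))
```

To try it on the real repository, `huggingface_hub.list_repo_files("manishiitg/aditi-syn-v2", repo_type="dataset")` returns the list of file paths to pass in.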