xinqiyang/iruca_llama2_japanese_demo

Name: xinqiyang/iruca_llama2_japanese_demo
Creator: xinqiyang
Published: 2023-10-12 06:47:15
License: 暂无描述

Hugging Face2023-10-12 更新2024-03-04 收录

下载链接：

https://hf-mirror.com/datasets/xinqiyang/iruca_llama2_japanese_demo

下载链接

链接失效反馈

官方服务：

资源简介：

--- dataset_info: features: - name: text dtype: string splits: - name: train num_bytes: 24485.34975369458 num_examples: 15 download_size: 3242 dataset_size: 24485.34975369458 configs: - config_name: default data_files: - split: train path: data/train-* --- # iruca-1k: Lazy Llama 2 Formatting This is a subset (1000 samples) of the excellent [`timdettmers/openassistant-guanaco`](https://huggingface.co/datasets/timdettmers/openassistant-guanaco) dataset, processed to match Llama 2's prompt format as described [in this article](https://huggingface.co/blog/llama2#how-to-prompt-llama-2). It was created using the following [colab notebook](https://colab.research.google.com/drive/1Ad7a9zMmkxuXTOh1Z7-rNSICA4dybpM2?usp=sharing). Useful if you don't want to reformat it by yourself (e.g., using a script). It was designed for [this article](https://mlabonne.github.io/blog/posts/Fine_Tune_Your_Own_Llama_2_Model_in_a_Colab_Notebook.html) about fine-tuning a Llama 2 (chat) model in a Google Colab. ### Format from xlsx file to CSV ```bash pip install openpyxl pandas python generate.py pip install huggingface_hub huggingface-cli repo create iruca_llama2_japanese_demo --type dataset git clone https://huggingface.co/datasets/xinqiyang/iruca_llama2_japanese_demo ```

提供机构：

xinqiyang

原始信息汇总

数据集概述

数据集信息

特征:
- 名称: text
- 数据类型: string
分割:
- 名称: train
- 字节数: 24485.34975369458
- 样本数: 15
下载大小: 3242
数据集大小: 24485.34975369458

配置

配置名称: default
数据文件:
- 分割: train
- 路径: data/train-*

5,000+

优质数据集

54 个

任务类型

进入经典数据集