hon9kon9ize/yue-alpaca

Name: hon9kon9ize/yue-alpaca
Creator: hon9kon9ize
Published: 2024-04-20 20:23:17
License: cc-by-nc-4.0

Source: Hugging Face. Updated 2024-04-20. Indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/hon9kon9ize/yue-alpaca

Resource description:

---
language: yue
license: cc-by-nc-4.0
size_categories:
- 1K<n<10K
tags:
- sft
- alpaca
dataset_info:
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 6906174
    num_examples: 18649
  download_size: 4606648
  dataset_size: 6906174
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# 廣東話草泥馬

## Dataset Card for Cantonese Alpaca

![Cantonese Alpaca](https://github.com/hon9kon9ize/hon9kon9ize.github.io/blob/main/public/images/alpaca_with_tank.jpg?raw=true)

- repository: (https://github.com/hon9kon9ize/yue-alpaca)

## Dataset Description

This dataset contains Cantonese instruction-following data generated by Gemini Pro using [Stanford's Alpaca](https://github.com/tatsu-lab/stanford_alpaca) prompts, intended for fine-tuning LLMs.

Attention: This dataset was generated by Gemini Pro and has not undergone rigorous verification. The content may contain errors. Please keep this in mind when using it.

## Licensing Information

The dataset is available under the [Creative Commons Attribution-NonCommercial 4.0 license (CC BY-NC 4.0)](https://creativecommons.org/licenses/by-nc/4.0/legalcode).

## Citation Information

```
@misc{alpaca,
  author = {Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto},
  title = {Stanford Alpaca: An Instruction-following LLaMA model},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/tatsu-lab/stanford_alpaca}},
}
```

Provided by:

hon9kon9ize

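Summary of original information

Cantonese Alpaca dataset

Dataset description

This dataset contains Cantonese instruction-following data generated by Gemini Pro from Stanford's Alpaca prompts, intended for fine-tuning large language models (LLMs).

Note: this dataset was generated by Gemini Pro and has not undergone rigorous verification, so the content may contain errors. Please keep this in mind when using it.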

Dataset information

Features

instruction: string
input: string
output: string
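
The three string fields follow the Alpaca record layout. As a minimal sketch of how one record might be turned into a fine-tuning prompt, the block below uses the English prompt wording published in the Stanford Alpaca repository. That choice is an assumption: the dataset card does not prescribe a template, and a Cantonese wording may suit this data better. The `build_prompt` helper and the sample record are hypothetical and not taken from the dataset.

```python
# English prompt templates from the Stanford Alpaca repository.
# Assumption: the dataset card does not mandate any particular template.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def build_prompt(record: dict) -> str:
    """Render one instruction/input/output record as a prompt string."""
    # Treat a missing, None, or whitespace-only input as "no input".
    extra_input = (record.get("input") or "").strip()
    if extra_input:
        return PROMPT_WITH_INPUT.format(
            instruction=record["instruction"], input=extra_input
        )
    return PROMPT_NO_INPUT.format(instruction=record["instruction"])


# Hypothetical record for illustration only.
example = {
    "instruction": "Translate to Cantonese.",
    "input": "Good morning",
    "output": "早晨",
}
prompt = build_prompt(example)
# During supervised fine-tuning, the model would be trained to continue
# `prompt` with example["output"].
```
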

Splits

train:
- Size in bytes: 6906174
- Number of examples: 18649

Size

Download size: 4606648 bytes
Dataset size: 6906174 bytes
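
To put these figures in perspective, a short calculation gives the average record size and the ratio of download size to dataset size. The numbers are copied from the listing above; the variable names are illustrative only.

```python
# Figures copied from the dataset listing.
NUM_EXAMPLES = 18649
DATASET_SIZE_BYTES = 6906174
DOWNLOAD_SIZE_BYTES = 4606648

# Average size of one instruction/input/output record.
avg_bytes_per_example = DATASET_SIZE_BYTES / NUM_EXAMPLES

# Fraction of the dataset size that is actually transferred on download.
download_ratio = DOWNLOAD_SIZE_BYTES / DATASET_SIZE_BYTES

print(f"{avg_bytes_per_example:.1f} bytes per example")   # 370.3 bytes per example
print(f"download is {download_ratio:.1%} of the dataset")  # download is 66.7% of the dataset
```
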

Configurations

default:
- Data files:
  - Split: train
  - Path: data/train-*
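
With a single `default` configuration and a single `train` split, loading the data through the Hugging Face `datasets` library could look like the sketch below. It assumes the `datasets` package is installed and that the Hub (or a configured mirror) is reachable. The `load_train_split` helper is a hypothetical name introduced here for illustration.

```python
DATASET_ID = "hon9kon9ize/yue-alpaca"


def load_train_split():
    """Download the train split, or reuse the local cache if present."""
    # Imported lazily so this module can be imported without the library.
    from datasets import load_dataset  # pip install datasets

    return load_dataset(DATASET_ID, name="default", split="train")


# Usage (requires network access on first call):
#   train = load_train_split()
#   print(train.column_names)
#   print(len(train))
```
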

Licensing information

The dataset is available under the Creative Commons Attribution-NonCommercial 4.0 license (CC BY-NC 4.0).

Citation information

@misc{alpaca,
  author = {Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto},
  title = {Stanford Alpaca: An Instruction-following LLaMA model},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/tatsu-lab/stanford_alpaca}},
}
