
hon9kon9ize/yue-alpaca

Source: Hugging Face. Updated 2024-04-20. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/hon9kon9ize/yue-alpaca
Resource description:
---
language: yue
license: cc-by-nc-4.0
size_categories:
- 1K<n<10K
tags:
- sft
- alpaca
dataset_info:
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 6906174
    num_examples: 18649
  download_size: 4606648
  dataset_size: 6906174
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# 廣東話草泥馬

## Dataset Card for Cantonese Alpaca

![Cantonese Alpaca](https://github.com/hon9kon9ize/hon9kon9ize.github.io/blob/main/public/images/alpaca_with_tank.jpg?raw=true)

- repository: (https://github.com/hon9kon9ize/yue-alpaca)

## Dataset Description

This dataset contains Cantonese instruction-following data generated by Gemini Pro using [Stanford's Alpaca](https://github.com/tatsu-lab/stanford_alpaca) prompts, intended for fine-tuning LLMs.

Attention: This dataset was generated by Gemini Pro and has not undergone rigorous verification. The content may contain errors. Please keep this in mind when using it.

## Licensing Information

The dataset is available under the [Creative Commons NonCommercial (CC BY-NC 4.0)](https://creativecommons.org/licenses/by-nc/4.0/legalcode).

## Citation Information

```
@misc{alpaca,
  author = {Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto },
  title = {Stanford Alpaca: An Instruction-following LLaMA model},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/tatsu-lab/stanford_alpaca}},
}
```
Provider:
hon9kon9ize
Summary of original information

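Cantonese Alpaca (廣東話草泥馬) dataset

Dataset description

This dataset contains Cantonese instruction-following data generated by Gemini Pro from Stanford's Alpaca prompts, intended for fine-tuning large language models (LLMs).

Note: this dataset was generated by Gemini Pro and has not undergone rigorous verification. The content may contain errors; keep this in mind when using it.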

Dataset information

Features

  • instruction: string
  • input: string
  • output: string
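
The three string fields above follow the Alpaca record layout. As a minimal sketch of how one record might be turned into a training prompt, the snippet below uses the English prompt template published in the Stanford Alpaca repository. Whether that English template suits this Cantonese dataset is an assumption, `build_prompt` is a hypothetical helper, and the sample record is made up for illustration rather than taken from the dataset.

```python
# Template text as published in the Stanford Alpaca repository.
# Applying it to this dataset is an assumption, not the authors' stated method.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def build_prompt(record: dict) -> str:
    """Hypothetical helper: format one record into a prompt string."""
    extra_input = (record.get("input") or "").strip()
    if extra_input:
        return PROMPT_WITH_INPUT.format(
            instruction=record["instruction"], input=extra_input
        )
    # Records with an empty `input` use the shorter template.
    return PROMPT_NO_INPUT.format(instruction=record["instruction"])


# Made-up record for illustration only (not drawn from the dataset).
sample = {"instruction": "Translate to Cantonese", "input": "Good morning", "output": "早晨"}
print(build_prompt(sample))
```

During supervised fine-tuning the `output` field would typically be appended after the `### Response:` marker as the target text.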

Splits

  • train:
    • Bytes: 6906174
    • Examples: 18649

Size

  • Download size: 4606648 bytes
  • Dataset size: 6906174 bytes
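
To put the size figures above in perspective, the short calculation below derives the average record size and the ratio of download size to dataset size. It is a sketch that uses only the numbers listed in the metadata and assumes both sizes are reported in bytes, as is usual for Hugging Face dataset metadata.

```python
# Figures copied from the dataset metadata above.
NUM_BYTES = 6_906_174      # dataset size of the train split
NUM_EXAMPLES = 18_649      # number of records in the train split
DOWNLOAD_SIZE = 4_606_648  # size of the files actually downloaded

# Average uncompressed size of one record.
avg_bytes_per_example = NUM_BYTES / NUM_EXAMPLES

# Fraction of the dataset size that has to be downloaded.
download_ratio = DOWNLOAD_SIZE / NUM_BYTES

print(f"{avg_bytes_per_example:.1f} bytes per example")
print(f"download is {download_ratio:.1%} of the dataset size")
```

This works out to roughly 370 bytes per record, with the download being about two thirds of the dataset size.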

Configuration

  • default:
    • Data files:
      • Split: train
      • Path: data/train-*

Licensing information

The dataset is available under the Creative Commons NonCommercial (CC BY-NC 4.0) license.

Citation information

@misc{alpaca,
  author = {Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto },
  title = {Stanford Alpaca: An Instruction-following LLaMA model},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/tatsu-lab/stanford_alpaca}},
}
