
welyjesch/alpaca_kapampangan

Source: Hugging Face. Updated 2026-03-18; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/welyjesch/alpaca_kapampangan
Resource description:
---
language:
- pam
- en
license: cc-by-nc-4.0
task_categories:
- text-generation
- question-answering
tags:
- alpaca
- instruction-tuning
- kapampangan
- pampangan
- philippine-languages
- low-resource
- translation
pretty_name: Kapampangan Alpaca Dataset
size_categories:
- 10K<n<100K
source_datasets:
- tatsu-lab/alpaca
---

# 🇵🇭 Kapampangan Alpaca Dataset

## Dataset Description

- **Point of Contact:** welyjesch@gmail.com
- **Primary Language:** Kapampangan
- **Source Language:** English

### Dataset Summary

This dataset is a **Kapampangan translation** of the original Alpaca instruction-following dataset. It is designed to support research and development of **instruction-tuned language models** for low-resource Philippine languages, particularly Kapampangan.

The dataset retains the original Alpaca structure while providing high-quality translations of instructions, inputs, and outputs.

## Dataset Structure

### Data Instances

Each example follows this JSON format:

```json
{
  "instruction": "Kapampangan instruction text",
  "input": "Optional context in Kapampangan",
  "output": "Expected response in Kapampangan"
}
```

### Data Fields

- `instruction`: The task or question in Kapampangan.
- `input`: Additional context (may be empty).
- `output`: The correct expected response in Kapampangan.

### Data Splits

| Split        | Description                           |
|--------------|---------------------------------------|
| `train`      | Main dataset for training             |
| `validation` | Optional validation set (if provided) |

## Dataset Creation

### Source Data

Based on the original **Alpaca dataset**, which was generated using instruction-following data derived from OpenAI models.

### Translation Process

Translated from English to Kapampangan using:

- Machine translation + human post-editing *(or specify your actual method)*
- Native speaker validation *(if applicable)*

## Use Cases

This dataset can be used for:

- Instruction tuning of LLMs in Kapampangan
- Multilingual NLP research
- Low-resource language modeling
- Chatbot and assistant development for Kapampangan speakers

## Limitations

- May contain translation artifacts or unnatural phrasing.
- Cultural nuances might not always be preserved.
- Not all instructions may perfectly align with Kapampangan linguistic norms.
- Quality depends on the exact translation method used.

## Ethical Considerations

Ensure responsible use when deploying models trained on this dataset. Be mindful of:

- Bias inherited from the original Alpaca dataset.
- Potential mistranslations or harmful outputs.
- Not intended for high-stakes applications without further validation.

## Licensing

The original Alpaca dataset license applies.

**License:** CC BY-NC 4.0

*(Note: Datasets generated from OpenAI models are generally restricted from commercial use competing with OpenAI).*

## Citation

If you use this dataset, please cite:

```bibtex
@dataset{kapampangan_alpaca,
  title  = {Kapampangan Alpaca Dataset},
  author = {Wely Jesch Sabalilag},
  year   = {2026},
  note   = {Translated version of the Alpaca dataset}
}
```

## Acknowledgements

- Original [Alpaca dataset creators (Stanford CRFM)](https://crfm.stanford.edu/2023/03/13/alpaca.html).
- Contributors and translators for Kapampangan.

## Contact

For questions or contributions:

- **Name:** Wely Jesch Sabalilag
- **Email:** [welyjesch@gmail.com](mailto:welyjesch@gmail.com)
- **GitHub:** [github.com/welyjesch](https://github.com/welyjesch)
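
## Example: Building Training Prompts

The card describes records with `instruction`, `input` (possibly empty), and `output` fields, but does not say how they should be combined for instruction tuning. The sketch below shows one plausible approach, assuming the English prompt templates popularized by the original Alpaca project; the card does not prescribe a template, and a Kapampangan template may well be preferable. The function names `build_prompt` and `build_training_text` are hypothetical helpers, not part of the dataset.

```python
from typing import Dict, Optional

# Assumption: English Alpaca-style templates. The dataset card does not
# specify a prompt format, so treat these strings as placeholders.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_prompt(record: Dict[str, Optional[str]]) -> str:
    """Format one record as a prompt, skipping the Input section when empty."""
    instruction = (record.get("instruction") or "").strip()
    context = (record.get("input") or "").strip()  # `input` may be empty
    if context:
        return PROMPT_WITH_INPUT.format(instruction=instruction, input=context)
    return PROMPT_NO_INPUT.format(instruction=instruction)


def build_training_text(record: Dict[str, Optional[str]]) -> str:
    """Concatenate the prompt and the expected response for fine-tuning."""
    return build_prompt(record) + (record.get("output") or "").strip()


if __name__ == "__main__":
    # Placeholder record mirroring the JSON format shown in the card.
    example = {
        "instruction": "Kapampangan instruction text",
        "input": "",
        "output": "Expected response in Kapampangan",
    }
    print(build_training_text(example))
```

In practice the records would likely come from the Hugging Face `datasets` library, for example `load_dataset("welyjesch/alpaca_kapampangan", split="train")`, with `build_training_text` applied to each row; whether a `validation` split exists should be checked first, since the card lists it as optional.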
Provider:
welyjesch