welyjesch/alpaca_kapampangan
Hugging Face · Updated 2026-03-18 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/welyjesch/alpaca_kapampangan
Resource description:
---
language:
- pam
- en
license: cc-by-nc-4.0
task_categories:
- text-generation
- question-answering
tags:
- alpaca
- instruction-tuning
- kapampangan
- pampangan
- philippine-languages
- low-resource
- translation
pretty_name: Kapampangan Alpaca Dataset
size_categories:
- 10K<n<100K
source_datasets:
- tatsu-lab/alpaca
---
# 🇵🇭 Kapampangan Alpaca Dataset
## Dataset Description
- **Point of Contact:** welyjesch@gmail.com
- **Primary Language:** Kapampangan
- **Source Language:** English
### Dataset Summary
This dataset is a **Kapampangan translation** of the original Alpaca instruction-following dataset. It is designed to support research and development of **instruction-tuned language models** for low-resource Philippine languages, particularly Kapampangan.
The dataset retains the original Alpaca structure (`instruction`, `input`, `output`), with all three fields translated into Kapampangan.
## Dataset Structure
### Data Instances
Each example follows this JSON format:
```json
{
"instruction": "Kapampangan instruction text",
"input": "Optional context in Kapampangan",
"output": "Expected response in Kapampangan"
}
```
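As a minimal sketch of reading one record in this format, the snippet below parses a JSON object and checks that the three documented fields are present and are strings. The helper name `parse_record` and the sample line are illustrative assumptions, not part of the dataset or of any library.

```python
import json

# The three fields documented in this card.
REQUIRED_FIELDS = ("instruction", "input", "output")


def parse_record(line: str) -> dict:
    """Parse one JSON object and check it carries the three string fields.

    Hypothetical helper for illustration; not shipped with the dataset.
    """
    record = json.loads(line)
    for field in REQUIRED_FIELDS:
        if field not in record:
            raise ValueError(f"missing field: {field}")
        if not isinstance(record[field], str):
            raise ValueError(f"field {field} must be a string")
    return record


# Placeholder record mirroring the example above, with an empty `input`.
sample = (
    '{"instruction": "Kapampangan instruction text", '
    '"input": "", '
    '"output": "Expected response in Kapampangan"}'
)
record = parse_record(sample)
print(record["instruction"])
```

An empty `input` passes the check, since the field is allowed to be empty as long as it exists.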
### Data Fields
- `instruction`: The task or question in Kapampangan.
- `input`: Additional context (may be empty).
- `output`: The expected response in Kapampangan.
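Because `input` may be empty, code that turns a record into a training prompt usually needs two layouts. The sketch below shows one way to handle that. The `### Instruction:` / `### Input:` / `### Response:` headers follow a layout commonly used with Alpaca-style data, but this card does not prescribe a prompt template, so the headers and the `build_prompt` name are assumptions.

```python
def build_prompt(example: dict) -> str:
    """Build a prompt from one record, omitting the input section when empty.

    Hypothetical helper; the section headers are an assumed template.
    """
    instruction = example["instruction"].strip()
    context = example.get("input", "").strip()
    if context:
        return (
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{context}\n\n"
            "### Response:\n"
        )
    return f"### Instruction:\n{instruction}\n\n### Response:\n"


with_input = build_prompt(
    {"instruction": "Instruction text", "input": "Context text", "output": "Response"}
)
without_input = build_prompt(
    {"instruction": "Instruction text", "input": "", "output": "Response"}
)
print(with_input)
print(without_input)
```

The `output` field is deliberately left out of the prompt; during fine-tuning it would be appended as the target text.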
### Data Splits
| Split | Description |
|------------|-------------------------------------------|
| `train` | Main dataset for training |
| `validation` | Optional validation set (if provided) |
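If only a `train` split is available, a validation set can be held out locally. The following is a minimal sketch using the standard library; the function name `split_records`, the 10% fraction, and the seed are illustrative assumptions, and the records are synthetic placeholders rather than dataset content.

```python
import random


def split_records(records, val_fraction=0.1, seed=42):
    """Shuffle deterministically and hold out a fraction for validation.

    Hypothetical helper; returns (train, validation) as two lists.
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    shuffled = list(records)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Synthetic placeholder records in the documented three-field format.
records = [
    {"instruction": f"task {i}", "input": "", "output": f"answer {i}"}
    for i in range(100)
]
train, validation = split_records(records)
print(len(train), len(validation))
```

A fixed seed keeps the held-out examples the same across runs, which makes evaluation numbers comparable between experiments.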
## Dataset Creation
### Source Data
Based on the original **Alpaca dataset** (`tatsu-lab/alpaca`), whose instruction-following examples were generated with an OpenAI model (`text-davinci-003`) using the Self-Instruct approach.
### Translation Process
Translated from English to Kapampangan using:
- Machine translation + human post-editing *(or specify your actual method)*
- Native speaker validation *(if applicable)*
## Use Cases
This dataset can be used for:
- Instruction tuning of LLMs in Kapampangan
- Multilingual NLP research
- Low-resource language modeling
- Chatbot and assistant development for Kapampangan speakers
## Limitations
- May contain translation artifacts or unnatural phrasing.
- Cultural nuances might not always be preserved.
- Not all instructions may perfectly align with Kapampangan linguistic norms.
- Quality depends on the exact translation method used.
## Ethical Considerations
Ensure responsible use when deploying models trained on this dataset. Be mindful of:
- Bias inherited from the original Alpaca dataset.
- Potential mistranslations or harmful outputs.
- Unsuitability for high-stakes applications without further validation.
## Licensing
The original Alpaca dataset license applies.
**License:** CC BY-NC 4.0 *(Note: data generated with OpenAI models is generally subject to terms that restrict its use for developing models that compete with OpenAI.)*
## Citation
If you use this dataset, please cite:
```bibtex
@dataset{kapampangan_alpaca,
title = {Kapampangan Alpaca Dataset},
author = {Wely Jesch Sabalilag},
year = {2026},
note = {Translated version of the Alpaca dataset}
}
```
## Acknowledgements
- Original [Alpaca dataset creators (Stanford CRFM)](https://crfm.stanford.edu/2023/03/13/alpaca.html).
- Contributors and translators for Kapampangan.
## Contact
For questions or contributions:
- **Name:** Wely Jesch Sabalilag
- **Email:** [welyjesch@gmail.com](mailto:welyjesch@gmail.com)
- **GitHub:** [github.com/welyjesch](https://github.com/welyjesch)
Provider:
welyjesch



