bhavya777/harper-valley-final
Hugging Face. Updated 2026-03-20. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/bhavya777/harper-valley-final
Resource description:
---
configs:
- config_name: multiturn_sft
  data_files:
  - split: train
    path: multiturn_sft/train-*
- config_name: dpo_style
  data_files:
  - split: train
    path: dpo_style/train-*
- config_name: dpo_rude
  data_files:
  - split: train
    path: dpo_rude/train-*
- config_name: dpo_merged
  data_files:
  - split: train
    path: dpo_merged/train-*
---
# Harper Valley Final (SFT + DPO)
This dataset contains four configurations:
- `multiturn_sft` — multi-turn supervised fine-tuning conversations
- `dpo_style` — stylistic preference pairs for alignment
- `dpo_rude` — rude vs polite preference pairs
- `dpo_merged` — combined preference dataset
Each configuration currently provides only a `train` split.
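
A minimal sketch of loading one configuration with the Hugging Face `datasets` library, assuming the repository ID from this card's title and that `datasets` is installed. The helper name `load_config` is hypothetical, and the column layout of each configuration is not documented in this card, so inspect the features before relying on any field.

```python
# Repository ID taken from this card's title; config names from the YAML header.
REPO_ID = "bhavya777/harper-valley-final"
CONFIGS = ["multiturn_sft", "dpo_style", "dpo_rude", "dpo_merged"]


def load_config(name: str):
    """Load the `train` split of one configuration (hypothetical helper)."""
    if name not in CONFIGS:
        raise ValueError(f"unknown config {name!r}; expected one of {CONFIGS}")
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    # Downloads from the Hugging Face Hub, so network access is required.
    return load_dataset(REPO_ID, name, split="train")
```

Calling `load_config("dpo_merged")` should return a `Dataset` object; printing its `features` attribute shows the actual column names and types.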
---
## Attribution
This dataset is derived from the **Harper Valley Bank Dataset**.
- **Original Authors / Organization:** Gridspace
- **Source Repository:** https://github.com/cricketclub/gridspace-stanford-harper-valley
- **Kaggle Version:** https://www.kaggle.com/datasets/mdalimranabir/harpervalleybank
- **Paper:** https://arxiv.org/abs/2010.13929
- **License:** Creative Commons Attribution 4.0 International (CC BY 4.0)
We acknowledge and thank the original creators for making this dataset publicly available for research and educational use.
---
## Modifications
The dataset has been processed as follows:
- Extracted transcripts from the raw dataset
- Removed unnecessary metadata and noise
- Reformatted data into JSON suitable for chat and preference training
- Cleaned and filtered text for quality
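
The reformatting step could look roughly like the sketch below. It is an illustration, not the actual pipeline used to build this dataset: the input field names `speaker_role` and `human_transcript`, the role mapping (caller to `user`, agent to `assistant`), and the output `messages` schema are all assumptions, and `turns_to_messages` is a hypothetical helper.

```python
import json

# Assumed mapping from transcript speaker roles to chat roles.
ROLE_MAP = {"caller": "user", "agent": "assistant"}


def turns_to_messages(turns):
    """Convert transcript turns into chat messages.

    Collapses whitespace, drops empty or unknown-role turns, and merges
    consecutive turns from the same speaker into one message.
    """
    messages = []
    for turn in turns:
        role = ROLE_MAP.get(turn.get("speaker_role"))
        text = " ".join((turn.get("human_transcript") or "").split())
        if role is None or not text:
            continue  # skip noise: no recognizable speaker or no text
        if messages and messages[-1]["role"] == role:
            messages[-1]["content"] += " " + text
        else:
            messages.append({"role": role, "content": text})
    return messages


# Made-up example turns for illustration; not taken from the dataset.
example_turns = [
    {"speaker_role": "agent", "human_transcript": "hello this is harper valley national bank"},
    {"speaker_role": "agent", "human_transcript": "how can i help you"},
    {"speaker_role": "caller", "human_transcript": "  i lost my   debit card "},
    {"speaker_role": "caller", "human_transcript": ""},
    {"speaker_role": "agent", "human_transcript": "i can help with that"},
]

record = {"messages": turns_to_messages(example_turns)}
print(json.dumps(record, indent=2))
```

Merging consecutive same-speaker turns keeps the conversation strictly alternating, which many chat templates expect; whether this dataset applies the same rule is not stated in the card.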
---
## License
This dataset follows the same license as the original:
**CC BY 4.0**
https://creativecommons.org/licenses/by/4.0/
Users are required to provide appropriate attribution to the original authors when using this dataset.
Provider:
bhavya777



