
bhavya777/harper-valley-final

Source: Hugging Face | Updated: 2026-03-20 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/bhavya777/harper-valley-final
Resource description:
---
configs:
- config_name: multiturn_sft
  data_files:
  - split: train
    path: multiturn_sft/train-*
- config_name: dpo_style
  data_files:
  - split: train
    path: dpo_style/train-*
- config_name: dpo_rude
  data_files:
  - split: train
    path: dpo_rude/train-*
- config_name: dpo_merged
  data_files:
  - split: train
    path: dpo_merged/train-*
---

# Harper Valley Final (SFT + DPO)

This dataset contains four configurations:

- `multiturn_sft`: multi-turn supervised fine-tuning conversations
- `dpo_style`: stylistic preference pairs for alignment
- `dpo_rude`: rude vs polite preference pairs
- `dpo_merged`: combined preference dataset

All configurations currently include a `train` split.

---

## Attribution

This dataset is derived from the **Harper Valley Bank Dataset**.

- **Original Authors / Organization:** Gridspace
- **Source Repository:** https://github.com/cricketclub/gridspace-stanford-harper-valley
- **Kaggle Version:** https://www.kaggle.com/datasets/mdalimranabir/harpervalleybank
- **Paper:** https://arxiv.org/abs/2010.13929
- **License:** Creative Commons Attribution 4.0 International (CC BY 4.0)

We acknowledge and thank the original creators for making this dataset publicly available for research and educational use.

---

## Modifications

The dataset has been processed as follows:

- Extracted transcripts from the raw dataset
- Removed unnecessary metadata and noise
- Reformatted data into JSON suitable for chat and preference training
- Cleaned and filtered text for quality

---

## License

This dataset follows the same license as the original: **CC BY 4.0**

https://creativecommons.org/licenses/by/4.0/

Users are required to provide appropriate attribution to the original authors when using this dataset.
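
---

## Loading example (sketch)

The four configurations listed in this card can probably be loaded one at a time with the Hugging Face `datasets` library. The snippet below is a minimal sketch rather than an official loader. The repository ID and configuration names come from this card. The helper name `load_config` and its `loader` parameter are hypothetical, added only so the function can be exercised without network access. The card does not document column names, so the sketch makes no assumption about them.

```python
from typing import Any, Callable, Optional

# Repository ID and configuration names as listed in the dataset card.
REPO_ID = "bhavya777/harper-valley-final"
CONFIGS = ("multiturn_sft", "dpo_style", "dpo_rude", "dpo_merged")


def load_config(name: str, loader: Optional[Callable[..., Any]] = None) -> Any:
    """Load the `train` split of one configuration.

    `loader` is a hypothetical hook for offline testing; when omitted,
    `datasets.load_dataset` is used.
    """
    if name not in CONFIGS:
        raise ValueError(f"unknown config {name!r}; expected one of {CONFIGS}")
    if loader is None:
        # Imported lazily so the helper can be imported without `datasets` installed.
        from datasets import load_dataset
        loader = load_dataset
    # Every configuration currently ships only a `train` split.
    return loader(REPO_ID, name, split="train")


# Example usage (requires `pip install datasets` and network access):
# sft = load_config("multiturn_sft")
# print(sft.column_names)  # inspect the schema before relying on any field
```

Inspecting `column_names` first is a deliberate choice here: the preference configurations presumably hold paired responses, but the exact field names should be confirmed from the data itself.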
Provider:
bhavya777