"Synth-Empathy"
Source: DataCite Commons | Updated: 2026-04-17 | Indexed: 2026-05-03
Download link:
https://ieee-dataport.org/documents/synth-empathy
Description:
"With the rapid advancement of large language models (LLMs), developing models capable of generating empathetic responses has become increasingly important. However, existing empathetic datasets are typically limited in scale and rely heavily on human annotation, resulting in substantial labor costs and insufficient data coverage. In this work, we propose Synth-Empathy, an LLM-based data generation pipeline that integrates quality filtering and diversity selection to automatically construct high-quality empathetic dialogue data. Using Synth-Empathy, we construct the Synth-Empathy-Dialogues dataset, which contains 18,789 high-quality empathetic conversations. Training Qwen1.5-7B on the Synth-Empathy-Dialogues dataset significantly enhances the model's empathetic response capability and achieves strong performance on the EMPATHETICDIALOGUES (ED) benchmark and in human evaluations. Furthermore, our data selection strategy removes over 80% of redundant synthetic samples, greatly improving generation efficiency. Comprehensive automatic and human evaluations demonstrate that models fine-tuned on Synth-Empathy data exhibit superior empathy, coherence, and linguistic diversity compared to both fine-tuned and larger-scale baseline models. These findings highlight the potential of synthetic empathetic data as a scalable alternative to costly human annotation. All code and datasets are publicly available at https://github.com/Aurora-slz/Synth-Empathy."
Provider:
IEEE DataPort
Created:
2026-04-17



