CHATS-Lab/Verbalized-Sampling-Synthetic-Data-Generation
Source: Hugging Face | Updated: 2025-10-29 | Indexed: 2025-11-15
Download link:
https://hf-mirror.com/datasets/CHATS-Lab/Verbalized-Sampling-Synthetic-Data-Generation
Description:
The Verbalized-Sampling-Synthetic-Data-Generation dataset shows how Verbalized Sampling (VS) can be used to generate high-quality, diverse synthetic training data for mathematical reasoning tasks. It contains math problem-solution pairs generated by state-of-the-art LLMs under different prompting methods. The dataset is intended to illustrate four things: the quality of the synthetic data, the diversity of solutions, how synthetic data scales, and how the Direct and Verbalized Sampling approaches compare. Verbalized Sampling produces more diverse solution strategies while maintaining solution correctness, which makes it possible to build richer synthetic training datasets. Models such as GPT-4.1 and Gemini-2.5-Flash generate complementary types of solutions when prompted with the different methods.
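To make the Direct versus Verbalized Sampling contrast concrete, here is a minimal sketch of the two prompt styles. It assumes that a VS prompt asks the model for several candidate solutions, each with a stated probability, while a Direct prompt asks for a single solution. The function names, the default `k = 5`, and the prompt wording are illustrative assumptions, not the prompts used to build this dataset.

```python
def direct_prompt(problem: str) -> str:
    """Baseline prompt: ask for one solution (wording is an assumption)."""
    return (
        "Solve the following math problem step by step.\n\n"
        f"Problem: {problem}"
    )


def verbalized_sampling_prompt(problem: str, k: int = 5) -> str:
    """VS-style prompt: ask for k candidate solutions, each with a
    verbalized probability. Wording and k are assumptions."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return (
        f"Generate {k} different step-by-step solutions to the math "
        "problem below. For each solution, also state a probability "
        "between 0 and 1 reflecting how likely you would be to produce "
        "it.\n\n"
        f"Problem: {problem}"
    )


if __name__ == "__main__":
    example = "What is the sum of the first 10 positive integers?"
    print(direct_prompt(example))
    print()
    print(verbalized_sampling_prompt(example, k=3))
```

Either string can be sent to any chat model. The VS variant returns several solutions per call, which is where the extra diversity in the resulting synthetic data is expected to come from.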
Provided by:
CHATS-Lab



