MaziyarPanahi/Llama-Nemotron-Post-Training-Dataset-v1-Smoler-ShareGPT
Source: Hugging Face | Updated: 2025-03-22 | Indexed: 2025-04-12
Download link:
https://hf-mirror.com/datasets/MaziyarPanahi/Llama-Nemotron-Post-Training-Dataset-v1-Smoler-ShareGPT
Description:
Llama-Nemotron-Post-Training-Dataset-v1-Smoler-ShareGPT is a smaller version of NVIDIA's Llama-Nemotron-Post-Training-Dataset-v1, converted to ShareGPT format and merged into a single dataset. Conversations use three roles: user, assistant, and system. The dataset retains all original columns and adds two fields: `messages`, which holds the conversation in ShareGPT format, and `original_split`, which records the source split of each row. Sampling depends on split size: splits with fewer than 1M samples are kept in full (100%), while splits with more than 1M samples are sampled at 5%. The README lists the original split sizes and the final sizes after sampling. The dataset inherits the license of the original NVIDIA dataset.
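The size-dependent sampling rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script used to build the dataset: the function names, the `seed` and `threshold` parameters, and the treatment of a split with exactly 1M rows (counted as large here) are all hypothetical, since the description does not specify them.

```python
import random

THRESHOLD = 1_000_000  # "1M" samples, as stated in the description


def sample_fraction(num_rows: int, threshold: int = THRESHOLD) -> float:
    """Return the fraction of a split to keep under the described rule."""
    # Assumption: a split with exactly `threshold` rows counts as large;
    # the description only covers "less than" and "greater than" 1M.
    return 1.0 if num_rows < threshold else 0.05


def sample_split(rows, split_name, seed=0, threshold=THRESHOLD):
    """Keep 100% or 5% of a split and tag each row with its source split."""
    rows = list(rows)
    keep = round(len(rows) * sample_fraction(len(rows), threshold))
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    chosen = rows if keep == len(rows) else rng.sample(rows, keep)
    # `original_split` mirrors the field name given in the description.
    return [{**row, "original_split": split_name} for row in chosen]
```

Merging the per-split results into one list would then give a single dataset in which `original_split` still identifies where each row came from.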
Provider:
MaziyarPanahi



