cognitivecomputations/OpenCoder-LLM_opc-sft-stage2-DolphinLabeled
Hugging Face | Updated: 2025-01-05 | Indexed: 2025-02-15
Download link:
https://hf-mirror.com/datasets/cognitivecomputations/OpenCoder-LLM_opc-sft-stage2-DolphinLabeled
Description:
The OpenCoder-LLM SFT DolphinLabeled dataset is a filtered and labeled subset of the OpenCoder-LLM SFT dataset. Duplicate instructions are removed, and an added flag column marks whether each output contains a refusal, unsolicited advice, NSFW content, personally identifiable information (PII), or a disclaimer. The dataset has four parts: educational_instruct, evol_instruct, mceval_instruct, and package_instruct. These are sourced, respectively, from an algorithmic corpus, the open-source MagicCoder-Evol-Instruct-110K dataset, the open-source McEval-Instruct dataset, and interface documentation from Python packages.
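As a rough illustration of how the flag column described above might be used for filtering, the sketch below keeps only rows with no flag set. It assumes each row stores its flags as a dictionary of booleans under a `flags` key, with flag names inferred from the description; the actual column layout is not confirmed here. The sample rows and the helper names `is_clean` and `keep_clean` are hypothetical.

```python
from typing import Dict, Iterable, List

# Assumed flag names, inferred from the description; the real schema may differ.
FLAG_NAMES = ("refusal", "unsolicited", "nsfw", "pii", "disclaimer")


def is_clean(row: Dict) -> bool:
    """Return True when none of the assumed flags is set on the row."""
    flags = row.get("flags", {})
    return not any(flags.get(name, False) for name in FLAG_NAMES)


def keep_clean(rows: Iterable[Dict]) -> List[Dict]:
    """Keep only rows whose output carries no flagged content."""
    return [row for row in rows if is_clean(row)]


# Hypothetical rows for illustration only; not taken from the dataset.
sample_rows = [
    {
        "instruction": "Reverse a string.",
        "output": "def rev(s): return s[::-1]",
        "flags": {name: False for name in FLAG_NAMES},
    },
    {
        "instruction": "Parse a date.",
        "output": "I cannot help with that.",
        "flags": {**{name: False for name in FLAG_NAMES}, "refusal": True},
    },
]

clean_rows = keep_clean(sample_rows)
print(len(clean_rows))
```

The same predicate could be passed to whatever filtering mechanism the loading library provides, once the real column names have been checked against the dataset.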
Provider:
cognitivecomputations



