ThatsGroes/synthetic-from-classification-tasks-danish
Source: Hugging Face | Updated: 2025-01-24 | Indexed: 2025-02-15
Download link:
https://hf-mirror.com/datasets/ThatsGroes/synthetic-from-classification-tasks-danish
Description:
The dataset is intended for pre- or post-training embedding models for Danish text classification tasks. It consists of 100,000 samples generated with gemma-2-27b-it. Each sample includes a prompt and a model response. The samples were generated from seed tasks randomly sampled from https://huggingface.co/datasets/ThatsGroes/classification-tasks-processed, following the data generation process described in the paper https://arxiv.org/pdf/2401.00368.
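A minimal sketch of how the samples might be loaded and inspected with the Hugging Face `datasets` library. The column names `prompt` and `response` come from the description above. The split name `"train"` and the helper names `load_samples` and `parse_response` are assumptions made for illustration. The description does not state the format of `response`; the helper only tries to read it as JSON (optionally wrapped in a Markdown code fence) and returns `None` when that fails.

```python
import json
from typing import Any, Optional

DATASET_ID = "ThatsGroes/synthetic-from-classification-tasks-danish"
FENCE = "`" * 3  # Markdown code-fence marker that some models wrap output in


def load_samples(split: str = "train"):
    """Load the dataset from the Hugging Face Hub (requires network access).

    Assumption: the split is named "train"; check the dataset page to confirm.
    """
    from datasets import load_dataset  # lazy import; needs `pip install datasets`

    return load_dataset(DATASET_ID, split=split)


def parse_response(response: str) -> Optional[Any]:
    """Try to read a model response as JSON; return None if it is not JSON.

    Assumption: responses may be JSON, possibly inside a Markdown code fence.
    The dataset description does not confirm this format.
    """
    text = response.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (for example a "json" tag) ...
        text = text.split("\n", 1)[1] if "\n" in text else ""
        # ... and everything from the closing fence onward, if present.
        text = text.rsplit(FENCE, 1)[0]
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


if __name__ == "__main__":
    samples = load_samples()
    first = samples[0]
    print(first["prompt"])
    print(parse_response(first["response"]))
```

Keeping the parser tolerant (returning `None` rather than raising) makes it easy to count how many of the generated responses are usable before building training pairs from them.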
Provider:
ThatsGroes



