ceselder/loracle-pretrain-qa-v3b-preview1k
Hugging Face · updated 2026-04-22 · indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/ceselder/loracle-pretrain-qa-v3b-preview1k
Description:
loracle-pretrain-qa-v3b-preview1k is a pretraining question-answering dataset of 1,050 rows covering 350 organisms. It emphasizes describing the content rather than its source, and explicitly bans phrasing that describes the medium. The dataset includes several question types (T1-T6), each with its own answer format and requirements. Coverage spans several languages (English, Spanish, German, French, Russian, Dutch, and Chinese) and a range of topics, and a portion of the rows contain toxic or non-English content. The data was generated with the claude-haiku-4-5 model in a two-round batch pipeline that produced the questions and answers.
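The listed figures imply an average of three rows per organism (1,050 / 350 = 3). The following is a minimal sketch for sanity-checking a local copy against those figures and the T1-T6 question types. It assumes hypothetical column names `organism` and `question_type`, since this page does not describe the schema; adjust them to the actual fields.

```python
from collections import Counter
from typing import Iterable, Mapping

# Question types named in the listing (T1-T6).
EXPECTED_TYPES = {f"T{i}" for i in range(1, 7)}


def summarize(rows: Iterable[Mapping[str, str]],
              organism_key: str = "organism",       # hypothetical column name
              type_key: str = "question_type") -> dict:  # hypothetical column name
    """Count rows, distinct organisms, and question types; flag unknown types."""
    rows = list(rows)
    organisms = Counter(r[organism_key] for r in rows)
    types = Counter(r[type_key] for r in rows)
    return {
        "rows": len(rows),
        "organisms": len(organisms),
        # Average rows per organism; 0.0 for an empty input.
        "rows_per_organism": len(rows) / len(organisms) if organisms else 0.0,
        "type_counts": dict(types),
        # Any type label outside T1-T6 is reported here.
        "unexpected_types": sorted(set(types) - EXPECTED_TYPES),
    }


if __name__ == "__main__":
    # Synthetic stand-in shaped like the listing: 350 organisms x 3 rows.
    demo = [{"organism": f"org{i}", "question_type": f"T{(i + j) % 6 + 1}"}
            for i in range(350) for j in range(3)]
    print(summarize(demo))
```

Rows loaded with the Hugging Face `datasets` library iterate as dictionaries, so a loaded split could be passed to `summarize` directly; the split name is not stated on this page.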
Provided by:
ceselder



