
gliclass-v3-logic-dataset

ModelScope community | Updated 2025-12-04 | Indexed 2025-07-26
Download link:
https://modelscope.cn/datasets/knowledgator/gliclass-v3-logic-dataset
Description:
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6405f62ba577649430be5124/I9RAQol7giilBHbbf2T7M.png)

# GLiClass-V3 Logic Dataset

**Rows** 7,776 | **Split** train only | **Format** Parquet | **Language** EN | **License** Apache-2.0

## What it is

A length-balanced corpus of single-sentence prompts built purely for inducing reasoning in language models.

## Why it helps

* Teaches symbolic-logic patterns and multi-label behaviour.
* Buckets cover 15 word-length ranges (4 → 1,024) in equal proportions, exposing models to both tiny and very long inputs.
* Each example has **1-50 true** and **1-50 false** labels, forcing the model to cope with large, variable answer sets.

## Where the prompts come from

Re-annotated snippets drawn from four public resources:

| Source dataset | Notes |
|----------------|-------|
| **FineWeb** (clean web crawl) | Plain sentences automatically filtered for quality, then labelled with an LLM. |
| **tau/CommonsenseQA** | Question stems only; each converted to a declarative premise and re-labelled multi-label style. |
| **GLiClass-2k prototype** (`BioMike/formal-logic-reasoning-gliclass-2k`) | Earlier formal-logic items. |
| **nyu-mll/MultiNLI** | Premise/hypothesis pairs. |

## Data schema

| Column | Type | Notes |
|---------------|-----------------|------------------------------------------|
| `text` | string | Sentence or short passage. |
| `true_labels` | list\<string\> | All correct answers. |
| `all_labels` | list\<string\> | `true_labels` + distractors (shuffled). |

## Quick load

```python
from datasets import load_dataset

# Fetch the dataset and select its only split, "train".
ds = load_dataset("knowledgator/gliclass-v3-logic-dataset")["train"]
```

## Citation

```bibtex
@misc{stepanov2025gliclassgeneralistlightweightmodel,
  title={GLiClass: Generalist Lightweight Model for Sequence Classification Tasks},
  author={Ihor Stepanov and Mykhailo Shtopko and Dmytro Vodianytskyi and Oleksandr Lukashov and Alexander Yavorskyi and Mykyta Yaroshenko},
  year={2025},
  eprint={2508.07662},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2508.07662},
}
```
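
## Usage sketch

The data schema above stores the correct answers in `true_labels` and the full shuffled candidate set in `all_labels`, so the distractors and a per-candidate 0/1 target vector can be derived from each row. The following is a minimal sketch of that derivation, not the authors' preprocessing code. The helper names (`split_labels`, `binary_targets`, `word_count`) and the example row are hypothetical, and counting words by whitespace splitting is an assumption; the card does not say how the word-length buckets were measured.

```python
from typing import Dict, List


def split_labels(row: Dict[str, object]) -> Dict[str, List[str]]:
    """Separate a row's candidate labels into true labels and distractors."""
    true_set = set(row["true_labels"])
    distractors = [label for label in row["all_labels"] if label not in true_set]
    return {"true": list(row["true_labels"]), "false": distractors}


def binary_targets(row: Dict[str, object]) -> List[int]:
    """Return one 0/1 target per entry of all_labels, in the stored order."""
    true_set = set(row["true_labels"])
    return [1 if label in true_set else 0 for label in row["all_labels"]]


def word_count(text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of length)."""
    return len(text.split())


# Hypothetical row, invented for illustration; it is not taken from the dataset.
example_row = {
    "text": "All birds have feathers and a sparrow is a bird.",
    "true_labels": ["a sparrow has feathers"],
    "all_labels": [
        "a sparrow is a fish",
        "a sparrow has feathers",
        "no bird has feathers",
    ],
}

targets = binary_targets(example_row)
distractors = split_labels(example_row)["false"]
n_words = word_count(example_row["text"])
print(targets, distractors, n_words)
```

With the real data, the same helper could be applied row by row, for instance via `ds.map(lambda row: {"targets": binary_targets(row)})` on the `ds` object from the quick-load snippet.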

Provider:
maas
Created:
2025-07-19