hkust-nlp/llm-compression
Source: Hugging Face. Updated 2024-04-16. Indexed 2024-04-19.
Download link:
https://hf-mirror.com/datasets/hkust-nlp/llm-compression
Resource description:
---
license: cc-by-nc-sa-4.0
language:
- en
annotations_creators:
- no-annotation
task_categories:
- text-generation
task_ids:
- language-modeling
size_categories:
- 10K<n<100K
configs:
- config_name: python
  data_files:
  - split: test
    path:
    - data/python.jsonl
- config_name: cc
  data_files:
  - split: test
    path:
    - data/cc.jsonl
- config_name: arxiv_math
  data_files:
  - split: test
    path:
    - data/arxiv_math.jsonl
---
This is the compression corpora dataset used in the paper "Compression Represents Intelligence Linearly".
We find that LLMs' intelligence, as reflected by benchmark scores, correlates almost **linearly** with their ability to compress external text corpora. We measure intelligence along three key abilities (knowledge and commonsense, coding, and mathematical reasoning) and provide the corresponding compression corpora here, named cc, python, and arxiv_math, respectively.
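To make "ability to compress" concrete, the sketch below computes a bits-per-character rate from per-token log-probabilities. It is a minimal illustration, not the paper's evaluation code: the function name `bits_per_character`, the assumption that log-probabilities are natural logs, and the toy numbers are all hypothetical. The GitHub repository linked below documents the actual procedure.

```python
import math
from typing import Sequence


def bits_per_character(token_logprobs: Sequence[float], text: str) -> float:
    """Return the number of bits a model needs per character of `text`.

    `token_logprobs` are assumed to be natural-log probabilities that a
    language model assigned to each token of `text` (hypothetical input).
    """
    if not text:
        raise ValueError("text must be non-empty")
    # Convert the total negative log-likelihood from nats to bits.
    total_bits = -sum(token_logprobs) / math.log(2)
    return total_bits / len(text)


# Toy input: 4 tokens, each predicted with probability 0.5, covering 8 characters.
logprobs = [math.log(0.5)] * 4
bpc = bits_per_character(logprobs, "abcdefgh")
print(f"{bpc:.3f} bits per character")  # about 0.5; lower means better compression
```

A lower value means the model assigns higher probability to the text, which corresponds to a shorter encoding of it.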
### Load the data
```python
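from datasets import load_dataset

# Available configs: "python", "cc", "arxiv_math"; each has a single "test" split.
dataset = load_dataset("hkust-nlp/llm-compression", name="python")
print(dataset["test"][0])  # first record of the test split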
```
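Each config is stored as a JSON Lines file (for example `data/python.jsonl`), so a downloaded copy can also be read without the `datasets` library. The sketch below is a generic reader. The helper name `iter_jsonl` and the `content` field in the sample are made up for illustration, since the record schema is not described here.

```python
import json
from typing import Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical in-memory sample; the real field names may differ.
sample = ['{"content": "print(1)"}', "", '{"content": "x = 2"}']
records = list(iter_jsonl(sample))
print(len(records), sorted(records[0].keys()))

# With a local copy of the file, pass the open file handle instead:
#   with open("data/python.jsonl", encoding="utf-8") as f:
#       records = list(iter_jsonl(f))
```

Printing the keys of the first record is a quick way to discover the actual schema before writing any evaluation code against it.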
More details on compression evaluation are available on our [GitHub page](https://github.com/hkust-nlp/llm-compression-intelligence).
### Citation
```
@misc{huang2024compression,
title={Compression Represents Intelligence Linearly},
author={Yuzhen Huang and Jinghan Zhang and Zifei Shan and Junxian He},
year={2024},
eprint={2404.09937},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
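Provider:
hkust-nlp
Summary of original information
Dataset overview
- License: cc-by-nc-sa-4.0
- Language: English
- Task category: text generation
- Task ID: language modeling
- Size category: 10K<n<100K
Dataset configurations
- Config name: python
  - Split: test
  - Path: data/python.jsonl
- Config name: cc
  - Split: test
  - Path: data/cc.jsonl
- Config name: arxiv_math
  - Split: test
  - Path: data/arxiv_math.jsonl
Dataset purpose
This dataset is used in the paper "Compression Represents Intelligence Linearly" to study the almost linear correlation between the intelligence of large language models (LLMs) and their ability to compress external text corpora. It contains three parts, each corresponding to one key ability:
- cc: knowledge and commonsense
- python: coding
- arxiv_math: mathematical reasoning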



