zjukg/SKA-Bench
Source: Hugging Face · Updated 2025-08-26 · Indexed 2026-01-03
Download link:
https://hf-mirror.com/datasets/zjukg/SKA-Bench
Resource description:
# SKA-Bench
- An implementation for [SKA-Bench: A Fine-Grained Benchmark for Evaluating Structured Knowledge Understanding of LLMs](https://arxiv.org/pdf/2507.17178)
### Environment
```bash
conda create -n skabench python=3.9.0
conda activate skabench
pip install openai
# asyncio is part of the Python standard library (3.4+), so it needs no separate install
pip install uvloop
```
### Testbed Construction
For the noise robustness, order insensitivity, and information integration testbeds, run:
```bash
python process_dataset.py --type KG --sequence random --scale 1k
```
**NOTE:**
Set the data type with `--type`, the sequence type with `--sequence`, and the scale with `--scale` before running the code. The test set is then generated in the `dataset` folder.
For the negative rejection testbed, run:
```bash
python process_dataset.py --type Table --sequence original --scale 4k --negative_rejection negative_rejection
python process_dataset.py --type KG --sequence random --scale 4k --negative_rejection negative_rejection
python process_dataset.py --type Table+Text --sequence original --scale 12k --negative_rejection negative_rejection
python process_dataset.py --type KG+Text --sequence random --scale 12k --negative_rejection negative_rejection
```
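The four negative rejection commands above differ only in their arguments, so they can be driven from a short script. The following is a minimal sketch, not part of the repository: `build_process_command` and `run_all` are hypothetical helpers, and the sketch assumes only the flags shown above (`--type`, `--sequence`, `--scale`, `--negative_rejection`).

```python
import subprocess
import sys

# Configurations copied from the negative rejection commands above.
NEGATIVE_REJECTION_CONFIGS = [
    ("Table", "original", "4k"),
    ("KG", "random", "4k"),
    ("Table+Text", "original", "12k"),
    ("KG+Text", "random", "12k"),
]


def build_process_command(data_type, sequence, scale, negative_rejection=False):
    """Return the argument list for one process_dataset.py run (hypothetical helper)."""
    cmd = [
        sys.executable, "process_dataset.py",
        "--type", data_type,
        "--sequence", sequence,
        "--scale", scale,
    ]
    if negative_rejection:
        # The README passes the literal value "negative_rejection" to this flag.
        cmd += ["--negative_rejection", "negative_rejection"]
    return cmd


def run_all(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for data_type, sequence, scale in NEGATIVE_REJECTION_CONFIGS:
        cmd = build_process_command(data_type, sequence, scale, negative_rejection=True)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling `run_all()` only prints the commands; `run_all(dry_run=False)` executes them and should be run from the repository root so that `process_dataset.py` resolves.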
### Evaluation Scripts
For the noise robustness, order insensitivity, and information integration testbeds, run:
```bash
python evaluate.py --type <type> --api_key <api_key> --api_url <api_url> --model <model> --dataset_dir ./dataset/Table_original_42_4k.json
```
**NOTE:**
Replace `<type>` with the data type, `<api_key>` with your API key, `<api_url>` with the API URL, `<model>` with the model name, and `./dataset/Table_original_42_4k.json` with the path to your dataset file.
For the negative rejection testbed, run:
```bash
python evaluate_negative.py --type KG --api_key <api_key> --api_url <api_url> --model <model> --dataset_dir ./dataset/KG_random_42_4k_negative_rejection.json
python evaluate_negative.py --type Table --api_key <api_key> --api_url <api_url> --model <model> --dataset_dir ./dataset/Table_original_42_4k_negative_rejection.json
python evaluate_negative.py --type KG+Text --api_key <api_key> --api_url <api_url> --model <model> --dataset_dir ./dataset/KG+Text_random_42_12k_negative_rejection.json
python evaluate_negative.py --type Table+Text --api_key <api_key> --api_url <api_url> --model <model> --dataset_dir ./dataset/Table+Text_original_42_12k_negative_rejection.json
```
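The dataset paths in the commands above appear to follow a `<type>_<sequence>_42_<scale>` pattern with an optional `_negative_rejection` suffix. The sketch below reconstructs that pattern so the `--dataset_dir` value can be derived from the same options used during testbed construction. `dataset_path` is a hypothetical helper, and treating `42` as a fixed tag (possibly a random seed) is an assumption inferred from the example file names, not something the README states.

```python
def dataset_path(data_type, sequence, scale, tag="42",
                 negative_rejection=False, root="./dataset"):
    """Build a dataset file path matching the naming seen in the examples above.

    The `tag` default of "42" is copied from the example file names; its
    meaning is an assumption.
    """
    name = f"{data_type}_{sequence}_{tag}_{scale}"
    if negative_rejection:
        name += "_negative_rejection"
    return f"{root}/{name}.json"


# Reproduces two of the paths used in the commands above.
print(dataset_path("Table", "original", "4k"))
print(dataset_path("KG+Text", "random", "12k", negative_rejection=True))
```

If the generated file names in your `dataset` folder differ, adjust `tag` or the pattern accordingly rather than relying on this sketch.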
### 🤝 Cite:
Please consider citing this paper if you find our work useful.
```bibtex
@article{liu2025ska,
title={SKA-Bench: A Fine-Grained Benchmark for Evaluating Structured Knowledge Understanding of LLMs},
author={Liu, Zhiqiang and Niu, Enpei and Hua, Yin and Sun, Mengshu and Liang, Lei and Chen, Huajun and Zhang, Wen},
journal={arXiv preprint arXiv:2507.17178},
year={2025}
}
```
Provided by:
zjukg



