zjukg/SKA-Bench

Name: zjukg/SKA-Bench
Creator: zjukg
Published: 2025-08-26 13:10:25
License: No description available

Source: Hugging Face. Updated 2025-08-26. Indexed 2026-01-03.

Download link:

https://hf-mirror.com/datasets/zjukg/SKA-Bench

Resource overview:

# SKA-Bench

- An implementation for [SKA-Bench: A Fine-Grained Benchmark for Evaluating Structured Knowledge Understanding of LLMs](https://arxiv.org/pdf/2507.17178)

### Environment

```bash
conda create -n skabench python=3.9.0
conda activate skabench
pip install openai
pip install asyncio
pip install uvloop
```

### Testbed Construction

For the noise robustness, order insensitivity, and information integration testbeds, you can run:

```bash
python process_dataset.py --type KG --sequence random --scale 1k
```

**NOTE:** Before running the code, set the data type in `--type`, the sequence type in `--sequence`, and the scale in `--scale`. The test set is then generated in the `dataset` folder.

For negative rejection, you can run:

```bash
python process_dataset.py --type Table --sequence original --scale 4k --negative_rejection negative_rejection
python process_dataset.py --type KG --sequence random --scale 4k --negative_rejection negative_rejection
python process_dataset.py --type Table+Text --sequence original --scale 12k --negative_rejection negative_rejection
python process_dataset.py --type KG+Text --sequence random --scale 12k --negative_rejection negative_rejection
```

### Evaluation scripts

For the noise robustness, order insensitivity, and information integration testbeds, you can run:

```bash
python evaluate.py --type <type> --api_key <api_key> --api_url <api_url> --model <model> --dataset_dir ./dataset/Table_original_42_4k.json
```

**NOTE:** Replace `<type>` with the data type, `<api_key>` with your API key, `<api_url>` with the API URL, `<model>` with the model name, and `./dataset/Table_original_42_4k.json` with the path to your dataset file.
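
The dataset paths in the example commands appear to follow the pattern `<type>_<sequence>_42_<scale>[_negative_rejection].json`. The following is a minimal Python sketch, assuming that pattern holds and that `42` is a fixed value (possibly a random seed; the README does not say). It builds the path and the matching argument list for the evaluation scripts. The helper names `dataset_path` and `evaluate_command` are hypothetical and are not part of the repository.

```python
def dataset_path(data_type, sequence, scale, seed=42,
                 negative_rejection=False, root="./dataset"):
    """Build a dataset path following the naming seen in the README examples.

    Assumption: the `42` in the example file names is constant (default `seed`).
    """
    stem = f"{data_type}_{sequence}_{seed}_{scale}"
    if negative_rejection:
        stem += "_negative_rejection"
    return f"{root}/{stem}.json"


def evaluate_command(data_type, sequence, scale, model, api_key, api_url,
                     negative_rejection=False):
    """Return the argument list for evaluate.py or evaluate_negative.py."""
    # The README uses a separate script for the negative rejection testbed.
    script = "evaluate_negative.py" if negative_rejection else "evaluate.py"
    return [
        "python", script,
        "--type", data_type,
        "--api_key", api_key,
        "--api_url", api_url,
        "--model", model,
        "--dataset_dir",
        dataset_path(data_type, sequence, scale,
                     negative_rejection=negative_rejection),
    ]


if __name__ == "__main__":
    # Placeholders are kept as-is; substitute real values before running.
    cmd = evaluate_command("Table", "original", "4k",
                           "<model>", "<api_key>", "<api_url>")
    print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run` once the placeholders are filled in. Building the path in one place keeps the file name consistent between the construction and evaluation steps.
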

For negative rejection, you can run:

```bash
python evaluate_negative.py --type KG --api_key <api_key> --api_url <api_url> --model <model> --dataset_dir ./dataset/KG_random_42_4k_negative_rejection.json
python evaluate_negative.py --type Table --api_key <api_key> --api_url <api_url> --model <model> --dataset_dir ./dataset/Table_original_42_4k_negative_rejection.json
python evaluate_negative.py --type KG+Text --api_key <api_key> --api_url <api_url> --model <model> --dataset_dir ./dataset/KG+Text_random_42_12k_negative_rejection.json
python evaluate_negative.py --type Table+Text --api_key <api_key> --api_url <api_url> --model <model> --dataset_dir ./dataset/Table+Text_original_42_12k_negative_rejection.json
```

### 🤝 Cite:

Please consider citing this paper if you find our work useful.

```bibtex
@article{liu2025ska,
  title={SKA-Bench: A Fine-Grained Benchmark for Evaluating Structured Knowledge Understanding of LLMs},
  author={Liu, Zhiqiang and Niu, Enpei and Hua, Yin and Sun, Mengshu and Liang, Lei and Chen, Huajun and Zhang, Wen},
  journal={arXiv preprint arXiv:2507.17178},
  year={2025}
}
```

Provider:

zjukg
