
zjukg/SKA-Bench

Source: Hugging Face | Updated: 2025-08-26 | Indexed: 2026-01-03
Download link:
https://hf-mirror.com/datasets/zjukg/SKA-Bench
Description:
# SKA-Bench

- An implementation for [SKA-Bench: A Fine-Grained Benchmark for Evaluating Structured Knowledge Understanding of LLMs](https://arxiv.org/pdf/2507.17178)

### Environment

```bash
conda create -n skabench python=3.9.0
conda activate skabench
pip install openai
pip install asyncio
pip install uvloop
```

### Testbed Construction

For the noise robustness, order insensitivity, and information integration testbeds, you can run:

```bash
python process_dataset.py --type KG --sequence random --scale 1k
```

**NOTE:** Set the data type with `--type`, the sequence type with `--sequence`, and the scale with `--scale` before running the code. The test set is then generated in the `dataset` folder.

For negative rejection, you can run:

```bash
python process_dataset.py --type Table --sequence original --scale 4k --negative_rejection negative_rejection
python process_dataset.py --type KG --sequence random --scale 4k --negative_rejection negative_rejection
python process_dataset.py --type Table+Text --sequence original --scale 12k --negative_rejection negative_rejection
python process_dataset.py --type KG+Text --sequence random --scale 12k --negative_rejection negative_rejection
```

### Evaluation scripts

For the noise robustness, order insensitivity, and information integration testbeds, you can run:

```bash
python evaluate.py --type <type> --api_key <api_key> --api_url <api_url> --model <model> --dataset_dir ./dataset/Table_original_42_4k.json
```

**NOTE:** Replace `<type>` with the data type, `<api_key>` with your API key, `<api_url>` with the API URL, `<model>` with the model name, and `./dataset/Table_original_42_4k.json` with the path to your generated dataset file.
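The dataset paths used in the commands in this README appear to follow one naming pattern: `<type>_<sequence>_42_<scale>.json`, with a `_negative_rejection` suffix for the negative rejection testbed. The following is a minimal sketch, not part of the repository, that builds such a path and the matching evaluation command. The helper names `dataset_path` and `evaluate_command` are hypothetical, and the pattern itself is an assumption inferred only from the example filenames; the meaning of `42` is not stated in the README (it is presumably a fixed seed).

```python
# Hypothetical helpers (not part of SKA-Bench): they only mirror the
# filename pattern visible in the README's example commands.

FIXED_TOKEN = "42"  # assumption: constant in every example path, presumably a seed


def dataset_path(data_type, sequence, scale, negative_rejection=False, root="./dataset"):
    """Build a dataset path such as ./dataset/Table_original_42_4k.json."""
    name = f"{data_type}_{sequence}_{FIXED_TOKEN}_{scale}"
    if negative_rejection:
        # Suffix taken from the negative rejection example paths.
        name += "_negative_rejection"
    return f"{root}/{name}.json"


def evaluate_command(data_type, sequence, scale, api_key, api_url, model,
                     negative_rejection=False):
    """Return the argument list for evaluate.py or evaluate_negative.py."""
    script = "evaluate_negative.py" if negative_rejection else "evaluate.py"
    return [
        "python", script,
        "--type", data_type,
        "--api_key", api_key,
        "--api_url", api_url,
        "--model", model,
        "--dataset_dir", dataset_path(data_type, sequence, scale, negative_rejection),
    ]


if __name__ == "__main__":
    # Placeholders are left as-is; substitute real values before running.
    cmd = evaluate_command("Table", "original", "4k",
                           "<api_key>", "<api_url>", "<model>")
    print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the placeholders are filled in. Verify the generated path against the files actually present in the `dataset` folder, since the naming pattern here is inferred rather than documented.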
For negative rejection, you can run:

```bash
python evaluate_negative.py --type KG --api_key <api_key> --api_url <api_url> --model <model> --dataset_dir ./dataset/KG_random_42_4k_negative_rejection.json
python evaluate_negative.py --type Table --api_key <api_key> --api_url <api_url> --model <model> --dataset_dir ./dataset/Table_original_42_4k_negative_rejection.json
python evaluate_negative.py --type KG+Text --api_key <api_key> --api_url <api_url> --model <model> --dataset_dir ./dataset/KG+Text_random_42_12k_negative_rejection.json
python evaluate_negative.py --type Table+Text --api_key <api_key> --api_url <api_url> --model <model> --dataset_dir ./dataset/Table+Text_original_42_12k_negative_rejection.json
```

### 🤝 Cite:

Please consider citing this paper if you find our work useful.

```bibtex
@article{liu2025ska,
  title={SKA-Bench: A Fine-Grained Benchmark for Evaluating Structured Knowledge Understanding of LLMs},
  author={Liu, Zhiqiang and Niu, Enpei and Hua, Yin and Sun, Mengshu and Liang, Lei and Chen, Huajun and Zhang, Wen},
  journal={arXiv preprint arXiv:2507.17178},
  year={2025}
}
```
Provider:
zjukg