IKMLab-team/cfever

Name: IKMLab-team/cfever
Creator: IKMLab-team
Published: 2026-02-26 05:00:40
License: No description provided (the dataset card below declares `apache-2.0`)

Source: Hugging Face | Updated: 2026-02-26 | Indexed: 2026-03-29

Download link:

https://hf-mirror.com/datasets/IKMLab-team/cfever


Resource description:

---
license: apache-2.0
configs:
- config_name: default
  data_files:
  - split: train
    path: train*.jsonl
  - split: dev
    path: dev*.jsonl
  - split: test
    path: test*.jsonl
- config_name: train
  data_files:
  - split: train
    path: train*.jsonl
- config_name: dev
  data_files:
  - split: dev
    path: dev*.jsonl
- config_name: test
  data_files:
  - split: test
    path: test*.jsonl
- config_name: wiki_pages
  data_files:
  - split: plain
    path: wiki*.jsonl
---

# CFEVER-data

## Introduction to CFEVER

This repository contains the dataset for our AAAI 2024 paper, "CFEVER: A Chinese Fact Extraction and VERification Dataset". [Paper link](https://doi.org/10.1609/aaai.v38i17.29825).

## Leaderboard website

Please visit https://ikmlab.github.io/CFEVER to check the CFEVER leaderboard.

## How to load CFEVER

```python
from datasets import load_dataset

# Load all splits of the default config: "train", "dev", and "test"
ds = load_dataset("IKMLab-team/cfever")

# Access each split separately
train, dev, test = ds["train"], ds["dev"], ds["test"]

# Load only one split, either through its named config...
train = load_dataset("IKMLab-team/cfever", name="train")["train"]
# ...or through the `split` argument of the default config
train = load_dataset("IKMLab-team/cfever", split="train")

# Load the wiki pages (the "wiki_pages" config has a single "plain" split)
wiki_pages = load_dataset("IKMLab-team/cfever", name="wiki_pages")["plain"]
```

## Repository structure

```
CFEVER-data
├── dev.jsonl                # CFEVER development set
├── test.jsonl               # CFEVER test set without labels and evidence
├── train.jsonl              # CFEVER training set
├── wiki*.jsonl              # CFEVER wiki pages
├── LICENSE
├── README.md
└── sample_submission.jsonl  # sample submission file for the test set
```

## Evaluation

- Please refer to our codebase: https://github.com/IKMLab/CFEVER-baselines/?tab=readme-ov-file#evaluations

## Submission

- For each claim in the test set, the prediction file must include the following three fields:
  - `id`
  - `predicted_label`
  - `predicted_evidence`
- The `id` field is already included in [the test set](data/test.jsonl). Please do not change the order of the claims.
- The `predicted_label` should be one of `SUPPORTS`, `REFUTES`, or `NOT ENOUGH INFO`.
- The `predicted_evidence` should be a list of evidence sentences, where each evidence sentence is represented as a `[page_id, line_number]` pair. The examples below are pretty-printed for readability; in the `.jsonl` file, each prediction occupies a single line. For example:

```
# One evidence sentence for the claim
{
    "id": 1,
    "predicted_label": "REFUTES",
    "predicted_evidence": [
        ["page_id_2", 2]
    ]
}
```

```
# Two evidence sentences for the claim
{
    "id": 1,
    "predicted_label": "SUPPORTS",
    "predicted_evidence": [
        ["page_id_1", 1],
        ["page_id_2", 2]
    ]
}
```

```
# The claim cannot be verified
{
    "id": 1,
    "predicted_label": "NOT ENOUGH INFO",
    "predicted_evidence": null
}
```

- After creating the prediction file, please email it to yingjia.lin.public@gmail.com with a brief description of your method. We will evaluate your submission and update the leaderboard.
- A randomly generated submission file can be found [here](sample_submission.jsonl).
- The `claim` field does not need to be included in the submission file.
- You can also check [the prediction example for the development set](https://github.com/IKMLab/CFEVER-baselines/blob/main/simple_baseline/data/dumb_dev_pred.jsonl) and follow [the evaluation steps](https://github.com/IKMLab/CFEVER-baselines/tree/main?tab=readme-ov-file#sentence-retrieval-and-claim-verification) from [our CFEVER-baselines repo](https://github.com/IKMLab/CFEVER-baselines).

## Licensing Information

CFEVER's data annotations incorporate content from Wikipedia, which is licensed under the Wikipedia Copyright Policy. Users of this dataset are responsible for ensuring that their use, redistribution, and downstream applications comply with all applicable licenses and attribution requirements of the Wikipedia license terms.

## Reference

If you find our work useful, please cite our paper.
```
@article{Lin_Lin_Yeh_Li_Hu_Hsu_Lee_Kao_2024,
  title = {CFEVER: A Chinese Fact Extraction and VERification Dataset},
  author = {Lin, Ying-Jia and Lin, Chun-Yi and Yeh, Chia-Jen and Li, Yi-Ting and Hu, Yun-Yu and Hsu, Chih-Hao and Lee, Mei-Feng and Kao, Hung-Yu},
  doi = {10.1609/aaai.v38i17.29825},
  journal = {Proceedings of the AAAI Conference on Artificial Intelligence},
  month = {Mar.},
  number = {17},
  pages = {18626-18634},
  url = {https://ojs.aaai.org/index.php/AAAI/article/view/29825},
  volume = {38},
  year = {2024},
  bdsk-url-1 = {https://ojs.aaai.org/index.php/AAAI/article/view/29825},
  bdsk-url-2 = {https://doi.org/10.1609/aaai.v38i17.29825}
}
```
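The Submission section above lists the required fields but leaves the file mechanics implicit. The following Python sketch shows one way to check predictions against that description and serialize them as JSON Lines before emailing them. It is not part of CFEVER or CFEVER-baselines: `validate_prediction`, `to_jsonl`, and `write_predictions` are hypothetical helper names, the uppercase label strings are assumed from the README's submission examples, and the sketch does not check that the `id` order matches the test set.

```python
from __future__ import annotations

import json
from typing import Any

# Label strings as written in the README's submission examples (assumption:
# the scorer expects exactly these uppercase forms).
VALID_LABELS = {"SUPPORTS", "REFUTES", "NOT ENOUGH INFO"}
REQUIRED_FIELDS = ("id", "predicted_label", "predicted_evidence")


def validate_prediction(pred: dict[str, Any]) -> None:
    """Hypothetical helper: raise ValueError if `pred` breaks the described format."""
    for field in REQUIRED_FIELDS:
        if field not in pred:
            raise ValueError(f"missing required field: {field}")
    label = pred["predicted_label"]
    if label not in VALID_LABELS:
        raise ValueError(f"unknown label: {label!r}")
    evidence = pred["predicted_evidence"]
    if evidence is None:
        # The README's example uses null evidence only for NOT ENOUGH INFO.
        if label != "NOT ENOUGH INFO":
            raise ValueError(f"{label} prediction needs a list of evidence")
        return
    if not isinstance(evidence, list):
        raise ValueError("predicted_evidence must be a list or null")
    for item in evidence:
        # Each evidence sentence is a [page_id, line_number] pair
        # (tuples are accepted too, since json.dumps writes them as arrays).
        if not (
            isinstance(item, (list, tuple))
            and len(item) == 2
            and isinstance(item[0], str)
            and isinstance(item[1], int)
        ):
            raise ValueError(f"malformed evidence entry: {item!r}")


def to_jsonl(preds: list[dict[str, Any]]) -> str:
    """Validate predictions and serialize them as JSON Lines, keeping their order."""
    lines = []
    for pred in preds:
        validate_prediction(pred)
        # ensure_ascii=False keeps Chinese page IDs readable in the output file.
        lines.append(json.dumps(pred, ensure_ascii=False) + "\n")
    return "".join(lines)


def write_predictions(preds: list[dict[str, Any]], path: str) -> None:
    """Write validated predictions to `path` (a hypothetical output file name)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_jsonl(preds))
```

For example, `write_predictions(preds, "predictions.jsonl")` (a hypothetical output path) writes one line per claim, with Python's `None` serialized as JSON `null`. Official scoring still follows the evaluation steps in the CFEVER-baselines repository.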

Provided by: IKMLab-team
