Misinformation Span Detection - EI22 & BOL4Y

Name: Misinformation Span Detection - EI22 & BOL4Y
Creator: Zenodo
Published: 2026-05-06 13:34:00
License: No description available

DataCite Commons: updated 2026-05-06, indexed 2026-05-07

Download link:

https://zenodo.org/doi/10.5281/zenodo.19097541

Description:

BOL4Y & EI22

Description of the data provided in the paper.

Contents

The repository includes the following data files:

- dump.csv: Parsed and normalized fact-check metadata extracted from AosFatos HTML pages
- aos_fatos_pages.zip: Raw HTML pages crawled from AosFatos (BOL4Y)
- escriba-csv.zip: Transcripts from escriba
- bol4y.csv.gz: Main BOL4Y dataset (compressed CSV)
- ei22.csv: Main EI22 dataset (CSV)
- bol4y-transcripts.zip: Transcripts associated with BOL4Y
- ei22-transcripts.csv: Transcripts associated with EI22

Data

dump.csv

This file contains parsed fact-checking metadata from AosFatos, used to build BOL4Y. The original data is in Brazilian Portuguese; column names were translated to English for usability.

Columns:

- title: Title of the fact check
- date: Publication date of the fact check
- aos_fatos_link: URL of the original AosFatos page
- fact_check: Fact-checking paragraph
- topic_pt: Topic(s) of the false claim (Portuguese)
- source: Origin of the claim (e.g., livestream, speech)
- source_urls: URLs to the original content being fact-checked
- repetition_count: Number of times the same claim was repeated
- year_days_pair: Year, month, and day of each claim occurrence, including repeated claims
- page: Crawl page index where the fact check was found
- fact_check_id: Unique identifier for the fact check (shared with BOL4Y)

BOL4Y (bol4y.csv.gz)

The CSV is compressed for easier storage. It contains the following fields:

- fact_check_id: Identifier matching dump.csv
- file: Source file for the transcript
- transcription_source: Transcription system (escriba or whisper)
- transcription_index: Segment ID(s) associated with each segment. Some fact-checked segments list more than one ID in this field, following the concatenation approach discussed in the paper
- transcription_text: Claim or transcript segment text
- label: 1 = misinformation, 0 = non-misinformation
- fixed_date: Date of the fact check
- unique-check: Unique ID combining file and segment information

EI22

Each line of this CSV is a segment.

- file_id: Identifier of the source file
- transcript_timestamp_ids: Segment ID(s) associated with each segment. Some fact-checked segments list more than one ID in this field, following the concatenation approach discussed in the paper
- transcript: Full transcript of the file (EI22 only)
- minute_timestamp: Minute-level timestamp provided by AosFatos
- description: Metadata description from AosFatos
- label: 1 = misinformation, 0 = non-misinformation
- unique-check: Unique ID combining file and segment information
- segment_text: Claim or segment text

Note: For YouTube videos, the file_id field corresponds to the ID of the video on YouTube.

Additional Resources

- HuggingFace repository
- GitHub repository
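
Because fact_check_id is described as shared between dump.csv and BOL4Y, the two files can be joined on it. The following is a minimal sketch in Python with pandas, not part of the dataset's own tooling. The function names load_bol4y_with_metadata and label_counts are hypothetical. It also assumes that fact_check_id is unique within dump.csv and that the files parse with default CSV settings, neither of which the description states explicitly.

```python
import pandas as pd


def load_bol4y_with_metadata(bol4y_path, dump_path):
    """Attach fact-check title and date from dump.csv to each BOL4Y segment.

    Hypothetical helper: column names come from the dataset description,
    everything else is an assumption made for illustration.
    """
    # For a path ending in ".gz", read_csv infers gzip compression by default.
    segments = pd.read_csv(bol4y_path)
    metadata = pd.read_csv(dump_path)
    # Left join keeps every segment, even one without matching metadata.
    # Assumes one row per fact_check_id in dump.csv; duplicates there would
    # multiply the matching segment rows.
    return segments.merge(
        metadata[["fact_check_id", "title", "date"]],
        on="fact_check_id",
        how="left",
    )


def label_counts(frame):
    """Count segments per label (1 = misinformation, 0 = non-misinformation)."""
    return frame["label"].value_counts().to_dict()
```

With the archive contents in the working directory, this could be called as load_bol4y_with_metadata("bol4y.csv.gz", "dump.csv"), and label_counts applied to the result gives a quick view of class balance.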

Provider:

Zenodo

Created:

2026-03-19
