Misinformation Span Detection - EI22 & BOL4Y
Source: DataCite Commons. Updated 2026-05-06; indexed 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.19097541
Resource description:
BOL4Y & EI22
Description of each data file provided with the paper.
Contents
The repository includes the following data files:
- dump.csv — Parsed and normalized fact-check metadata extracted from AosFatos HTML pages
- aos_fatos_pages.zip — Raw HTML pages crawled from AosFatos (BOL4Y)
- escriba-csv.zip — Transcripts from escriba
- bol4y.csv.gz — Main BOL4Y dataset (compressed CSV)
- ei22.csv — Main EI22 dataset (CSV)
- bol4y-transcripts.zip — Transcripts associated with BOL4Y
- ei22-transcripts.csv — Transcripts associated with EI22
Data
dump.csv
This file contains parsed fact-checking metadata from AosFatos, used to build BOL4Y. The original data is in Brazilian Portuguese; column names were translated to English for usability.
| Column name | Description |
| --- | --- |
| title | Title of the fact check |
| date | Publication date of the fact check |
| aos_fatos_link | URL of the original AosFatos page |
| fact_check | Fact-checking paragraph |
| topic_pt | Topic(s) of the false claim (Portuguese) |
| source | Origin of the claim (e.g., livestream, speech) |
| source_urls | URLs to the original content being fact-checked |
| repetition_count | Number of times the same claim was repeated |
| year_days_pair | Year, month, and day of each claim occurrence. Includes repeated claims |
| page | Crawl page index where the fact check was found |
| fact_check_id | Unique identifier for the fact check (shared with BOL4Y) |
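As an illustration of how these columns can be used, the sketch below ranks fact checks by repetition_count. It assumes the column holds numeric values (non-numeric entries are coerced to missing); the function name most_repeated_claims is hypothetical and not part of the dataset.

```python
import pandas as pd

def most_repeated_claims(dump: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Return the n fact checks with the highest repetition_count."""
    # Assumption: repetition_count is numeric; anything else becomes NaN.
    counts = pd.to_numeric(dump["repetition_count"], errors="coerce")
    top = dump.assign(repetition_count=counts).nlargest(n, "repetition_count")
    return top[["fact_check_id", "title", "repetition_count"]]
```

A typical call would be most_repeated_claims(pd.read_csv("dump.csv"), n=5), with dump.csv downloaded from the repository.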
BOL4Y (bol4y.csv.gz)
The CSV is gzip-compressed for easier storage. It contains the following fields:
| Column name | Description |
| --- | --- |
| fact_check_id | Identifier matching dump.csv |
| file | Source file for the transcript |
| transcription_source | Transcription system (escriba or whisper) |
| transcription_index | Segment ID(s) associated with each segment. For some fact-checked segments, more than one ID is listed in this field, following the concatenation approach discussed in the paper |
| transcription_text | Claim or transcript segment text |
| label | 1 = misinformation, 0 = non-misinformation |
| fixed_date | Date of the fact check |
| unique-check | Unique ID combining file and segment information |
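Because fact_check_id is shared between bol4y.csv.gz and dump.csv, segments can be linked back to their fact-check metadata. The following is a minimal sketch with pandas, assuming both files sit in the working directory and that fact_check_id has the same type in both; the function name load_bol4y_with_metadata is hypothetical.

```python
import pandas as pd

def load_bol4y_with_metadata(bol4y_path="bol4y.csv.gz", dump_path="dump.csv"):
    """Load BOL4Y segments and attach fact-check metadata from dump.csv."""
    # pandas infers gzip compression from the .gz suffix.
    segments = pd.read_csv(bol4y_path)
    metadata = pd.read_csv(dump_path)
    # Left join keeps every segment, matched or not.
    # validate raises if fact_check_id is not unique in dump.csv (an assumption).
    return segments.merge(
        metadata[["fact_check_id", "title", "date", "aos_fatos_link"]],
        on="fact_check_id",
        how="left",
        validate="many_to_one",
    )
```

The left join is a design choice: segments without a matching fact check stay in the result with empty metadata columns rather than being dropped.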
EI22
Each row of this CSV is a segment.
| Column name | Description |
| --- | --- |
| file_id | Identifier of the source file |
| transcript_timestamp_ids | Segment ID(s) associated with each segment. For some fact-checked segments, more than one ID is listed in this field, following the concatenation approach discussed in the paper |
| transcript | Full transcript of the file (EI22 only) |
| minute_timestamp | Minute-level timestamp provided by AosFatos |
| description | Metadata description from AosFatos |
| label | 1 = misinformation, 0 = non-misinformation |
| unique-check | Unique ID combining file and segment information |
| segment_text | Claim or segment text |
Note: For YouTube videos, the file_id field corresponds to the video's ID on YouTube.
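A short sketch of how the file_id and label columns might be used together. Both helper names are hypothetical, and youtube_watch_url is only meaningful for rows whose source is a YouTube video; the dataset description does not say how to tell those rows apart, so that check is left to the caller.

```python
import pandas as pd

def youtube_watch_url(file_id: str) -> str:
    """Build a watch URL, assuming file_id is a YouTube video ID."""
    return f"https://www.youtube.com/watch?v={file_id}"

def misinformation_share_per_file(ei22: pd.DataFrame) -> pd.Series:
    """Fraction of segments labelled 1 (misinformation) in each source file."""
    # label is 0 or 1, so the per-file mean is the share of label 1.
    return ei22.groupby("file_id")["label"].mean()
```

For example, misinformation_share_per_file(pd.read_csv("ei22.csv")) returns one value per file_id.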
Additional Resources
Hugging Face repository
GitHub repository
Provider: Zenodo
Created: 2026-03-19



