
Misinformation Span Detection - EI22 & BOL4Y

Source: DataCite Commons. Updated 2026-05-06; indexed 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.19097541
Description:
BOL4Y & EI22

Description of each data file provided in the paper.

Contents

The repository includes the following data files:

- dump.csv: Parsed and normalized fact-check metadata extracted from AosFatos HTML pages
- aos_fatos_pages.zip: Raw HTML pages crawled from AosFatos (BOL4Y)
- escriba-csv.zip: Transcripts from escriba
- bol4y.csv.gz: Main BOL4Y dataset (compressed CSV)
- ei22.csv: Main EI22 dataset (CSV)
- bol4y-transcripts.zip: Transcripts associated with BOL4Y
- ei22-transcripts.csv: Transcripts associated with EI22

Data

dump.csv

This file contains parsed fact-checking metadata from AosFatos, used to build BOL4Y. The original data is in Brazilian Portuguese; column names were translated to English for usability.

| Column name | Description |
| --- | --- |
| title | Title of the fact check |
| date | Publication date of the fact check |
| aos_fatos_link | URL of the original AosFatos page |
| fact_check | Fact-checking paragraph |
| topic_pt | Topic(s) of the false claim (Portuguese) |
| source | Origin of the claim (e.g., livestream, speech) |
| source_urls | URLs to the original content being fact-checked |
| repetition_count | Number of times the same claim was repeated |
| year_days_pair | Year, month, and day of each claim occurrence. Includes repeated claims |
| page | Crawl page index where the fact check was found |
| fact_check_id | Unique identifier for the fact check (shared with BOL4Y) |

BOL4Y (bol4y.csv.gz)

The CSV was compressed for easier storage. It contains the following fields:

| Column name | Description |
| --- | --- |
| fact_check_id | Identifier matching dump.csv |
| file | Source file for the transcript |
| transcription_source | Transcription system (escriba or whisper) |
| transcription_index | Segment ID(s) associated with each segment. For some fact-checked segments, more than one ID is listed in this field, following the concatenation approach discussed in the paper |
| transcription_text | Claim or transcript segment text |
| label | 1 = misinformation, 0 = non-misinformation |
| fixed_date | Date of the fact check |
| unique-check | Unique ID combining file and segment information |

EI22

Each row of this CSV is a segment.

| Column name | Description |
| --- | --- |
| file_id | Identifier of the source file |
| transcript_timestamp_ids | Segment ID(s) associated with each segment. For some fact-checked segments, more than one ID is listed in this field, following the concatenation approach discussed in the paper |
| transcript | Full transcript of the file (EI22 only) |
| minute_timestamp | Minute-level timestamp provided by AosFatos |
| description | Metadata description from AosFatos |
| label | 1 = misinformation, 0 = non-misinformation |
| unique-check | Unique ID combining file and segment information |
| segment_text | Claim or segment text |

Note: For YouTube videos, the file_id field corresponds to the ID of the video on YouTube.

Additional Resources

- HuggingFace repository
- GitHub repository
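The column descriptions above suggest a simple way to work with the files: read the segment-level CSV, look up each segment's fact-check metadata through the shared fact_check_id, and count labels. The following is a minimal sketch using only the Python standard library. The helper names (read_rows, attach_fact_checks, label_counts, youtube_url) are hypothetical and not part of the dataset or its repositories, and the sketch assumes the files are UTF-8 CSVs with a header row matching the column names listed above, which should be checked against the actual downloads.

```python
import csv
import gzip
from collections import Counter


def read_rows(path):
    """Read a CSV file (gzip-compressed if the name ends in .gz) into a list of dicts."""
    # Transcript fields can be long, so raise the csv module's default field size limit.
    csv.field_size_limit(10**9)
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))


def attach_fact_checks(segments, fact_checks):
    """Left-join segment rows to fact-check metadata on fact_check_id.

    Assumption: segments without a matching fact check (for example,
    non-misinformation rows) simply get None for the added fields.
    """
    by_id = {row["fact_check_id"]: row for row in fact_checks}
    joined = []
    for segment in segments:
        meta = by_id.get(segment.get("fact_check_id", ""), {})
        joined.append({
            **segment,
            "title": meta.get("title"),
            "aos_fatos_link": meta.get("aos_fatos_link"),
        })
    return joined


def label_counts(rows):
    """Count rows per label; csv yields strings, so keys are '1' and '0'."""
    return Counter(row["label"] for row in rows)


def youtube_url(file_id):
    """Build a watch URL from a file_id that is a YouTube video ID."""
    return f"https://www.youtube.com/watch?v={file_id}"
```

With the files downloaded locally, a call such as `attach_fact_checks(read_rows("bol4y.csv.gz"), read_rows("dump.csv"))` would pair each BOL4Y segment with its fact-check title and AosFatos link, and `label_counts(read_rows("ei22.csv"))` would give the class balance for EI22. The multi-ID fields (transcription_index, transcript_timestamp_ids) are left as raw strings here because their delimiter is not stated in the description.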
Provider: Zenodo
Created: 2026-03-19