Replication data for: Evaluating named entity recognition - a comparative analysis of mono- and multilingual transformer models on a novel Brazilian corporate earnings call transcripts dataset
Source: DataCite Commons. Updated: 2026-02-19. Indexed: 2026-05-07.
Download link:
https://redu.unicamp.br/citation?persistentId=doi:10.25824/redu/YI280E
Resource description:
This package contains a dataset comprising 384 earnings call transcripts from Brazilian banks, along with the accompanying Jupyter notebooks used for preprocessing, annotating, and fine-tuning. The notebooks are specifically designed for fine-tuning BERT- and T5-based transformer models for the task of financial Named Entity Recognition (NER). The submission is organized into two main files:

File: SourceCode.zip – This file includes the original PDF files of the transcripts and a series of Jupyter notebooks (Python) that document the step-by-step methodology of the study: 1) text extraction and sentence pre-processing; 2) weak supervision for annotation; 3) generation of train, validation, and test splits; and 4) fine-tuning of the Transformer models.

File: Datasets.zip – This file contains a single CSV file with all raw sentences extracted from the PDFs, as well as a subfolder with the annotated sentences, already divided into standard training, validation, and testing sets to facilitate reproducible research.
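
As a rough illustration of step 3 (generation of train, validation, and test splits), the following is a minimal sketch and not the code from the package's notebooks. The function name `split_sentences`, the 80/10/10 proportions, and the seed value are all assumptions made for this example; the record does not state the ratios or method actually used.

```python
import random


def split_sentences(sentences, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle sentences reproducibly and cut them into train/val/test lists.

    The fractions and seed are illustrative assumptions, not values
    taken from the dataset's notebooks.
    """
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("val_frac and test_frac must be >= 0 and sum to < 1")

    # A dedicated Random instance keeps the shuffle independent of global state.
    rng = random.Random(seed)
    shuffled = list(sentences)
    rng.shuffle(shuffled)

    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)

    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Hypothetical input: placeholder strings standing in for extracted sentences.
example = [f"sentence {i}" for i in range(100)]
train, val, test = split_sentences(example)
```

Fixing the seed is what makes the split reproducible: calling the function again with the same input and seed returns the same three lists, which is the property a published split is meant to guarantee.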
Provider:
Repositório de Dados de Pesquisa da Unicamp
Created:
2026-02-17



