Replication data for: Evaluating named entity recognition - a comparative analysis of mono- and multilingual transformer models on a novel Brazilian corporate earnings call transcripts dataset

Name: Replication data for: Evaluating named entity recognition - a comparative analysis of mono- and multilingual transformer models on a novel Brazilian corporate earnings call transcripts dataset
Creator: Repositório de Dados de Pesquisa da Unicamp
Published: 2026-02-19 13:44:38
License: No description available

Source: DataCite Commons (updated 2026-02-19, indexed 2026-05-07)

Download link:

https://redu.unicamp.br/citation?persistentId=doi:10.25824/redu/YI280E

Description:

This package contains a dataset of 384 earnings call transcripts from Brazilian banks, together with the Jupyter notebooks used for preprocessing, annotation, and fine-tuning. The notebooks are designed for fine-tuning BERT- and T5-based transformer models on financial Named Entity Recognition (NER).

The submission is organized into two main files:

File: SourceCode.zip – the original PDF files of the transcripts and a series of Jupyter notebooks (Python) that document the study's step-by-step methodology:
1) text extraction and sentence preprocessing;
2) weak supervision for annotation;
3) generation of the train, validation, and test splits;
4) fine-tuning of the transformer models.

File: Datasets.zip – a single CSV file with all raw sentences extracted from the PDFs, plus a subfolder of annotated sentences already divided into standard training, validation, and test sets to support reproducible research.
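Steps 2 and 3 of the pipeline can be illustrated with a minimal sketch. It assumes one simple kind of weak supervision, a gazetteer (dictionary) labeling function that assigns BIO tags by greedy longest match. It also assumes a seeded 80/10/10 shuffle split. The gazetteer entries, the ORG label, the split ratios, and the helper names `weak_label` and `split_dataset` are all hypothetical. They are not taken from the published notebooks, whose actual entity schema and labeling rules may differ.

```python
import random

# Hypothetical gazetteer: tuples of tokens mapped to an entity label.
# The dataset's real entity types and labeling rules are not described here.
GAZETTEER = {
    ("Banco", "do", "Brasil"): "ORG",
    ("Bradesco",): "ORG",
    ("Itaú",): "ORG",
}


def weak_label(tokens):
    """Assign BIO tags by greedy longest-match lookup against GAZETTEER."""
    tags = ["O"] * len(tokens)
    max_len = max(len(key) for key in GAZETTEER)
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first so multi-token names win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = GAZETTEER.get(tuple(tokens[i:i + n]))
            if label is not None:
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"
                i += n
                break
        else:
            # No gazetteer entry starts at this token; leave it as "O".
            i += 1
    return tags


def split_dataset(items, train=0.8, val=0.1, seed=42):
    """Shuffle with a fixed seed and cut into train/validation/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


if __name__ == "__main__":
    sentence = "O Banco do Brasil e o Bradesco reportaram lucro".split()
    print(list(zip(sentence, weak_label(sentence))))
    train_set, val_set, test_set = split_dataset(range(100))
    print(len(train_set), len(val_set), len(test_set))  # 80 10 10
```

Fixing the seed makes the split reproducible, which is the goal the pre-divided sets in Datasets.zip serve. Splitting at the transcript level rather than the sentence level is one way to reduce leakage between sets. This sketch does not decide which approach the original study used.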

Provided by:

Repositório de Dados de Pesquisa da Unicamp

Created:

2026-02-17