Brazilian Politician dataset for Record Linkage
Source: Mendeley Data. Updated: 2024-05-10. Indexed: 2024-06-30.
Download link:
https://zenodo.org/records/7957492
Description:
The Brazilian political dataset published by the Tribunal Superior Eleitoral (TSE) contains information about Brazilian politicians, including their names, political parties, and electoral districts, as well as personal data such as race, gender, address, and place and date of birth. The TSE has published a new version of this dataset every two years since 1992, updating the politicians' information each time.

This structure makes the TSE data well suited to record linkage and population studies. Linking a politician's records across the versions published over the years gives a more complete view of how the data evolve, reveals patterns and connections that are not apparent within a single version, and can help identify irregularities or illegal activities related to political campaigns.

We use the TSE politician data to build a dataset for record linkage and privacy-preserving record linkage. To do so, we leverage the changes in politicians' personal information that appear in the data over time. For example, a politician may marry and change their name, or a record may be entered with a typo in the political affiliation. Because these variations come from the original TSE data, the resulting dataset contains real-world linkage errors. It can be used to train classifiers and to measure the accuracy of both record linkage and privacy-preserving record linkage methods.
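To illustrate the kind of comparison the description refers to, here is a minimal sketch of matching one politician's records across two election years. It is not the authors' method: the field names (`name`, `birth_date`, `party`), the record values, and the 0.75 threshold are all hypothetical, and the similarity measure is simply `difflib.SequenceMatcher` from the Python standard library.

```python
from difflib import SequenceMatcher

# Hypothetical records: field names and values are made up for illustration
# and are not taken from the TSE files.
record_2016 = {"name": "MARIA DA SILVA", "birth_date": "1970-03-15", "party": "ABC"}
record_2018 = {"name": "MARIA DA SILVA SOUZA", "birth_date": "1970-03-15", "party": "ABD"}
record_other = {"name": "JOAO PEREIRA", "birth_date": "1982-11-02", "party": "XYZ"}

FIELDS = ["name", "birth_date", "party"]


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def compare(rec_a: dict, rec_b: dict, fields: list) -> dict:
    """Return one similarity score per compared field."""
    return {f: similarity(rec_a[f], rec_b[f]) for f in fields}


def is_match(scores: dict, threshold: float = 0.75) -> bool:
    """Declare a match when the mean field similarity reaches the threshold.

    The threshold is an assumed value; in practice it would be tuned,
    or replaced by a classifier trained on labelled pairs.
    """
    return sum(scores.values()) / len(scores) >= threshold


same_person = compare(record_2016, record_2018, FIELDS)
different_person = compare(record_2016, record_other, FIELDS)

print(same_person, is_match(same_person))
print(different_person, is_match(different_person))
```

The first pair mimics the two variations mentioned in the description (a surname added after marriage and a typo in the party field) and still scores as a match, while the unrelated record does not. A dataset with known true matches lets such per-field scores be used as classifier features, or lets a chosen threshold be evaluated for accuracy.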
Created:
2023-06-28



