Database of Occupational Benefits Costs in Brazil (Period from March 2022 to March 2024)
Source: Mendeley Data. Indexed: 2026-04-18
Download link:
https://data.mendeley.com/datasets/88r8dcffzg
Description:
We used the Knowledge Discovery in Databases (KDD) method to create the database.
In the first KDD stage, we selected 25 files containing benefit data issued by INSS (National Social Security Institute). These files cover March 2022 to March 2024, the only period available for download on the Brazilian Open Data portal at the time of our research. We imported all of these files into a SQL Server 2020 database using the Import and Export Wizard in Microsoft SQL Server Management Studio. Each file produced a separate table.
Querying the tables showed that they contained every social security benefit issued by INSS. This research focuses only on occupational benefits (B91, B92, B93, and B94). Thus, in the first step of the second KDD stage, we excluded benefits not related to work-related diseases and accidents. Also during data preprocessing, we harmonized the columns across tables by removing columns that were not present in every table or that were irrelevant to the analysis, such as "Meio pagamento" and "Banco."
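The preprocessing step above can be sketched in Python as follows. This is only an illustration, not the procedure actually run in SQL Server: the row representation (a list of dictionaries per table), the `harmonize` function, and the `is_occupational` predicate are assumptions introduced here. Only the dropped column names "Meio pagamento" and "Banco" come from the description.

```python
# Columns the description names as irrelevant for analysis.
DROPPED_COLUMNS = {"Meio pagamento", "Banco"}


def harmonize(tables, is_occupational):
    """Filter rows and keep only the columns shared by every table.

    tables: dict mapping a table name to a list of row dicts (assumed layout).
    is_occupational: hypothetical predicate that returns True for rows
    describing a work-related benefit (B91 to B94).
    """
    non_empty = [rows for rows in tables.values() if rows]
    if not non_empty:
        return {name: [] for name in tables}
    # Columns present in all tables, minus the ones dropped as irrelevant.
    shared = set.intersection(*(set(rows[0].keys()) for rows in non_empty))
    keep = sorted(shared - DROPPED_COLUMNS)
    return {
        name: [{col: row[col] for col in keep} for row in rows if is_occupational(row)]
        for name, rows in tables.items()
    }
```

After this step every table has the same column set, which is what makes a later row-wise merge into one table possible.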
In the data transformation stage, we created columns for the country's region and the benefit code. We populated the region column based on the benefit's state of issuance. Subsequently, we filled the benefit code column according to the benefit's name.
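A minimal sketch of this transformation is shown below. The state-to-region grouping follows Brazil's five official regions. Everything else is an assumption made for illustration: the column names `UF`, `Especie`, `Regiao`, and `CodigoBeneficio`, the English region labels, and the benefit-name strings, since the exact names used in the INSS files are not given here.

```python
# Brazil's 26 states plus the Federal District, grouped by official region.
STATES_BY_REGION = {
    "North": ["AC", "AP", "AM", "PA", "RO", "RR", "TO"],
    "Northeast": ["AL", "BA", "CE", "MA", "PB", "PE", "PI", "RN", "SE"],
    "Central-West": ["DF", "GO", "MT", "MS"],
    "Southeast": ["ES", "MG", "RJ", "SP"],
    "South": ["PR", "RS", "SC"],
}
REGION_BY_STATE = {
    uf: region for region, ufs in STATES_BY_REGION.items() for uf in ufs
}

# Placeholder benefit names (assumed); the real files may spell them differently.
CODE_BY_NAME = {
    "auxílio-doença por acidente do trabalho": "B91",
    "aposentadoria por invalidez por acidente do trabalho": "B92",
    "pensão por morte por acidente do trabalho": "B93",
    "auxílio-acidente": "B94",
}


def add_derived_columns(row, state_col="UF", name_col="Especie"):
    """Return a copy of the row with region and benefit-code columns added."""
    out = dict(row)
    # Unknown states or names map to None rather than raising an error.
    out["Regiao"] = REGION_BY_STATE.get(row[state_col].strip().upper())
    out["CodigoBeneficio"] = CODE_BY_NAME.get(row[name_col].strip().lower())
    return out
```

Using lookup tables keeps the mapping auditable: any row whose derived value is `None` points to a state or benefit name that the tables do not yet cover.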
Finally, we consolidated all tables into a single table, "tbbeneficios," within the DBCustos database. This resulted in a table with over 18 million rows and a backup file of approximately 5 GB. Table 1 presents the structure of the "tbbeneficios" table.
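The consolidation can be illustrated with a `UNION ALL` over the per-file tables. The sketch below uses Python's built-in `sqlite3` module as a stand-in for SQL Server, so it shows the idea rather than the statements actually executed. The `consolidate` function and the source table names are hypothetical; only the target name "tbbeneficios" comes from the description.

```python
import sqlite3


def consolidate(conn, source_tables, target="tbbeneficios"):
    """Merge tables with identical columns into one table; return its row count.

    Table names cannot be bound as SQL parameters, so they are interpolated
    as quoted identifiers. Pass trusted names only.
    """
    selects = [f'SELECT * FROM "{name}"' for name in source_tables]
    # UNION ALL keeps every row, including rows repeated across files.
    union = " UNION ALL ".join(selects)
    conn.execute(f'CREATE TABLE "{target}" AS {union}')
    return conn.execute(f'SELECT COUNT(*) FROM "{target}"').fetchone()[0]
```

`UNION ALL` is used instead of `UNION` because deduplicating across monthly files could silently drop legitimate records that happen to share all column values.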
Created:
2025-12-15



