Brazilian University Paper Abstracts Dataset (2013-2023), Classified by Sustainable Development Goals (SDGs)

Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/hzgs5kz2bc
Description:
The data was collected from SciVal, a platform that hosts Scopus statistics. All metadata, covering 2013 to 2023, were obtained for the top 25 Brazilian universities as ranked by the Center for World University Rankings (CWUR) in 2023. The dataset contains abstracts of published scientific papers that the Scopus team classified according to the Sustainable Development Goals (SDGs). The original dataset consists of 15,488 records and 20 columns.

We preprocessed the data to train a language model capable of classifying Brazilian research projects according to the SDGs. During preprocessing, we removed duplicate records, multi-label entries, samples missing abstracts, and unnecessary columns. The preprocessed dataset contains 13,789 records and two columns, with the SDG classification stored in the "label" column. Labels range from 1 to 17 and represent the 17 SDGs in order.

After preprocessing, we balanced the dataset by bringing every class to 300 records. For majority classes with more than 300 records, we reduced the count to 300. For minority classes with fewer than 300 records, we generated the remaining records with the generative model Mixtral-8x7B-Instruct-v0.1, using real abstracts as examples. The dataset can be used to train language models that classify Brazilian scientific texts by SDG.

The 17 SDGs are:
1. No Poverty
2. Zero Hunger
3. Good Health and Well-being
4. Quality Education
5. Gender Equality
6. Clean Water and Sanitation
7. Affordable and Clean Energy
8. Decent Work and Economic Growth
9. Industry, Innovation and Infrastructure
10. Reduced Inequalities
11. Sustainable Cities and Communities
12. Responsible Consumption and Production
13. Climate Action
14. Life Below Water
15. Life on Land
16. Peace, Justice and Strong Institutions
17. Partnerships for the Goals
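The preprocessing and balancing steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the raw-record schema (an `abstract` string and an `sdgs` list per record), the rule that duplicates are records with identical abstract text, and random downsampling with a fixed seed are all hypothetical, since the description does not specify them. The sketch stops at reporting how many synthetic records each class still needs; the generation step with Mixtral-8x7B-Instruct-v0.1 is not shown because its prompt is not described.

```python
import random
from collections import defaultdict

TARGET_PER_CLASS = 300  # per-class target given in the dataset description
NUM_SDGS = 17


def preprocess(raw_records):
    """Drop records without an abstract, multi-label records, and duplicates,
    keeping only two fields: the abstract text and its single SDG label.

    Hypothetical schema: each raw record is a dict with an 'abstract' string
    and an 'sdgs' list of SDG numbers (1-17).
    """
    seen = set()
    cleaned = []
    for rec in raw_records:
        abstract = (rec.get("abstract") or "").strip()
        sdgs = rec.get("sdgs") or []
        if not abstract:        # sample missing its abstract
            continue
        if len(sdgs) != 1:      # multi-label (or unlabeled) entry
            continue
        if abstract in seen:    # duplicate, judged here by identical abstract text
            continue
        seen.add(abstract)
        cleaned.append({"abstract": abstract, "label": int(sdgs[0])})
    return cleaned


def balance_plan(records, target=TARGET_PER_CLASS, seed=0):
    """Cap every class at `target` records and report each class's shortfall.

    Random sampling with a fixed seed is an assumption: the description does
    not say how the majority classes were reduced.
    Returns (kept_records, deficits), where deficits maps label -> number of
    synthetic records still needed to reach `target`.
    """
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec["label"]].append(rec)

    rng = random.Random(seed)
    kept, deficits = [], {}
    for label in range(1, NUM_SDGS + 1):
        group = by_label.get(label, [])
        if len(group) > target:
            group = rng.sample(group, target)  # downsample a majority class
        kept.extend(group)
        if len(group) < target:
            deficits[label] = target - len(group)  # minority-class shortfall
    return kept, deficits


if __name__ == "__main__":
    raw = [
        {"abstract": "Water quality in the Amazon basin.", "sdgs": [6]},
        {"abstract": "Water quality in the Amazon basin.", "sdgs": [6]},  # duplicate
        {"abstract": "Urban poverty and health care access.", "sdgs": [1, 3]},  # multi-label
        {"abstract": "", "sdgs": [4]},  # missing abstract
    ]
    clean = preprocess(raw)
    kept, deficits = balance_plan(clean)
    print(len(clean), deficits[6])  # prints: 1 299
```

Capping majority classes while keeping every real abstract in minority classes means only the shortfall in each class would need generated text, which matches the balancing strategy the description outlines.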

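Because the "label" column stores SDG numbers 1 to 17 in the order of the list above, decoding a label is a simple lookup. The helper names below are hypothetical, and the shift to 0-based indices assumes a downstream classifier that expects class indices starting at 0, a common convention that is not part of the dataset itself.

```python
SDG_NAMES = {
    1: "No Poverty",
    2: "Zero Hunger",
    3: "Good Health and Well-being",
    4: "Quality Education",
    5: "Gender Equality",
    6: "Clean Water and Sanitation",
    7: "Affordable and Clean Energy",
    8: "Decent Work and Economic Growth",
    9: "Industry, Innovation and Infrastructure",
    10: "Reduced Inequalities",
    11: "Sustainable Cities and Communities",
    12: "Responsible Consumption and Production",
    13: "Climate Action",
    14: "Life Below Water",
    15: "Life on Land",
    16: "Peace, Justice and Strong Institutions",
    17: "Partnerships for the Goals",
}


def label_name(label):
    """Return the SDG name for a value of the 'label' column (1-17)."""
    label = int(label)
    if label not in SDG_NAMES:
        raise ValueError(f"SDG label must be between 1 and 17, got {label}")
    return SDG_NAMES[label]


def to_class_index(label):
    """Map an SDG label (1-17) to a 0-based class index (0-16)."""
    label_name(label)  # validates the range
    return int(label) - 1


def from_class_index(index):
    """Map a 0-based class index (0-16) back to an SDG label (1-17)."""
    label = int(index) + 1
    label_name(label)  # validates the range
    return label


print(label_name(14), to_class_index(14))  # prints: Life Below Water 13
```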
Created: 2024-06-24