
Academic Career Pathways of 80,000 Japan-Based Scholars: A Structured Longitudinal Dataset

Source: DataONE. Updated: 2025-11-09. Indexed: 2025-11-22.
Download link:
https://search.dataone.org/view/sha256:a0928cc4c4a0d44bdc18512f41aa466a005afb1276d379ff4d2bb204a9c7e7c3
Description:
Abstract: This dataset presents a large-scale, fully structured record of the academic career trajectories of 80,000 scholars working in Japanese universities and research institutes. The dataset was derived from an original corpus of 300,000 publicly available researcher profiles and 1.5 million free-text entries documenting educational history, employment positions, and research activities on the Japan Research Map (J-ResearchMap) platform, a national researcher registry supported by the Japan Science and Technology Agency (JST).

Because the source data is highly heterogeneous and unstructured, a multi-stage processing pipeline was developed. First, large language models (LLMs) were used to extract entities (institutions, titles, disciplines) and temporal relations (start–end year) from biography-style text. Second, extracted information was normalized through rule-based cleaning, institutional name reconciliation, and career-stage mapping. Finally, the dataset underwent human verification to ensure high accuracy in position titles, institutional disambiguation, discipline tagging, and chronological ordering. This hybrid workflow was carried out over ten months and resulted in a research-grade, machine-readable dataset suitable for longitudinal and comparative analysis.

Each scholar is represented by a sequenced career path that includes up to six standardized academic stages: PhD, Postdoctoral Research, Assistantship, Lectureship, Associate Professorship, and Professorship. Associated attributes include academic field category, year of appointment, and a harmonized institution name that permits aggregation by sector, geography, or institutional prestige level.

This dataset fills a major empirical gap in studying the Japanese academic labor market, which has historically lacked open, structured, individual-level data. It enables quantitative research on promotion timing, institutional mobility, discipline-specific career models, gender disparities, aging of the professoriate, and international comparison with other higher education systems. It also serves as a benchmark dataset for text-to-structure extraction and as a training corpus for academic-domain LLM evaluation.
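The sequenced career path and the promotion-timing use case described in the abstract can be illustrated with a minimal Python sketch. The abstract does not publish the dataset's schema, so everything here is an assumption made for illustration: the `Appointment` record, its field names, the `TITLE_TO_STAGE` rules, and the helper functions are hypothetical. Only the six stage names come from the abstract.

```python
from dataclasses import dataclass
from typing import Optional

# The six standardized stages named in the abstract, in career order.
STAGES = [
    "PhD",
    "Postdoctoral Research",
    "Assistantship",
    "Lectureship",
    "Associate Professorship",
    "Professorship",
]

# Hypothetical title-to-stage rules (an assumption, not the dataset's real mapping).
TITLE_TO_STAGE = {
    "postdoctoral researcher": "Postdoctoral Research",
    "assistant": "Assistantship",
    "lecturer": "Lectureship",
    "associate professor": "Associate Professorship",
    "professor": "Professorship",
}


@dataclass(frozen=True)
class Appointment:
    """One step in a scholar's career path (hypothetical field names)."""
    stage: str        # one of STAGES
    year: int         # year of appointment
    institution: str  # harmonized institution name
    field: str        # academic field category


def map_title(raw_title: str) -> Optional[str]:
    """Map a free-text title to a standardized stage, or None if unknown."""
    return TITLE_TO_STAGE.get(raw_title.strip().lower())


def order_path(path):
    """Sort appointments by year, breaking ties by stage order."""
    return sorted(path, key=lambda a: (a.year, STAGES.index(a.stage)))


def years_between(path, start_stage, end_stage):
    """Years from the first start_stage appointment to the first end_stage one.

    Returns None when either stage is missing from the path.
    """
    first = {}
    for a in order_path(path):
        first.setdefault(a.stage, a.year)  # keep the earliest year per stage
    if start_stage not in first or end_stage not in first:
        return None
    return first[end_stage] - first[start_stage]


# Made-up example path; the institutions and years are not from the dataset.
example_path = [
    Appointment("Professorship", 2015, "University A", "Engineering"),
    Appointment("PhD", 1998, "University A", "Engineering"),
    Appointment("Associate Professorship", 2007, "University B", "Engineering"),
]
print(years_between(example_path, "PhD", "Professorship"))
```

Because a path holds up to six stages rather than exactly six, `years_between` returns `None` when a stage is absent instead of guessing a year, so incomplete paths can be excluded from a timing analysis rather than skewing it.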
Created:
2025-11-12