Academic Career Pathways of 80,000 Japan-Based Scholars: A Structured Longitudinal Dataset
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://doi.org/10.7910/DVN/9AXZG7
Description:
Abstract: This dataset presents a large-scale, fully structured record of the academic career trajectories of 80,000 scholars working in Japanese universities and research institutes. The dataset was derived from an original corpus of 300,000 publicly available researcher profiles and 1.5 million free-text entries documenting educational history, employment positions, and research activities on the Japan Research Map (J-ResearchMap) platform, a national researcher registry supported by the Japan Science and Technology Agency (JST).

Because the source data is highly heterogeneous and unstructured, a multi-stage processing pipeline was developed. First, large language models (LLMs) were used to extract entities (institutions, titles, disciplines) and temporal relations (start–end year) from biography-style text. Second, extracted information was normalized through rule-based cleaning, institutional name reconciliation, and career-stage mapping. Finally, the dataset underwent human verification to ensure high accuracy in position titles, institutional disambiguation, discipline tagging, and chronological ordering. This hybrid workflow was carried out over ten months and resulted in a research-grade, machine-readable dataset suitable for longitudinal and comparative analysis.

Each scholar is represented by a sequenced career path that includes up to six standardized academic stages: PhD, Postdoctoral Research, Assistantship, Lectureship, Associate Professorship, and Professorship. Associated attributes include academic field category, year of appointment, and a harmonized institution name that permits aggregation by sector, geography, or institutional prestige level.

This dataset fills a major empirical gap in studying the Japanese academic labor market, which has historically lacked open, structured, individual-level data. It enables quantitative research on promotion timing, institutional mobility, discipline-specific career models, gender disparities, aging of the professoriate, and international comparison with other higher education systems. It also serves as a benchmark dataset for text-to-structure extraction and as a training corpus for academic-domain LLM evaluation.
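
As a rough illustration of the sequenced career path described above, the following Python sketch models one scholar's appointments and computes the number of years between two stages, the kind of promotion-timing measure the abstract mentions. The abstract does not state the published file layout, so the `Appointment` class, its field names, and the example records are all hypothetical; only the six stage names are taken from the description.

```python
from dataclasses import dataclass

# The six standardized stages named in the abstract, in career order.
STAGES = [
    "PhD",
    "Postdoctoral Research",
    "Assistantship",
    "Lectureship",
    "Associate Professorship",
    "Professorship",
]
STAGE_RANK = {name: i for i, name in enumerate(STAGES)}


@dataclass
class Appointment:
    # Hypothetical field names; the published files may use different ones.
    scholar_id: str
    stage: str
    institution: str
    field: str
    year: int


def career_path(appointments):
    """Return one scholar's appointments sorted by year, then by stage order."""
    return sorted(appointments, key=lambda a: (a.year, STAGE_RANK[a.stage]))


def years_between(appointments, start_stage, end_stage):
    """Years from the first start_stage record to the first end_stage record.

    Returns None if either stage is missing from the path.
    """
    first_year = {}
    for a in career_path(appointments):
        # Keep only the earliest year seen for each stage.
        first_year.setdefault(a.stage, a.year)
    if start_stage not in first_year or end_stage not in first_year:
        return None
    return first_year[end_stage] - first_year[start_stage]


# Made-up records for illustration only; not taken from the dataset.
example = [
    Appointment("s001", "Professorship", "University B", "Engineering", 2015),
    Appointment("s001", "PhD", "University A", "Engineering", 1998),
    Appointment("s001", "Associate Professorship", "University B", "Engineering", 2007),
]
print(years_between(example, "PhD", "Professorship"))  # 17
```

Because a path includes "up to" six stages, a scholar may skip any of them, which is why `years_between` returns `None` rather than raising an error when a stage is absent.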
Created:
2025-11-09



