Codebook and Annotated Dataset for LLM Career Guidance Analysis Across Ten African Countries
Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/xt42c2jfdg
Description:
This dataset contains the annotated responses, coding schema, and reliability statistics used in a comparative analysis of large language models (LLMs) for computing career guidance across ten African countries. The underlying research examined how different LLMs articulate technical and professional competencies for entry-level computing roles and the extent to which these recommendations reflect local contextual factors.
The dataset includes 60 LLM-generated responses (six models × ten countries) produced using a standardized prompt. Models analyzed include ChatGPT-4, Claude 3.5 Sonnet, Gemini 1.5 Pro, LLaMA 3, DeepSeek-V2, and Mistral. Countries represented are Egypt, South Africa, Senegal, Tunisia, Kenya, Nigeria, Ghana, Benin, Zambia, and Morocco. All responses were collected in English within a fixed time window and without follow-up prompts.
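Because the design is fully crossed (6 models × 10 countries = 60 responses), anyone replicating the analysis can check that a loaded copy of the responses covers every model-country cell exactly once. The sketch below assumes the responses have already been read into (model, country) pairs; that input layout and the function name `check_design` are hypothetical and not part of the dataset's documented structure.

```python
from itertools import product

# Model and country lists as given in the dataset description.
MODELS = ["ChatGPT-4", "Claude 3.5 Sonnet", "Gemini 1.5 Pro",
          "LLaMA 3", "DeepSeek-V2", "Mistral"]
COUNTRIES = ["Egypt", "South Africa", "Senegal", "Tunisia", "Kenya",
             "Nigeria", "Ghana", "Benin", "Zambia", "Morocco"]


def check_design(rows):
    """Return (missing, duplicated, unexpected) model-country cells.

    Hypothetical input: an iterable of (model, country) pairs, one per response.
    "unexpected" catches labels outside the design, e.g. a misspelled model name.
    """
    expected = set(product(MODELS, COUNTRIES))
    seen = {}
    for cell in rows:
        seen[cell] = seen.get(cell, 0) + 1
    missing = sorted(expected - set(seen))
    duplicated = sorted(cell for cell, n in seen.items() if n > 1)
    unexpected = sorted(set(seen) - expected)
    return missing, duplicated, unexpected


full_grid = list(product(MODELS, COUNTRIES))
print(len(full_grid))             # 60
print(check_design(full_grid))    # ([], [], [])
```

Returning all three lists, rather than a single pass/fail flag, makes it easy to see exactly which cells need attention.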
Responses were analyzed using a conceptual content analysis approach informed by the CC2020 Computing Curricula framework. Coding captured both technical competencies (e.g., programming, algorithms, AI/ML, cloud computing) and professional competencies (e.g., adaptability, teamwork, communication, ethics). In addition, a contextual awareness analysis assessed the presence of country-specific references, including local technology ecosystems, language or cultural considerations, national policies, and institutional mentions. Codes record whether a concept is present, not whether particular keywords appear.
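Presence-based coding can be represented as one set of codes per response, so a concept counts at most once per response regardless of how often it is mentioned. The sketch below is a minimal illustration under that assumption. The labels in `CODEBOOK` are hypothetical shorthands for the example concepts named above, not the dataset's actual codebook identifiers, and the toy input is invented for illustration.

```python
# Hypothetical code labels standing in for the example concepts named in the
# description; the dataset's real codebook identifiers may differ.
CODEBOOK = {
    "technical": {"programming", "algorithms", "ai_ml", "cloud_computing"},
    "professional": {"adaptability", "teamwork", "communication", "ethics"},
    "contextual": {"local_ecosystem", "language_culture",
                   "national_policy", "institution"},
}


def presence_counts(coded):
    """Per model, count the responses in which each code is present.

    `coded` maps (model, country) -> set of codes a coder judged present.
    Because each response is a set, a concept counts at most once per
    response however often it is mentioned.
    """
    counts = {}
    for (model, _country), codes in coded.items():
        per_model = counts.setdefault(model, {})
        for code in codes:
            per_model[code] = per_model.get(code, 0) + 1
    return counts


def category_coverage(codes):
    """Share of each category's (hypothetical) codebook items present in one response."""
    return {cat: len(codes & items) / len(items)
            for cat, items in CODEBOOK.items()}


# Illustrative toy input, not taken from the dataset.
coded = {
    ("Mistral", "Kenya"): {"programming", "teamwork", "local_ecosystem"},
    ("Mistral", "Ghana"): {"programming"},
    ("LLaMA 3", "Kenya"): {"ethics"},
}
print(presence_counts(coded)["Mistral"]["programming"])   # 2
print(category_coverage(coded[("Mistral", "Kenya")]))
# {'technical': 0.25, 'professional': 0.25, 'contextual': 0.25}
```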
The dataset also includes a rubric-based scoring sheet that rates response quality on four dimensions (technical coverage, contextual awareness, skills balance, and depth); inter-rater reliability (IRR) data from two coders; pooled and category-level reliability statistics; and aggregated summaries. Together, these materials support replication of the reported analyses, secondary analysis of LLM behavior in career-guidance contexts, and methodological reuse of the coding framework. The dataset is intended for research and educational purposes and reflects model behavior at the time of data collection.
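The description does not state which IRR coefficient was used. As one common choice for two coders assigning categorical (e.g., present/absent) codes, the sketch below computes unweighted Cohen's kappa, κ = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is chance agreement from each coder's label frequencies. It reports kappa both per category and pooled across categories. Treating "pooled" as a single kappa over all concatenated decisions is an assumption, and the input layout of `pooled_and_by_category` is hypothetical.

```python
from collections import Counter


def cohens_kappa(a, b):
    """Unweighted Cohen's kappa for two coders' labels on the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n        # observed agreement
    ca, cb = Counter(a), Counter(b)
    # chance agreement from each coder's marginal label frequencies
    p_e = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / (n * n)
    if p_e == 1:
        return float("nan")  # both coders used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


def pooled_and_by_category(decisions):
    """Return (pooled kappa, {category: kappa}).

    Hypothetical layout: `decisions` maps category -> (coder1, coder2) lists of
    present/absent codes (1/0) over the same items. "Pooled" is assumed to mean
    one kappa over all categories' decisions concatenated.
    """
    by_cat = {cat: cohens_kappa(a, b) for cat, (a, b) in decisions.items()}
    all_a = [x for a, _ in decisions.values() for x in a]
    all_b = [y for _, b in decisions.values() for y in b]
    return cohens_kappa(all_a, all_b), by_cat


# Illustrative toy input, not taken from the dataset.
decisions = {
    "technical": ([1, 1, 0, 0], [1, 0, 0, 0]),
    "professional": ([1, 0, 1, 0], [1, 0, 1, 0]),
}
print(pooled_and_by_category(decisions))
# (0.75, {'technical': 0.5, 'professional': 1.0})
```

For a cross-check, scikit-learn's `sklearn.metrics.cohen_kappa_score(y1, y2)` computes the same unweighted statistic.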
Created:
2026-01-26



