MIT-WAL/Gemini_3.1_202_Task_AI_Exposure_Scores

Name: MIT-WAL/Gemini_3.1_202_Task_AI_Exposure_Scores
Creator: MIT-WAL
Published: 2026-04-03 16:48:55
License: No description provided

Source: Hugging Face. Updated 2026-04-03; indexed 2026-04-12.

Download link:

https://hf-mirror.com/datasets/MIT-WAL/Gemini_3.1_202_Task_AI_Exposure_Scores


Resource description:

---
license: mit
language:
- en
pretty_name: Gemini 3.1 2026 Task AI Exposure Scores
tags:
- ai
- labor
- future-of-work
- onet
- text-classification
- occupational-exposure
task_categories:
- text-classification
size_categories:
- 10K<n<100K
---

# Gemini 3.1 2026 Task AI Exposure Scores

## Dataset Summary

This dataset contains task-level AI exposure labels for O*NET task statements. Each task is classified into one of four categories, E0, E1, E2, or E3, using an updated 2026 Agentic AI Exposure Rubric and a Gemini 3.1 Pro classification pipeline. The labels are designed to capture whether a task can be accelerated by a frontier agentic AI system directly, whether it would require deeper software integration, or whether image capabilities are the key additional requirement.

## Motivation

This dataset extends earlier task exposure work that used ChatGPT 4 by moving to Gemini 3.1 Pro and by introducing a revised rubric aligned with the frontier of 2026 agentic AI systems. The goal was not simply to swap models, but to update the evaluation framework itself so that exposure scores better reflect the capabilities of modern agentic systems with browsing, tool use, and workflow support. In particular, the new rubric distinguishes more clearly between tasks that can be accelerated by a standalone agentic chatbot, tasks that require deeper enterprise integration to unlock large productivity gains, and tasks where image understanding is the binding capability.

In the classification pipeline, this updated rubric is passed directly to the API as part of the model instructions, or cached and reused across calls, and each O*NET task statement is assigned a structured label with a short explanation. The full codebase used to generate these classifications, including the rubric text and API prompting logic, is available in the SMP500 repository: https://github.com/MIT-Work-Analytics-Laboratory/SMP500.
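To make the "structured label with a short explanation" concrete, here is a minimal sketch of how one model response could be parsed and checked against the four categories. The JSON shape, the key names `exposure_label` and `explanation`, and the helper `parse_label_response` are assumptions for illustration; they are not taken from the SMP500 code.

```python
import json

# The four exposure categories named in the dataset summary.
VALID_LABELS = {"E0", "E1", "E2", "E3"}


def parse_label_response(raw: str) -> dict:
    """Parse one model response and validate it.

    Hypothetical shape: a JSON object with the keys
    "exposure_label" and "explanation".
    """
    record = json.loads(raw)
    # Normalise case and whitespace before checking the label set.
    label = str(record.get("exposure_label", "")).strip().upper()
    if label not in VALID_LABELS:
        raise ValueError(f"unexpected exposure label: {label!r}")
    explanation = str(record.get("explanation", "")).strip()
    if not explanation:
        raise ValueError("missing explanation")
    return {"exposure_label": label, "explanation": explanation}


# Made-up response text, used only to exercise the parser.
example = '{"exposure_label": "e1", "explanation": "Drafting text is directly accelerated."}'
parsed = parse_label_response(example)
print(parsed["exposure_label"])
```

Rejecting anything outside the four-label set at parse time is one simple way to keep malformed responses out of the final table; a real pipeline might retry the request instead of raising.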
## What the Dataset Contains

Each row corresponds to a task statement, typically linked to an occupation title and task identifier. The dataset includes the original task text together with the model-generated exposure label and a short explanation.

The four labels are:

- E0: No meaningful direct exposure
- E1: Direct exposure to a standalone agentic AI system
- E2: Exposure that would likely require additional software or enterprise integration
- E3: Exposure where image capabilities are the critical enabling factor

## Data Fields

The dataset may contain the following fields:

- `Task ID`: Unique identifier for the task statement.
- `Title`: Occupation title associated with the task.
- `Task`: Original O*NET task statement.
- `exposure_label`: Predicted exposure class, one of E0, E1, E2, or E3.
- `explanation`: Short rationale generated by the model for the assigned label.
- `model_name`: Model used for classification.
- `timestamp`: Timestamp of classification.

## Methodology

The classification pipeline uses Gemini 3.1 Pro through Vertex AI to assign exposure labels to task statements. The process relies on a structured prompt containing the 2026 Agentic AI Exposure Rubric, which defines exposure in terms of whether access to a frontier agentic AI system could reduce the time needed to complete a task by at least half at equivalent quality.

The implementation includes structured JSON output, batching, checkpointing, retry logic, and resume-safe execution. The rubric is either sent directly in the system instructions or cached for reuse across requests to improve reproducibility and reduce prompt overhead. In the released code, the pipeline is explicitly configured around `gemini-3.1-pro-preview`, structured JSON enforcement, task-level explanations, and checkpoint-based recovery.

## Relation to Earlier GPT 4 Based Work

Earlier versions of this task exposure effort relied on ChatGPT 4 style labeling workflows.
This release updates both the model and the rubric. The main differences are:

- A shift from an earlier GPT 4 based setup to Gemini 3.1 Pro
- A new rubric explicitly tailored to 2026 agentic AI capabilities
- A sharper distinction between direct exposure, integration dependent exposure, and image dependent exposure
- A more reproducible classification pipeline with structured outputs and checkpointed execution

As a result, this dataset should be understood not as a simple relabeling with a different model, but as a revised exposure framework designed for a more advanced AI capability frontier.

## Results

Using the updated Gemini 3.1 pipeline and the new 2026 agentic AI exposure rubric, most O*NET tasks are classified as E0, with 10,237 tasks falling into the no direct exposure category. We then observe 4,692 tasks in E1, 3,336 in E2, and 521 in E3, suggesting that while a large share of tasks remain primarily physical or otherwise not directly accelerated by a standalone agentic system, a substantial portion of the task space is now exposed either directly or through additional software and image capabilities.

Compared with the earlier Eloundou style labeling, the new rubric produces a meaningful redistribution of tasks across categories. Original E0 tasks remain highly stable, with 94.9% still classified as E0, but many tasks previously labeled E2 shift downward into E0 or E1 under the revised framework. In particular, only 35.0% of original E2 tasks remain in E2, while 28.5% move to E0 and 36.5% move to E1, reflecting the fact that the new rubric separates tasks requiring true enterprise integration from those now achievable by a frontier standalone agentic system.

## Comparison with Earlier Eloundou Labels

### Count matrix

Rows are original labels, columns are new labels.

| Original \ New | E0 | E1 | E2 | All |
|---|---:|---:|---:|---:|
| E0 | 8125 | 189 | 247 | 8561 |
| E1 | 333 | 1567 | 802 | 2702 |
| E2 | 2284 | 2921 | 2797 | 8002 |
| All | 10742 | 4677 | 3846 | 19265 |

### Row normalised matrix

Percent of each original bucket reassigned under the new rubric.

| Original \ New | E0 | E1 | E2 |
|---|---:|---:|---:|
| E0 | 94.9% | 2.2% | 2.9% |
| E1 | 12.3% | 58.0% | 29.7% |
| E2 | 28.5% | 36.5% | 35.0% |

## Intended Uses

This dataset is intended for research and analysis related to:

- Occupational exposure to AI
- Task based labor market analysis
- Workforce transformation and future of work studies
- Firm, sector, or occupation level exposure aggregation
- Comparisons between theoretical and observed AI applicability

## Limitations

These labels are model generated and should not be interpreted as ground truth. They reflect a specific rubric, a specific model, and a specific framing of what counts as AI exposure. Exposure does not imply full automation, displacement, or economic impact. Users should also keep in mind that task level labels may contain noise, edge case errors, or judgment calls that depend on the exact wording of the task statement.

## Licensing

This dataset is released under the MIT License.

## Repository and Code

The code used to generate this dataset, including the classification pipeline and the updated rubric sent to the API, can be found here: https://github.com/MIT-Work-Analytics-Laboratory/SMP500

## Citation

If you use this dataset, please cite the MIT Work Analytics Laboratory and reference the SMP500 repository.
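As a quick consistency check on the comparison tables, the row normalised percentages can be recomputed from the count matrix. The sketch below hard-codes the counts from the card; the dictionary layout and the helper name `row_normalise` are illustrative choices, not part of the released pipeline.

```python
# Count matrix from the comparison section: rows are original labels,
# columns are new labels.
counts = {
    "E0": {"E0": 8125, "E1": 189, "E2": 247},
    "E1": {"E0": 333, "E1": 1567, "E2": 802},
    "E2": {"E0": 2284, "E1": 2921, "E2": 2797},
}


def row_normalise(matrix):
    """Return each cell as a percentage of its row total, rounded to 0.1."""
    shares = {}
    for original, row in matrix.items():
        total = sum(row.values())
        shares[original] = {
            new: round(100 * n / total, 1) for new, n in row.items()
        }
    return shares


shares = row_normalise(counts)
for original, row in shares.items():
    cells = "  ".join(f"{new}={pct:.1f}%" for new, pct in row.items())
    print(f"{original}: {cells}")
```

The same function applies unchanged to any other label cross-tabulation stored as nested dictionaries, for example counts aggregated per occupation.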

