christiqn/EVT-items
Source: Hugging Face. Updated 2026-03-25; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/christiqn/EVT-items
---
language:
- en
license: cc-by-4.0
task_categories:
- text-classification
task_ids:
- multi-class-classification
tags:
- expectancy-value-theory
- psychology
- psychometrics
- educational-psychology
- motivation
- survey-instruments
- content-analysis
pretty_name: EVT Evaluation Set — Human-Coded Psychological Test Items
size_categories:
- 1K<n<10K
---
# EVT Evaluation Set
## Dataset Summary
This dataset contains human-coded psychological test items drawn from published, peer-reviewed instruments measuring components of **Expectancy-Value Theory** (EVT; Eccles et al., 1983; Eccles & Wigfield, 2002). It was constructed as an out-of-domain evaluation benchmark for the [EVT Item Classifier](https://huggingface.co/christiqn/mpnet-EVT-classifier) and was **not used during model training**, which relied exclusively on synthetically generated items. It therefore provides an ecologically valid test of generalisation from synthetic to real, human-authored scale items.
| Property | Value |
|---|---|
| Total items (raw) | 1,632 |
| Items in final evaluation set | 1,284 |
| Items excluded from evaluation | 348 |
| Items used in benchmark | 1,284 |
| EVT constructs | 6 (5 core + OTHER) |
| Source instruments | 95 |
| Language | English |
| Domain | Educational psychology, motivation science, psychometrics |
## Source Data Collection
### Systematic Literature Search
To gather a comprehensive database of published items measuring components of EVT, a systematic literature search was conducted in the **Web of Science** database using the following query:
```
TS=("expectancy-value theory" OR "expectancy-value model" OR "situated expectancy-value"
OR SEVT OR "expectanc* for success" OR "ability belief*" OR "subjective task value"
OR "attainment value" OR "intrinsic value" OR "utility value")
AND
TS=(measure* OR instrument* OR scale* OR questionnaire* OR item* OR inventory
OR assess* OR psychometric* OR validat*)
```
During the initial title and abstract screening, studies were included only if they explicitly reported the assessment of at least one component of EVT. Inclusion was further restricted to:
- English-language publications
- Published, peer-reviewed papers
- Papers available under Open-Access licenses
- Papers accessible through the Crossref, Unpaywall, or OpenAlex APIs
These constraints were imposed to facilitate a fully automated retrieval and extraction pipeline.
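
As an illustration of the open-access constraint, the sketch below shows one way a retrieval step could decide whether a paper has a downloadable PDF. It is not the authors' pipeline, which the card does not describe. It assumes a record shaped like an Unpaywall v2 response (fields `is_oa`, `best_oa_location`, `url_for_pdf`); the helper names, the DOI, and the e-mail address are hypothetical, and no network request is made.

```python
from urllib.parse import quote

# Base URL of the Unpaywall v2 REST API; a contact e-mail is passed as a query parameter.
UNPAYWALL_BASE = "https://api.unpaywall.org/v2/"


def unpaywall_url(doi: str, email: str) -> str:
    """Build the lookup URL for one DOI (hypothetical helper)."""
    return f"{UNPAYWALL_BASE}{quote(doi, safe='/')}?email={quote(email, safe='@')}"


def pick_pdf_url(record: dict):
    """Return a direct PDF link from an Unpaywall-style record, or None.

    Assumption: the record carries `is_oa` and `best_oa_location`,
    and the latter may be null or lack `url_for_pdf`.
    """
    if not record.get("is_oa"):
        return None
    best = record.get("best_oa_location") or {}
    return best.get("url_for_pdf")


# Hand-written example record, not real data.
example = {
    "doi": "10.1234/example",
    "is_oa": True,
    "best_oa_location": {"url_for_pdf": "https://example.org/paper.pdf"},
}
print(unpaywall_url(example["doi"], "me@example.org"))
print(pick_pdf_url(example))
```

Papers for which such a lookup returns no PDF link would drop out at this stage, which is the source of the open-access bias noted under Limitations.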
### Automated Item Extraction
Following screening, full-text PDFs were retrieved automatically. **Optical character recognition (OCR)** was used to obtain the raw text of each downloaded manuscript, and the extraction step then iterated through that text to identify reported scale items. Extracted strings were preprocessed to:
- Remove OCR artefacts (character substitution errors, line-break artefacts)
- Strip citation brackets and in-text references
- Remove numerical item prefixes (e.g., "1.", "Q3:")
This produced a set of clean, unlabeled text strings representing candidate scale items.
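
The three preprocessing steps above can be sketched with regular expressions. The card does not publish the actual rules, so every pattern and the function name `clean_item` below are assumptions chosen for illustration; real OCR output would likely need more cases.

```python
import re

# Hypothetical patterns; the dataset card does not document the real ones.
# A word split by a hyphen at a line break, e.g. "statis-\ntics".
_HYPHEN_LINEBREAK = re.compile(r"(\w)-\s*\n\s*(\w)")
# Numeric citation brackets such as "[12]" or "[3, 4]".
_BRACKET_CITATION = re.compile(r"\s*\[\d+(?:\s*[,-]\s*\d+)*\]")
# Author-year references such as "(Eccles & Wigfield, 2002)".
_PAREN_CITATION = re.compile(r"\s*\([A-Z][^()]*?,?\s(?:19|20)\d{2}[a-z]?\)")
# Leading item numbers such as "1.", "12)" or "Q3:".
_ITEM_PREFIX = re.compile(r"^\s*(?:Q\s*)?\d+\s*[.:)]\s*", re.IGNORECASE)


def clean_item(raw: str) -> str:
    """Apply the three cleaning steps to one extracted string."""
    text = _HYPHEN_LINEBREAK.sub(r"\1\2", raw)  # re-join words split across lines
    text = _BRACKET_CITATION.sub("", text)      # drop "[12]"-style citations
    text = _PAREN_CITATION.sub("", text)        # drop "(Author, 2002)"-style references
    text = _ITEM_PREFIX.sub("", text)           # drop "1." / "Q3:" prefixes
    return " ".join(text.split())               # collapse remaining line breaks and spaces


print(clean_item("Q3: I am good at\nmath [12]."))
print(clean_item("1. Learning statis-\ntics is useful (Eccles & Wigfield, 2002)."))
```

One known weakness of this sketch: re-joining at line breaks also fuses genuinely hyphenated words (for example "self-" followed by "concept" on the next line), so a dictionary check would be needed in practice.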
### Manual Filtering
Candidate items were then manually reviewed and filtered to remove:
- Items not clearly formulated as self-report items (e.g., instruction text, anchor labels)
- Items that could not be assessed on their own, without additional context
- Items confounded by multiple response anchors that would prevent unambiguous construct assignment
Items passing manual filtering were mapped to their EVT construct by a human expert coder, or flagged as `OTHER` if they did not correspond to any EVT component.
## Dataset Structure
### File Format
Semicolon-delimited CSV (UTF-8 with BOM), 1,632 rows + header, 4 columns.
### Fields
| Field | Type | Description |
|---|---|---|
| `scale` | string | Name of the source instrument or questionnaire. Set to `NOT_REPORTED` when the original publication did not name the scale. |
| `item` | string | Text of the psychological test item, preprocessed to remove OCR artefacts, citation brackets, and numerical prefixes. |
| `human_coder` | string | EVT construct label assigned by a human expert. One of: `ATTAINMENT_VALUE`, `COST`, `EXPECTANCY`, `INTRINSIC_VALUE`, `UTILITY_VALUE`, `OTHER`. |
| `decision` | string / null | Exclusion flag. Non-null values indicate why the item was excluded from the primary evaluation: `third-person`, `multiple anchors`, or `other`. Null = item is included. |
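
A minimal loading sketch using only the Python standard library follows. It assumes that null cells are stored as empty strings and that the header matches the four field names above. The file name `EVT-items.csv` in the usage comment is hypothetical.

```python
import csv
import io

FIELDS = ["scale", "item", "human_coder", "decision"]
LABELS = {
    "ATTAINMENT_VALUE", "COST", "EXPECTANCY",
    "INTRINSIC_VALUE", "UTILITY_VALUE", "OTHER",
}


def read_evt_rows(stream):
    """Parse the semicolon-delimited file from an open text stream."""
    reader = csv.DictReader(stream, delimiter=";")
    if reader.fieldnames != FIELDS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = []
    for raw in reader:
        # Assumption: a null value is written as an empty cell.
        row = {name: ((raw.get(name) or "").strip() or None) for name in FIELDS}
        label = row["human_coder"]
        if label is not None and label not in LABELS:
            raise ValueError(f"line {reader.line_num}: unknown label {label!r}")
        rows.append(row)
    return rows


def read_evt_bytes(data: bytes):
    """Decode with utf-8-sig so the BOM is not glued to the first column name."""
    stream = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8-sig", newline="")
    return read_evt_rows(stream)


# Usage with a local copy (hypothetical file name):
#   from pathlib import Path
#   rows = read_evt_bytes(Path("EVT-items.csv").read_bytes())
```

Decoding with plain `utf-8` instead of `utf-8-sig` would leave the byte-order mark attached to the `scale` header, and the header check would fail.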
### Class Distribution
| EVT Construct | Description | N (raw) |
|---|---|---|
| `EXPECTANCY` | Beliefs about future success | 308 |
| `OTHER` | Not classifiable as an EVT construct | 283 |
| `INTRINSIC_VALUE` | Enjoyment and interest | 234 |
| `UTILITY_VALUE` | Usefulness for future goals | 199 |
| `COST` | Perceived negative consequences of engagement | 149 |
| `ATTAINMENT_VALUE` | Personal importance of doing well | 111 |
| **Total** | | **1,284** |
> Items with a null `human_coder` value in the raw file correspond to rows where no EVT label was assigned (e.g., `NOT_REPORTED` scale entries or unlabelled extractions). These are not included in the evaluation set.
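
The class counts imply a trivial reference point for any classifier: always predicting the largest class. The sketch below recomputes the total and that majority-class accuracy from the table. It assumes the six counts describe the 1,284 items of the evaluation set, which is what their sum suggests.

```python
# Counts copied from the class distribution table.
CLASS_COUNTS = {
    "EXPECTANCY": 308,
    "OTHER": 283,
    "INTRINSIC_VALUE": 234,
    "UTILITY_VALUE": 199,
    "COST": 149,
    "ATTAINMENT_VALUE": 111,
}

total = sum(CLASS_COUNTS.values())
shares = {label: n / total for label, n in CLASS_COUNTS.items()}

# Accuracy obtained by always predicting the most frequent label.
majority_label = max(CLASS_COUNTS, key=CLASS_COUNTS.get)
majority_baseline = CLASS_COUNTS[majority_label] / total

print(total)
for label, share in shares.items():
    print(f"{label:<17} {share:.1%}")
print(f"majority baseline ({majority_label}): {majority_baseline:.3f}")
```

The baseline comes out at roughly 0.24, so reported accuracies should be read against that floor rather than against the 1/6 expected from uniform guessing.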
## Exclusion Criteria and the `decision` Column
Items that passed manual filtering but were identified as methodologically problematic are retained in the dataset with a non-null `decision` value. These items were excluded from the primary classifier evaluation but are preserved for secondary analyses.
| Exclusion Reason | Description | N |
|---|---|---|
| `third-person` | Items formulated about a third party rather than the respondent directly (e.g., "students find this interesting"), making self-report intent ambiguous. | 67 |
| `multiple anchors` | Items where the response scale contains multiple distinct conceptual anchors that confound construct assignment. | 29 |
| `other` | OCR artefacts, formatting fragments, instruction text, missing context, or items that were not self-contained scale items. | 252 |
| **Total excluded** | | **348** |
Items used in the primary model evaluation are those where `decision` is null (**N = 1,284**).
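
Selecting the evaluation set is a filter on `decision`. The sketch below assumes each row is a dict keyed by the four field names, with `None` (or an empty string) standing for a null `decision`; the function name is hypothetical. The documented counts are included so a loaded file can be checked against them.

```python
from collections import Counter

# Counts stated in the exclusion table.
DOCUMENTED_EXCLUSIONS = {"third-person": 67, "multiple anchors": 29, "other": 252}
DOCUMENTED_RAW_TOTAL = 1632


def split_by_decision(rows):
    """Return (included, excluded, reason_counts) for a list of row dicts."""
    included = [r for r in rows if not r.get("decision")]
    excluded = [r for r in rows if r.get("decision")]
    reasons = Counter(r["decision"] for r in excluded)
    return included, excluded, reasons


# The documented numbers are internally consistent:
n_excluded = sum(DOCUMENTED_EXCLUSIONS.values())
n_included = DOCUMENTED_RAW_TOTAL - n_excluded
print(n_excluded, n_included)

# Toy rows for illustration only, not taken from the dataset.
toy = [
    {"item": "I am good at math.", "decision": None},
    {"item": "Students find this interesting.", "decision": "third-person"},
    {"item": "Strongly agree / very useful", "decision": "multiple anchors"},
]
included, excluded, reasons = split_by_decision(toy)
print(len(included), dict(reasons))
```

On the full file, `reasons` should match `DOCUMENTED_EXCLUSIONS` and `len(included)` should equal 1,284 if the assumptions about null encoding hold.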
## Instrument Coverage
Items were sourced from **95 distinct instruments** spanning a broad range of academic domains and populations. This diversity was intentional: the dataset tests whether EVT construct labels generalise across contexts, not just within a single instrument or domain.
### Domain Coverage
- Mathematics and statistics education
- Science, technology, engineering, and mathematics (STEM)
- Language learning (ESL, reading, writing)
- Health and clinical domains (e.g., breastfeeding motivation, occupational safety)
- Music and arts education
- Higher education and career motivation
- General academic motivation (cross-domain instruments)
## Coding Procedure
### Construct Definitions
Items were coded according to the theoretical definitions from Eccles et al. (1983) and Eccles & Wigfield (2002):
- **ATTAINMENT_VALUE** — The personal importance of doing well on a task, tied to the individual's identity and self-concept.
- **COST** — The perceived negative consequences of engagement, including opportunity cost, effort expenditure, and emotional cost.
- **EXPECTANCY** — Beliefs about future success on a task, including ability self-concept and confidence.
- **INTRINSIC_VALUE** — The enjoyment, interest, or subjective pleasure derived from engagement with a task.
- **UTILITY_VALUE** — The perceived usefulness of a task for future goals, careers, or activities.
- **OTHER** — Items that do not correspond to any of the five core EVT constructs.
### Coding Rules
A single label was assigned per item. When an item could plausibly belong to multiple constructs, the coder assigned the label corresponding to the most salient construct, or flagged the item as `multiple anchors` if the ambiguity was considered irresolvable. Items formulated in the third person were flagged as `third-person` and excluded from the primary evaluation.
## Intended Uses
- Benchmark evaluation of automated EVT construct classifiers
- Annotation studies and inter-rater reliability research
- Training data for future classifiers (with appropriate domain-shift caveats)
- Systematic review and content analysis of EVT instrumentation
- Psychometric research on construct validity and item coverage across instruments
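
For the benchmark use case, a dependency-free scoring sketch is given below. The card does not state which metrics the authors report; accuracy and macro-averaged F1 are used here as an assumption, the latter because the classes are unevenly sized. The function names and the toy labels are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Share of items whose predicted label equals the human code."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred, labels):
    """F1 per label, computed as 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A label that never occurs and is never predicted scores 0.0 here.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


# Toy example: human codes versus hypothetical model predictions.
human = ["COST", "COST", "EXPECTANCY", "OTHER"]
model = ["COST", "EXPECTANCY", "EXPECTANCY", "OTHER"]
labels = ["COST", "EXPECTANCY", "OTHER"]

print(accuracy(human, model))
print(per_class_f1(human, model, labels))
print(round(macro_f1(human, model, labels), 4))
```

In a real evaluation, `human` would be the `human_coder` column of the rows with a null `decision`, and `labels` the six construct names.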
## Limitations
- **English only.** All items are in English. Cross-lingual generalisation should not be assumed.
- **Synthetic-to-real gap.** The associated classifier was trained on synthetic items. Evaluating on this dataset measures how well it generalises to real, human-authored items, but does not close the distribution gap.
- **Open-access bias.** Inclusion was limited to open-access publications, which may not be fully representative of the EVT instrumentation literature.
- **OCR noise.** Despite preprocessing, some items may retain minor OCR artefacts from the extraction pipeline.
- **Domain imbalance.** Certain domains (mathematics, STEM) are more heavily represented than others (health, arts), reflecting the literature distribution rather than deliberate sampling.
## References
- Eccles, J. S., et al. (1983). Expectancies, values, and academic behaviors. In J. T. Spence (Ed.), *Achievement and achievement motives* (pp. 75–146). W. H. Freeman.
- Eccles, J. S., & Wigfield, A. (2002). Motivational beliefs, values, and goals. *Annual Review of Psychology, 53*(1), 109–132.
- Wigfield, A., & Eccles, J. S. (2000). Expectancy-value theory of achievement motivation. *Contemporary Educational Psychology, 25*(1), 68–81.
Provider:
christiqn



