josuediazflores/aao-eb1a-decisions
Source: Hugging Face. Updated 2026-04-06; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/josuediazflores/aao-eb1a-decisions
Resource description:
---
license: mit
task_categories:
- text-classification
- text-generation
- question-answering
language:
- en
tags:
- legal
- immigration
- lora
- fine-tuning
- o1a
- eb1a
- uscis
- aao
pretty_name: AAO EB-1A Extraordinary Ability Decisions (Structured)
size_categories:
- 1K<n<10K
---
# AAO EB-1A Extraordinary Ability Decisions — Structured Dataset
Structured extractions from **1,466 USCIS Administrative Appeals Office (AAO) non-precedent decisions** on EB-1A extraordinary ability petitions (I-140). Each case has been decomposed into structured components by Claude Sonnet for use in fine-tuning legal reasoning models.
Part of [Project Greenlight](https://github.com/josuediazflores/ProjectGreenlight) — an AI-powered O-1A/EB-1A visa intelligence system.
## Dataset Description
Each JSON file represents one AAO decision with the following fields:
| Field | Type | Description |
|-------|------|-------------|
| `petitioner_background` | string | Who the petitioner is, their field, nationality, career summary (2-4 sentences) |
| `evidence_per_criterion` | array | Per-criterion breakdown with evidence submitted, AAO analysis, and met/not-met |
| `outcome` | string | `sustain`, `dismiss`, or `remand` |
| `outcome_reasoning` | string | AAO's overall reasoning for the final decision (2-5 sentences) |
| `legal_citations` | array | Every case, statute, or regulation cited |
| `fraud_or_procedural_issues` | string/null | Any fraud findings or procedural problems |
| `filename` | string | Source PDF filename |
| `score` | float | Quality score (0-10) from automated rubric |
| `original_outcome` | string | Outcome from initial scoring pass |
| `original_criteria` | array | Criteria identified in initial scoring pass |
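The field table above can be mirrored in a small loader. This is a minimal sketch, assuming the JSON files have been downloaded into a local directory with one case per file; the names `load_decisions`, `missing_fields`, and `EXPECTED_FIELDS` are illustrative and not part of the dataset.

```python
import json
from pathlib import Path
from typing import Any

# Top-level fields as documented in the field table.
EXPECTED_FIELDS = {
    "petitioner_background",
    "evidence_per_criterion",
    "outcome",
    "outcome_reasoning",
    "legal_citations",
    "fraud_or_procedural_issues",
    "filename",
    "score",
    "original_outcome",
    "original_criteria",
}


def load_decisions(data_dir: str) -> list[dict[str, Any]]:
    """Read every *.json file in data_dir (assumed layout: one case per file)."""
    records = []
    for path in sorted(Path(data_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            records.append(json.load(fh))
    return records


def missing_fields(record: dict[str, Any]) -> set[str]:
    """Return the documented fields that are absent from a record."""
    return EXPECTED_FIELDS - set(record)
```

Running `missing_fields` over the loaded records is a cheap way to confirm that a local copy matches the documented schema before training on it.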
### Evidence Per Criterion Schema
```json
{
  "criterion": "original contributions",
  "evidence_submitted": "Detailed description of what the petitioner submitted...",
  "aao_analysis": "What the AAO said about this evidence and why it met or failed...",
  "met": false
}
```
Valid criterion values: `awards`, `membership`, `published material`, `judging`, `original contributions`, `scholarly articles`, `exhibition`, `leading role`, `high salary`, `commercial success`
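One way to check entries against this schema is sketched below. It assumes `met` is always a boolean, as in the example, and that the two text fields are always strings; the dataset card does not state either guarantee, and `validate_criterion_entry` is an illustrative name.

```python
# The ten criterion labels listed above.
VALID_CRITERIA = {
    "awards",
    "membership",
    "published material",
    "judging",
    "original contributions",
    "scholarly articles",
    "exhibition",
    "leading role",
    "high salary",
    "commercial success",
}


def validate_criterion_entry(entry: dict) -> list[str]:
    """Return a list of problems found in one evidence_per_criterion entry."""
    problems = []
    criterion = entry.get("criterion")
    if not isinstance(criterion, str) or criterion not in VALID_CRITERIA:
        problems.append(f"unknown criterion: {criterion!r}")
    for key in ("evidence_submitted", "aao_analysis"):
        if not isinstance(entry.get(key), str):
            problems.append(f"{key} should be a string")
    # Assumption: `met` is a strict boolean, never null or a string.
    if not isinstance(entry.get("met"), bool):
        problems.append("met should be a boolean")
    return problems
```

An empty list means the entry matches the documented shape; anything else is worth inspecting by hand, since the extractions were produced by an LLM.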
## Dataset Statistics
| Metric | Value |
|--------|-------|
| Total cases | 1,466 |
| Avg criteria per case | 5.9 |
| Cases with fraud/procedural issues | 989 (67%) |
### Outcome Distribution
| Outcome | Count | Percentage |
|---------|-------|------------|
| Dismiss | 1,390 | 94.8% |
| Sustain | 45 | 3.1% |
| Remand | 31 | 2.1% |
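The percentages in this table follow directly from the counts. The sketch below recomputes them from the `outcome` field; the stand-in records are built from the published counts rather than read from the actual files, and `outcome_distribution` is an illustrative name.

```python
from collections import Counter


def outcome_distribution(records):
    """Map each outcome to (count, percentage rounded to one decimal)."""
    counts = Counter(r["outcome"] for r in records)
    total = sum(counts.values())
    return {
        outcome: (n, round(100 * n / total, 1))
        for outcome, n in counts.most_common()
    }


# Stand-in records built from the published counts, not the real files.
records = (
    [{"outcome": "dismiss"}] * 1390
    + [{"outcome": "sustain"}] * 45
    + [{"outcome": "remand"}] * 31
)
dist = outcome_distribution(records)
print(dist)
# {'dismiss': (1390, 94.8), 'sustain': (45, 3.1), 'remand': (31, 2.1)}
```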
### Criteria Frequency
| Criterion | Cases Discussing |
|-----------|-----------------|
| Original contributions | 1,186 |
| Published material | 1,135 |
| Awards | 1,120 |
| Leading role | 1,118 |
| Judging | 983 |
| Membership | 976 |
| Scholarly articles | 750 |
| High salary | 628 |
| Exhibition | 574 |
| Commercial success | 247 |
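A table like this can be rebuilt from the `evidence_per_criterion` arrays. The sketch below assumes "cases discussing" means a criterion appears at least once in a case, which is an interpretation of the table rather than something the card states. It also checks that the published counts are consistent with the reported average of 5.9 criteria per case.

```python
from collections import Counter


def criteria_frequency(records):
    """Count, per criterion, how many cases discuss it at least once."""
    freq = Counter()
    for record in records:
        # A set avoids double-counting a criterion repeated within one case.
        discussed = {
            entry["criterion"]
            for entry in record.get("evidence_per_criterion", [])
        }
        freq.update(discussed)
    return freq


# Published counts from the table above.
PUBLISHED_COUNTS = {
    "original contributions": 1186,
    "published material": 1135,
    "awards": 1120,
    "leading role": 1118,
    "judging": 983,
    "membership": 976,
    "scholarly articles": 750,
    "high salary": 628,
    "exhibition": 574,
    "commercial success": 247,
}

# 8,717 criterion mentions across 1,466 cases.
average = round(sum(PUBLISHED_COUNTS.values()) / 1466, 1)
print(average)  # 5.9
```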
### Top Legal Citations
| Citation | Frequency |
|----------|-----------|
| 8 C.F.R. § 204.5(h)(2) | 1,441 |
| 8 C.F.R. § 204.5(h)(3) | 1,422 |
| INA § 203(b)(1)(A) | 1,338 |
| Kazarian v. USCIS, 596 F.3d 1115 (9th Cir. 2010) | 959 |
## Data Source
All cases sourced from the [USCIS AAO Non-Precedent Decisions repository](https://www.uscis.gov/administrative-appeals/aao-decisions/aao-non-precedent-decisions), filtered to I-140 Extraordinary Ability (EB-1A) petitions. These are public domain government documents.
> **Note:** USCIS stopped publishing new AAO decisions in March 2025.
## Data Pipeline
1. **Scrape** — 4,643 EB-1A PDFs downloaded from USCIS AAO repository
2. **Parse** — Text extracted via PyMuPDF + Tesseract OCR (0 failures)
3. **Score** — Quality-scored against a detailed rubric (criteria coverage, analytical depth, evidence discussion, legal reasoning, outcome clarity)
4. **Deduplicate** — TF-IDF cosine similarity at 0.95 threshold removed 18 near-duplicate pairs
5. **Filter** — 1,467 cases selected (score ≥ 7.0)
6. **Extract** — Claude Sonnet decomposed each decision into structured JSON (1,466 succeeded, 1 failed)
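The deduplication step can be illustrated with a standard-library sketch. This is not the project's code: the tokenizer, the smoothed IDF variant, and the `>=` comparison are all assumptions, and the function names are illustrative. A production pipeline would more likely use a vectorizer from an ML library.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tfidf_vectors(docs):
    """Build L2-normalised TF-IDF vectors as {term: weight} dicts."""
    tokenised = [re.findall(r"[a-z0-9]+", d.lower()) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenised for term in set(toks))
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        # Smoothed IDF (assumed variant): log((1 + n) / (1 + df)) + 1
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def near_duplicate_pairs(docs, threshold=0.95):
    """Return index pairs whose cosine similarity is at or above threshold."""
    vecs = tfidf_vectors(docs)
    pairs = []
    for i, j in combinations(range(len(docs)), 2):
        a, b = vecs[i], vecs[j]
        if len(a) > len(b):
            a, b = b, a  # iterate over the smaller vector
        # Vectors are unit length, so the dot product is the cosine similarity.
        sim = sum(w * b.get(t, 0.0) for t, w in a.items())
        if sim >= threshold:
            pairs.append((i, j))
    return pairs
```

The all-pairs loop is quadratic. If it were run over all 4,643 parsed documents it would make roughly 10.8 million comparisons, so a sparse matrix product is the more practical choice at that scale.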
## Intended Use
- Fine-tuning language models for immigration legal reasoning (LoRA adapters)
- Training data for O-1A/EB-1A criteria analysis, gap identification, RFE prediction, and outcome prediction
- Research on legal NLP and administrative decision-making patterns
## Limitations
- Heavy class imbalance: 95% dismiss, 3% sustain, 2% remand
- Structured extractions were generated by an LLM (Claude Sonnet), not manually verified at scale
- Pre-March 2025 decisions only
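- EB-1A decisions are used as a proxy for O-1A: the evidentiary criteria overlap heavily but are not identical (the O-1A list omits the exhibition and commercial success criteria)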
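The class imbalance noted above matters for outcome prediction. One common mitigation, which this dataset card does not itself prescribe, is to weight each class inversely to its frequency. The sketch below applies the "balanced" heuristic, total / (number of classes × class count), to the published outcome counts; `inverse_frequency_weights` is an illustrative name.

```python
def inverse_frequency_weights(counts):
    """weight = total / (num_classes * class_count), the 'balanced' heuristic."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}


# Published outcome counts from the Outcome Distribution table.
counts = {"dismiss": 1390, "sustain": 45, "remand": 31}
weights = inverse_frequency_weights(counts)
for label, weight in weights.items():
    print(f"{label}: {weight:.2f}")
# dismiss: 0.35
# sustain: 10.86
# remand: 15.76
```

With only 45 sustain and 31 remand cases, weights this large can make training unstable, so stratified evaluation splits and per-class metrics are worth reporting alongside any weighted loss.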
## License
MIT
Provider:
josuediazflores



