SJSU-SP24-DATA298-T3/ImmigrationQA
Source: Hugging Face. Updated 2024-09-26; indexed 2025-04-19.
Download link:
https://hf-mirror.com/datasets/SJSU-SP24-DATA298-T3/ImmigrationQA
Resource description:
---
license: mit
task_categories:
- question-answering
language:
- en
tags:
- Law
- Immigration
size_categories:
- 1K<n<10K
---
# Dataset Card for Immigration Laws
<!-- Provide a quick summary of the dataset. -->
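ImmigrationQA is an English question-answering dataset on U.S. immigration law and visa regulations. Its question-answer pairs (between 1K and 10K) were generated with AI from USCIS, Travel.State.Gov, Reddit, and other sources. This card was generated from [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/datasetcard_template.md?plain=1).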
## Dataset Details
### Dataset Description
<!-- Provide a longer summary of what this dataset is. -->
- **Curated by:** [Deva](https://huggingface.co/gdevakumar) & [Yasaman](https://huggingface.co/yasamanne)
- **Language(s) (NLP):** English (en)
- **License:** MIT License
## Uses
<!-- Address questions around how the dataset is intended to be used. -->
### Direct Use
<!-- This section describes suitable use cases for the dataset. -->
[More Information Needed]
### Out-of-Scope Use
<!-- This section addresses misuse, malicious use, and uses that the dataset will not work well for. -->
- This dataset was generated with AI from the USCIS and Travel.State.Gov websites and other sources, so it is not a substitute for the official sources.
- While we strive for accuracy, we cannot guarantee that all information provided is current, complete, or error-free.
## Dataset Structure
<!-- This section provides a description of the dataset fields, and additional information about the dataset structure such as criteria used to create the splits, relationships between data points, etc. -->
```
[
  {
    "question": "What is the purpose of Form DS-3036?",
    "answer": "To permit the named foreign national to schedule an interview at a U.S. embassy or consulate to seek to obtain a J visa to enter the United States as an Exchange Visitor Program participant or as an accompanying spouse and dependent."
  },
  {
    "question": "What is the category of Exchange Visitor Program participants that does not permit extensions beyond the duration of participation?",
    "answer": "Short-term scholars."
  }
]
```
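Records in the shape shown above can be parsed and sanity-checked with the Python standard library. The following is a minimal sketch, not the curators' own tooling: the `load_records` helper and the inline `SAMPLE` string are illustrative assumptions, and the only facts taken from this card are the two field names, `question` and `answer`.

```python
import json

# Two records copied from the sample above; the real dataset holds more.
SAMPLE = """
[
  {
    "question": "What is the purpose of Form DS-3036?",
    "answer": "To permit the named foreign national to schedule an interview at a U.S. embassy or consulate to seek to obtain a J visa to enter the United States as an Exchange Visitor Program participant or as an accompanying spouse and dependent."
  },
  {
    "question": "What is the category of Exchange Visitor Program participants that does not permit extensions beyond the duration of participation?",
    "answer": "Short-term scholars."
  }
]
"""


def load_records(text):
    """Parse a JSON array of QA records and check the two expected fields.

    Hypothetical helper: it assumes every record is an object with
    non-empty string values under "question" and "answer".
    """
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    for i, rec in enumerate(records):
        for key in ("question", "answer"):
            value = rec.get(key) if isinstance(rec, dict) else None
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"record {i}: missing or empty {key!r}")
    return records


records = load_records(SAMPLE)
print(len(records), "records loaded")
```

Note that `json.loads` rejects a trailing comma after the last array element, so a file in this format must omit it.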
## Dataset Creation
### Curation Rationale
<!-- Motivation for the creation of this dataset. -->
- This dataset can be used to fine-tune LLMs on immigration law and visa regulations.
- It can also serve as a benchmark for evaluating models in legal and immigration regulatory applications.
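Both uses above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a method described by the curators: the prompt template in `to_training_example` and the normalized exact-match score in `exact_match` are hypothetical choices, and the only inputs assumed are records with `question` and `answer` fields.

```python
import string


def to_training_example(record):
    """Turn one QA record into a prompt/completion pair.

    The "Question: ... Answer:" template is a hypothetical choice,
    not a format prescribed by the dataset.
    """
    return {
        "prompt": f"Question: {record['question']}\nAnswer:",
        "completion": " " + record["answer"],
    }


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def exact_match(predictions, records):
    """Fraction of predictions equal to the reference answer after normalization."""
    if len(predictions) != len(records):
        raise ValueError("predictions and records must have the same length")
    if not records:
        return 0.0
    hits = sum(
        normalize(pred) == normalize(rec["answer"])
        for pred, rec in zip(predictions, records)
    )
    return hits / len(records)


if __name__ == "__main__":
    record = {
        "question": "What is the category of Exchange Visitor Program participants that does not permit extensions beyond the duration of participation?",
        "answer": "Short-term scholars.",
    }
    print(to_training_example(record)["prompt"])
    print(exact_match(["short-term scholars"], [record]))
```

Exact match is a strict metric: it suits short answers such as "Short-term scholars." but will undercount correct paraphrases of the longer free-form answers, so a softer overlap or model-graded score may be a better fit there.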
### Source Data
<!-- This section describes the source data (e.g. news text and headlines, social media posts, translated sentences, ...). -->
- [USCIS](https://www.uscis.gov/)
- [Travel State Gov](https://travel.state.gov/content/travel/en/us-visas.html)
- Reddit
- Others
#### Data Collection and Processing
<!-- This section describes the data collection and processing process such as data selection criteria, filtering and normalization methods, tools and libraries used, etc. -->
[More Information Needed]
#### Who are the source data producers?
<!-- This section describes the people or systems who originally created the data. It should also include self-reported demographic or identity information for the source data creators if this information is available. -->
[More Information Needed]
#### Personal and Sensitive Information
<!-- State whether the dataset contains data that might be considered personal, sensitive, or private (e.g., data that reveals addresses, uniquely identifiable names or aliases, racial or ethnic origins, sexual orientations, religious beliefs, political opinions, financial or health data, etc.). If efforts were made to anonymize the data, describe the anonymization process. -->
[More Information Needed]
## Bias, Risks, and Limitations
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
[More Information Needed]
## Dataset Card Authors [optional]
[Deva](https://huggingface.co/gdevakumar)
## Dataset Card Contact
[Deva](https://huggingface.co/gdevakumar)
Provider:
SJSU-SP24-DATA298-T3



