refugee-law-lab/luck-of-the-draw-iii

Name: refugee-law-lab/luck-of-the-draw-iii
Creator: refugee-law-lab
Published: 2024-01-25 18:01:51
License: cc-by-nc-4.0

Source: Hugging Face. Updated 2024-01-25; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/refugee-law-lab/luck-of-the-draw-iii

Description:

---
license: cc-by-nc-4.0
language:
- en
- fr
size_categories:
- 100K<n<1M
---

# Refugee Law Lab: Luck of the Draw III: Data

## Dataset Summary

The [Refugee Law Lab](https://refugeelab.ca) supports bulk open-access to Canadian legal data to facilitate research and advocacy. Bulk open-access helps avoid asymmetrical access-to-justice and amplification of marginalization that results when commercial actors leverage proprietary legal datasets for profit -- a particular concern in the border control setting.

This is the dataset used for a [research project](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4322881) published in the Queen's Law Journal, undertaken at the Refugee Law Lab about outcomes in stays of removal in Canada's Federal Court. Specifically, it includes information from the online Federal Court dockets for all immigration law cases filed between 1997 and 2022.

The dataset can be used for legal analytics (i.e. identifying patterns in legal decision-making), to test ML and NLP tools on a bilingual dataset of Canadian legal materials, and to pretrain language models for various tasks.

## Dataset Structure

### Data Instance

The dataset includes a single data instance of all online Federal Court dockets involving immigration law filed between 1997 and 2022, as they appeared when the data was gathered in November 2022.

### Data Fields

Data fields match the format used for the Refugee Law Lab's [Canadian Legal Data dataset](https://huggingface.co/datasets/refugee-law-lab/canadian-legal-data).

- citation (string): Legal citation for the document (neutral citation where available). In this dataset, the legal citation is the docket number, which is an identifier for the file assigned by the Federal Court. Docket numbers take the form IMM-#-YY.
  IMM signals that this is an immigration law docket, # is a sequential number starting at 1 that represents the order in which applications were received in a given year, and YY is the last two digits of the year in which the application was initially filed.
- year (int32): Year of the document date, which can be useful for filtering. For this dataset, the year is the year when the application was initially filed.
- name (string): Name of the document, in this dataset the style of cause of a court file
- date_filed (string): Date of the document (yyyy-mm-dd). In this dataset, the date is the date the application was filed.
- city_filed (string): City where the application was initially filed
- nature (string): A category of proceedings assigned by the Federal Court
- class (string): A second category of proceedings assigned by the Federal Court
- track (string): A third category of proceedings assigned by the Federal Court
- documents (list of dictionaries): A list of dictionaries containing each docket entry (or row in the table of docket entries in a docket). Each dictionary contains the following key/value pairs:
  * RE_NO: The number assigned to the docket entry by the Federal Court
  * DOCNO: Where the entry involves the filing of a document, the number assigned to that document by the Federal Court
  * DOC_DT: The date of the docket entry
  * RECORDED_ENTRY: The content of the docket entry
- source_url (string): URL where the document was scraped and where the official version can be found
- scraped_timestamp (string): Date the document was scraped (yyyy-mm-dd)

### Data Languages

Some dockets are in English, some in French, and some alternate between English and French.

### Data Splits

The data has not been split, so all data is in the train split.
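
The docket-number format described under Data Fields (`IMM-#-YY`) can be split into its sequence number and filing year. The following is a minimal sketch and not part of the original dataset card: the function name `parse_docket_number` is hypothetical, and expanding the two-digit year assumes that filings fall between 1997 and 2022, so `97` to `99` map to the 1900s and everything else to the 2000s. Citations that do not match the pattern return `None` rather than raising.

```python
import re
from typing import Optional, Tuple

# Pattern for docket numbers of the form IMM-#-YY, as described in the card.
DOCKET_RE = re.compile(r"^IMM-(\d+)-(\d{2})$")


def parse_docket_number(citation: str) -> Optional[Tuple[int, int]]:
    """Return (sequence_number, filing_year), or None if the pattern does not match."""
    match = DOCKET_RE.match(citation.strip())
    if match is None:
        return None
    sequence = int(match.group(1))
    yy = int(match.group(2))
    # Assumption: the dataset covers 1997-2022, so 97-99 are 19xx and the rest 20xx.
    year = 1900 + yy if yy >= 97 else 2000 + yy
    return sequence, year
```

Since the `year` field already carries the filing year, a parsed value like this would mainly serve as a consistency check against it.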

### Data Loading

To load the data:

```python
from datasets import load_dataset

dataset = load_dataset("refugee-law-lab/luck-of-the-draw-iii", split="train")
```

To convert to dataframe:

```python
from datasets import load_dataset

dataset = load_dataset("refugee-law-lab/luck-of-the-draw-iii", split="train")
# Convert the Hugging Face Dataset into a pandas DataFrame
df = dataset.to_pandas()
```

## Dataset Creation

### Curation Rationale

The dataset includes all Federal Court immigration law dockets available on the Federal Court's website at the time of research (November 2022). The Refugee Law Lab gathered this data for several projects, including the [Refugee Law Lab Portal](https://rllp.ca/) and the research article on Federal Court stays linked above.

### Source Data

#### Source

All data was gathered via the Federal Court's [website](https://www.fct-cf.gc.ca/en/home).

#### Initial Data Collection and Normalization

Details are available via links on the Refugee Law Lab's GitHub repository [Luck of the Draw III: Code & Data](https://github.com/Refugee-Law-Lab/luck-of-the-draw-iii).

### Personal and Sensitive Information

Documents may include personal and sensitive information. All documents have been published online by the Federal Court. While the open court principle mandates that court materials be made available to the public, there are privacy risks when these materials become easily and widely available. These privacy risks are particularly acute for marginalized groups, including refugees and other non-citizens whose personal and sensitive information is included in some of the documents in this dataset. For example, imagine a repressive government working with private data aggregators to collect information that is used to target families of political opponents who have sought asylum abroad. One mechanism used to try to achieve a balance between the open court principle and privacy is that in publishing the documents in this dataset, the relevant courts and tribunals prohibit search engines from indexing the documents.
Users of this data are required to do the same.

### Non-Official Versions

Documents included in this dataset are unofficial copies. For official versions published by the Government of Canada, please see the source URLs.

### Non-Affiliation / Endorsement

The reproduction of documents in this dataset was not done in affiliation with, or with the endorsement of the Federal Court or the Government of Canada.

## Considerations for Using the Data

### Social Impact of Dataset

The Refugee Law Lab recognizes that this dataset -- and further research using the dataset -- raises challenging questions about how to balance protecting privacy, enhancing government transparency, addressing information asymmetries, and building technologies that leverage data to advance the rights and interests of refugees and other displaced people, as well as assisting those working with them (rather than technologies that [enhance the power of states](https://citizenlab.ca/2018/09/bots-at-the-gate-human-rights-analysis-automated-decision-making-in-canadas-immigration-refugee-system/) to control the movement of people across borders).

More broadly, the Refugee Law Lab also recognizes that considerations around privacy and data protection are complex and evolving. When working on migration, refugee law, data, technology and surveillance, we strive to foreground intersectional understandings of the systemic harms perpetuated against groups historically made marginalized. We encourage other users to do the same.

We also encourage users to try to avoid participating in building technologies that harm refugees and other marginalized groups, as well as to connect with [community organizations](https://www.migrationtechmonitor.com/ways-to-help) working in this space, and to [listen directly](https://www.migrationtechmonitor.com/about-us) and learn from people who are affected by new technologies.
We will review the use of these datasets periodically to examine whether continuing to publicly release these datasets achieves the Refugee Law Lab's goals of advancing the rights and interests of refugees and other marginalized groups without creating disproportionate risks and harms, including risks related to privacy and human rights.

### Discussion of Biases

The dataset reflects many biases present in legal decision-making, including biases based on race, immigration status, gender, sexual orientation, religion, disability, socio-economic class, and other intersecting categories of discrimination.

### Other Known Limitations

Due to the ways that all legal datasets may be skewed, users of this dataset are encouraged to collaborate with or consult domain experts.

## Additional Information

### Licensing Information

Attribution-NonCommercial 4.0 International ([CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/))

NOTE: Users must also comply with [upstream licensing](https://www.fct-cf.gc.ca/en/pages/important-notices) for data obtained from the Federal Court, as well as requests on source URLs not to allow indexing of the documents by search engines to protect privacy. As a result, users must not make the data available in formats or locations that can be indexed by search engines.

### Warranties / Representations

We make no warranties or representations that the data included in this dataset is complete or accurate. Data were obtained through academic research projects, including projects that use automated processes. While we try to make the data as accurate as possible, our methodologies may result in inaccurate or outdated data. As such, data should be viewed as preliminary information aimed to prompt further research and discussion, rather than as definitive information.

### Dataset Curators

[Sean Rehaag](https://www.osgoode.yorku.ca/faculty-and-staff/rehaag-sean), Osgoode Hall Law School Professor & Director of the Refugee Law Lab

### Citation Information

Sean Rehaag, "Luck of the Draw III: Code & Data" (2023) online: Github: <https://github.com/Refugee-Law-Lab/luck-of-the-draw-iii>.

### Acknowledgements

This project draws on research supported by the Social Sciences and Humanities Research Council, the Law Foundation of Ontario, and the Digital Research Alliance of Canada. Jacob Danovich assisted with the infrastructure and scraping code for this project.

Provider:

refugee-law-lab

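Summary of Original Information

Dataset Overview

Dataset Summary

This dataset contains information from all online Federal Court of Canada dockets involving immigration law filed between 1997 and 2022. It can be used for legal analytics (e.g. identifying patterns in legal decision-making), to test machine learning and natural language processing tools on bilingual Canadian legal materials, and to pretrain language models for various tasks.

Dataset Structure

Data Instance

The dataset includes a single data instance of all online Federal Court dockets involving immigration law, as they appeared when the data was gathered in November 2022.

Data Fields

citation (string): Legal citation for the document (neutral citation where available). In this dataset, the legal citation is the docket number, which takes the form IMM-#-YY, where IMM signals an immigration law docket, # is the sequential number of the application as received in that year, and YY is the last two digits of the year in which the application was initially filed.
year (int32): Year of the document date, which can be useful for filtering. In this dataset, the year is the year in which the application was initially filed.
name (string): Name of the document; in this dataset, the style of cause of the court file.
date_filed (string): Date of the document (yyyy-mm-dd). In this dataset, the date is the date the application was filed.
city_filed (string): City where the application was initially filed.
nature (string): A category of proceedings assigned by the Federal Court.
class (string): A second category of proceedings assigned by the Federal Court.
track (string): A third category of proceedings assigned by the Federal Court.
documents (list of dictionaries): A list of dictionaries containing each docket entry. Each dictionary contains the following key/value pairs:
- RE_NO: The number assigned to the docket entry by the Federal Court.
- DOCNO: Where the entry involves the filing of a document, the number assigned to that document by the Federal Court.
- DOC_DT: The date of the docket entry.
- RECORDED_ENTRY: The content of the docket entry.
source_url (string): URL where the document was scraped and where the official version can be found.
scraped_timestamp (string): Date the document was scraped (yyyy-mm-dd).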
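
Because `documents` is a nested list, per-entry analysis is often easier after flattening each docket into one row per docket entry. The sketch below is illustrative only: `flatten_docket` is a hypothetical helper, and the sample record uses invented placeholder values (including the date strings) that mirror the documented field names, not real data from the dataset.

```python
from typing import Any, Dict, List


def flatten_docket(record: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one flat row per docket entry, tagged with the docket's citation and year."""
    rows = []
    # Treat a missing or empty `documents` field as a docket with no entries.
    for entry in record.get("documents") or []:
        rows.append({
            "citation": record.get("citation"),
            "year": record.get("year"),
            "RE_NO": entry.get("RE_NO"),
            "DOCNO": entry.get("DOCNO"),
            "DOC_DT": entry.get("DOC_DT"),
            "RECORDED_ENTRY": entry.get("RECORDED_ENTRY"),
        })
    return rows


# Invented placeholder record; only the field names come from the dataset card.
sample = {
    "citation": "IMM-1-22",
    "year": 2022,
    "documents": [
        {"RE_NO": "1", "DOCNO": "1", "DOC_DT": "placeholder-date",
         "RECORDED_ENTRY": "placeholder entry text"},
        {"RE_NO": "2", "DOCNO": None, "DOC_DT": "placeholder-date",
         "RECORDED_ENTRY": "another placeholder entry"},
    ],
}

rows = flatten_docket(sample)
```

Each resulting row keeps the docket-level `citation` and `year`, so entries from many dockets can be concatenated into a single table and still be traced back to their file.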

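Data Languages

Some dockets are in English, some in French, and some alternate between English and French.

Data Splits

The data has not been split, so all data is in the train split.

Dataset Creation

Source Data

All data was gathered via the Federal Court's website.

Personal and Sensitive Information

Documents may include personal and sensitive information. All documents have been published online by the Federal Court. While the open court principle requires that court materials be made available to the public, there are privacy risks when these materials become easily and widely available, particularly for marginalized groups such as refugees and other non-citizens.

Non-Official Versions

Documents included in this dataset are unofficial copies. For official versions, please see the source URLs.

Considerations for Using the Data

Social Impact of Dataset

This dataset, and further research using it, raises challenging questions about how to balance protecting privacy, enhancing government transparency, addressing information asymmetries, and building technologies that leverage data to advance the rights and interests of refugees and other displaced people.

Discussion of Biases

The dataset reflects many biases present in legal decision-making, including biases based on race, immigration status, gender, sexual orientation, religion, disability, socio-economic class, and other intersecting categories of discrimination.

Other Known Limitations

Because all legal datasets may be skewed, users are encouraged to collaborate with or consult domain experts.

Additional Information

Licensing Information

The dataset is released under the Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. Users must also comply with upstream licensing for data obtained from the Federal Court, as well as requests on source URLs not to allow indexing of the documents by search engines to protect privacy.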
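
One common way to keep republished pages out of search indexes is to send an `X-Robots-Tag` response header or to embed a robots `meta` tag in the page. The sketch below is an assumption about how a downstream user might approach the no-indexing requirement, not a mechanism prescribed by the Refugee Law Lab or the Federal Court; `NOINDEX_HEADERS`, `NOINDEX_META`, and `add_noindex_meta` are hypothetical names.

```python
import re

# Response header that asks crawlers not to index or follow the page.
NOINDEX_HEADERS = {"X-Robots-Tag": "noindex, nofollow"}

# Equivalent in-page directive for HTML documents.
NOINDEX_META = '<meta name="robots" content="noindex, nofollow">'

# Matches an opening <head> tag (with optional attributes) but not <header>.
HEAD_RE = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)
ROBOTS_RE = re.compile(r"<meta[^>]+name=[\"']robots[\"']", re.IGNORECASE)


def add_noindex_meta(html: str) -> str:
    """Insert a robots noindex meta tag right after the first <head> tag, if present."""
    if ROBOTS_RE.search(html):
        # Leave pages that already declare a robots policy untouched.
        return html
    return HEAD_RE.sub(lambda m: m.group(0) + NOINDEX_META, html, count=1)
```

Pages without a `<head>` element are returned unchanged, so sending the header is the safer default for fragments or non-HTML files.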

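Warranties / Representations

We make no warranties or representations that the data included in this dataset is complete or accurate. Data were obtained through academic research projects, including projects that use automated processes. While we try to make the data as accurate as possible, our methodologies may result in inaccurate or outdated data. As such, data should be viewed as preliminary information aimed to prompt further research and discussion, rather than as definitive information.

Dataset Curators

Sean Rehaag, Osgoode Hall Law School Professor and Director of the Refugee Law Lab.

Citation Information

Sean Rehaag, "Luck of the Draw III: Code & Data" (2023) online: Github: https://github.com/Refugee-Law-Lab/luck-of-the-draw-iii.

Acknowledgements

This project draws on research supported by the Social Sciences and Humanities Research Council, the Law Foundation of Ontario, and the Digital Research Alliance of Canada. Jacob Danovich assisted with the infrastructure and scraping code for this project.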
