dreamproit/bill_text_us
Source: Hugging Face. Updated 2023-10-16; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/dreamproit/bill_text_us
Resource description:
---
task_categories:
- text-generation
- text-classification
language:
- en
tags:
- legal
- bills
pretty_name: bill_text_us
size_categories:
- 100K<n<1M
---
# Dataset Card for "bill_text_us"
## Table of Contents
- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
- [Dataset Summary](#dataset-summary)
- [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
- [Languages](#languages)
- [Dataset Structure](#dataset-structure)
- [Data Instances](#data-instances)
- [Data Fields](#data-fields)
- [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
- [Curation Rationale](#curation-rationale)
- [Source Data](#source-data)
- [Annotations](#annotations)
- [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
- [Social Impact of Dataset](#social-impact-of-dataset)
- [Discussion of Biases](#discussion-of-biases)
- [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
- [Dataset Curators](#dataset-curators)
- [Licensing Information](#licensing-information)
- [Citation Information](#citation-information)
- [Contributions](#contributions)
## Dataset Description
- **Homepage:** [BillML](https://github.com/dreamproit/BillML)
- **Repository:** [BillML](https://github.com/dreamproit/BillML)
- **Paper:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Leaderboard:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
- **Point of Contact:** [More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Dataset Summary
A dataset of US Congressional bill texts with accompanying metadata (bill_text_us).
### Supported Tasks and Leaderboards
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Languages
English
## Dataset Structure
### Data Instances
#### default
### Data Fields
- id: id of the bill, in the format (congress number + bill type + bill number + bill version).
- congress: number of the congress.
- bill_type: type of the bill.
- bill_number: number of the bill.
- bill_version: version of the bill.
- title: official title of the bill.
- sections: list of bill sections with section_id, text and header.
- sections_length: number of items in the sections list.
- text: bill text.
- text_length: number of characters in the text.
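
The field list above implies a few internal consistency rules: `sections_length` should match the number of entries in `sections`, `text_length` should match the character count of `text`, and `id` is built from the four identifying fields. The sketch below checks those rules on a single record. The record values are invented for illustration, and the plain concatenation used for `id` (no separators, lowercase codes) is an assumption, since the card does not spell out the exact format. `build_example_bill` and `check_bill` are hypothetical helper names, not part of the dataset or of BillML.

```python
from typing import Any, Dict, List

# Keys the card lists for each entry of `sections`.
SECTION_KEYS = {"section_id", "text", "header"}


def build_example_bill() -> Dict[str, Any]:
    """Return a made-up record shaped like the fields described in the card."""
    sections = [
        {"section_id": "s1", "header": "Short title",
         "text": "This Act may be cited as the Example Act."},
        {"section_id": "s2", "header": "Findings",
         "text": "Congress finds the following."},
    ]
    # Assumption: the full text is modelled here as the joined section texts.
    text = "\n".join(section["text"] for section in sections)
    return {
        "id": "118hr1ih",  # assumed format: congress + type + number + version
        "congress": 118,
        "bill_type": "hr",
        "bill_number": 1,
        "bill_version": "ih",
        "title": "An example title.",
        "sections": sections,
        "sections_length": len(sections),
        "text": text,
        "text_length": len(text),
    }


def check_bill(record: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems: List[str] = []

    # Assumed id layout: the four parts concatenated without separators.
    expected_id = (
        f'{record["congress"]}{record["bill_type"]}'
        f'{record["bill_number"]}{record["bill_version"]}'
    )
    if record["id"] != expected_id:
        problems.append(f'id is {record["id"]!r}, expected {expected_id!r}')

    if record["sections_length"] != len(record["sections"]):
        problems.append("sections_length does not match len(sections)")

    if record["text_length"] != len(record["text"]):
        problems.append("text_length does not match len(text)")

    for index, section in enumerate(record["sections"]):
        missing = SECTION_KEYS - section.keys()
        if missing:
            problems.append(f"section {index} is missing {sorted(missing)}")

    return problems
```

Running `check_bill` over real records is a quick way to find out whether the assumed `id` layout actually holds; if it does not, only the `expected_id` line needs to change.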
### Data Splits
train
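
Since `train` is the only split, reading the dataset comes down to opening that one split. The following is a minimal sketch using the Hugging Face `datasets` library in streaming mode, which avoids downloading every record up front. It assumes the repository loads with its default configuration; `stream_train_split` and `head` are hypothetical helper names introduced here for illustration.

```python
from itertools import islice
from typing import Any, Dict, Iterable, List


def stream_train_split(
    repo_id: str = "dreamproit/bill_text_us",
) -> Iterable[Dict[str, Any]]:
    """Open the train split lazily (needs network access and `datasets`)."""
    # Imported inside the function so `head` stays usable without the package.
    from datasets import load_dataset

    return load_dataset(repo_id, split="train", streaming=True)


def head(records: Iterable[Dict[str, Any]], n: int = 3) -> List[Dict[str, Any]]:
    """Return the first n records without consuming the rest of the iterable."""
    return list(islice(records, n))
```

A typical call would be `for bill in head(stream_train_split(), 2): print(bill["id"], bill["title"])`, which pulls only the first two bills.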
## Dataset Creation
### Curation Rationale
Bills (proposed laws) are specialized, structured documents with great public significance.
Often, the language of a bill may not directly explain the potential impact of the legislation.
This dataset collects the text of bills together with some metadata.
In addition to the full text, it provides each bill as a list of sections, each with its own text and header.
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Source Data
[govinfo.gov](https://www.govinfo.gov/)
#### Initial Data Collection and Normalization
The data consists of US Congress bills collected from the [govinfo.gov](https://www.govinfo.gov/) service, which is provided by the United States Government Publishing Office (GPO) under the CC0-1.0 license.
#### Who are the source language producers?
[More Information Needed](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
### Annotations
#### Annotation process
[More Information Needed]
#### Who are the annotators?
[More Information Needed]
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[dreamproit.com](https://dreamproit.com/)
### Licensing Information
Bill and summary information is public and unlicensed, as it is data produced by government entities. The collection and enhancement work that we provide for this dataset, to the degree it may be covered by copyright, is released under [CC0](https://creativecommons.org/share-your-work/public-domain/cc0/).
### Citation Information
[More Information Needed]
### Contributions
Thanks to [@aih](https://github.com/aih), [@BorodaUA](https://github.com/BorodaUA), and [@alexbojko](https://github.com/alexbojko) for adding this dataset.
Provider:
dreamproit