A Dataset Showing a Century of Evolution in the Complexity of the United States Legal Code

Name: A Dataset Showing a Century of Evolution in the Complexity of the United States Legal Code
Creator: figshare
Published: 2026-01-06 01:57:33
License: No description available

DataCite Commons, updated 2026-01-06, indexed 2026-04-25

Download link:

https://figshare.com/articles/dataset/A_Century_of_Evolution_in_the_Complexity_of_the_United_States_Legal_Code/29540039/5

Resource description:

We leverage OCR and Generative AI techniques to recover and clean printed historical editions of the Code. This enables computational analysis of federal law even for periods before web-based digital access. The processing pipeline covers:

📄 Contents of U.S. Code: Word counts, unique word counts, entropy, scaling exponents, etc.
🌲 Hierarchical Structure: Subtitle → Part → Chapter → Section → Subsection...
🔗 Cross-Reference Relationships: Title-to-title citation relationships

For a small sample of the data, see our GitHub repository: https://github.com/Dawoon-Jeong0523/uscode-complexity

🔍 A sample OCR text page (<code>ocr_processing_gemini</code>) for demonstration
🌐 Web-based U.S. Code text from 1994 for structural parsing (<code>Data Set 2</code>)
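
As a rough illustration of the content statistics named in the description (word counts, unique word counts, entropy), the sketch below computes them for a single text string. It is not taken from the dataset's pipeline: the function name `text_statistics`, the lowercase alphanumeric tokenizer, and the choice of Shannon entropy in bits over the word-frequency distribution are all assumptions made for this example, and the scaling exponents are not covered.

```python
import math
import re
from collections import Counter


def text_statistics(text: str) -> dict:
    """Word count, unique word count, and Shannon entropy (bits) of a text.

    Hypothetical helper for illustration; the dataset's own tokenization
    and entropy definition may differ.
    """
    # Assumed tokenizer: lowercase runs of ASCII letters and digits.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)

    # Shannon entropy of the empirical word distribution, H = -sum(p * log2 p).
    if total:
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    else:
        entropy = 0.0

    return {
        "word_count": total,
        "unique_word_count": len(counts),
        "entropy_bits": entropy,
    }


# Made-up sample sentence, not drawn from the dataset.
sample = "The court shall; the court may."
stats = text_statistics(sample)
print(stats["word_count"], stats["unique_word_count"])  # 6 4
```

Applying the same function to each title and edition would give one row of statistics per title per year, which is the shape of table the description suggests.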

Provider:

figshare

Created:

2025-10-22