A Dataset Showing a Century of Evolution in the Complexity of the United States Legal Code
Source: DataCite Commons. Updated: 2026-01-06. Indexed: 2025-09-08.
Download link:
https://figshare.com/articles/dataset/A_Century_of_Evolution_in_the_Complexity_of_the_United_States_Legal_Code/29540039
Description:
We use OCR and generative AI techniques to recover and clean printed historical editions of the United States Code. This enables computational analysis of federal law even for periods that predate web-based digital access. The processing pipeline includes:

📄 Contents of U.S. Code: word counts, unique word counts, entropy, scaling exponents, etc.
🌲 Hierarchical Structure: Subtitle → Part → Chapter → Section → Subsection...
🔗 Cross-Reference Relationships: title-to-title citation relationships

For a small sample of the data, see our GitHub repository: https://github.com/Dawoon-Jeong0523/uscode-complexity

🔍 A sample OCR text page (ocr_processing_gemini) for demonstration
🌐 Web-based U.S. Code text from 1994 for structural parsing (Data Set 2)
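As an illustration of the content measures listed above (word count, unique word count, entropy), the following is a minimal sketch and not the authors' pipeline. The function name `text_metrics`, the lowercase alphanumeric tokenizer, and the choice of Shannon entropy in bits over the word-frequency distribution are all assumptions made for this example; the dataset may define these quantities differently.

```python
import math
import re
from collections import Counter


def text_metrics(text: str) -> dict:
    """Compute simple content measures for one block of text.

    Hypothetical helper: the tokenization rule below (lowercased runs of
    letters and digits) is an assumption, not the dataset's definition.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)

    # Shannon entropy (bits) of the empirical word-frequency distribution.
    entropy = 0.0
    if total:
        entropy = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )

    return {
        "word_count": total,
        "unique_word_count": len(counts),
        "entropy_bits": entropy,
    }


if __name__ == "__main__":
    sample = "No person shall be held to answer. No person shall be denied."
    print(text_metrics(sample))
```

Applied to the text of each title and edition, a function like this would yield one row of measures per title per year. Scaling exponents require a fit across many such rows and are not covered by this sketch.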
Provider:
figshare
Created:
2025-07-11



