A Dataset Showing a Century of Evolution in the Complexity of the United States Legal Code
Source: Figshare. Updated: 2025-07-11. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/A_Century_of_Evolution_in_the_Complexity_of_the_United_States_Legal_Code/29540039
Description:
We leverage OCR and generative AI techniques to recover and clean printed historical editions of the Code. This enables computational analysis of federal law even for periods before web-based digital access. The processing pipeline includes:

📄 Contents of U.S. Code: Word counts, unique word counts, entropy, scaling exponents, etc.
🌲 Hierarchical Structure: Subtitle → Part → Chapter → Section → Subsection...
🔗 Cross-Reference Relationships: Title-to-title citation relationships

For a small sample of the data, see the GitHub repository https://github.com/Dawoon-Jeong0523/uscode-complexity, which includes:

🔍 A sample OCR text page (ocr_processing_gemini) for demonstration
🌐 Web-based U.S. Code text from 1994 for structural parsing (Data Set 2)
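
The content measures named in the description (word counts, unique word counts, entropy) can be illustrated with a short Python sketch. This is not the dataset authors' code: the function name `text_metrics`, the lowercase alphanumeric tokenization, and the choice of Shannon entropy in bits over the word-frequency distribution are all assumptions made for illustration, and the dataset may define these measures differently.

```python
import math
import re
from collections import Counter


def text_metrics(text: str) -> dict:
    """Compute simple content measures for one block of text.

    Assumption: a "word" is a run of lowercase letters or digits.
    The dataset's real tokenization rule is not stated in the listing.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)

    # Shannon entropy (in bits) of the empirical word distribution.
    # Defined as 0.0 for empty input to avoid dividing by zero.
    entropy = 0.0
    if total:
        entropy = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )

    return {
        "word_count": total,
        "unique_word_count": len(counts),
        "entropy_bits": entropy,
    }


if __name__ == "__main__":
    # Hypothetical snippet of statutory-style text, not taken from the dataset.
    sample = "The Secretary shall submit the report to the Congress."
    print(text_metrics(sample))
```

Applied per title and per edition, a function like this would yield the kind of time series the dataset summarizes. Scaling exponents and cross-reference counts need additional steps that are not sketched here.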
Created:
2025-07-11



