
RevolutionCrossroads/nara_revolutionary_war_pension_files

Source: Hugging Face. Updated 2025-12-23; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/RevolutionCrossroads/nara_revolutionary_war_pension_files
Description:

A dataset derived from the *Case Files of Pension and Bounty-Land Warrant Applications Based on American Revolutionary War Service, ca. 1800–ca. 1912* (NARA Catalog Series, NAID 300022). It contains page-level records with digitized images, extracted text, and, where available, human-created transcriptions. The records offer a window into the lives of veterans and their families in the decades after the American Revolutionary War and provide a foundation for research, machine learning, genealogy, and public history projects.

The dataset comprises 2.2 million page-level records describing digitized pages from pension and bounty-land warrant case files, together with metadata fields exported from the National Archives Catalog, all converted to Parquet format for analysis. Extracted text is present for every record; human-created transcriptions are available for approximately 27 percent of files.

The dataset was prepared to support research and experimentation at the intersection of cultural heritage and artificial intelligence. It provides a structured corpus for studying the personal and social history of the Revolutionary era, testing handwriting recognition and transcription methods, and developing tools for large-scale text analysis, visualization, and discovery.
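Because records are stored per page while transcription coverage is reported per file, checking a coverage figure means grouping pages by case file. The following is a minimal plain-Python sketch under stated assumptions: the field names `file_id` and `transcription` are hypothetical placeholders (consult the dataset's actual schema), and a file is counted as transcribed if any of its pages has a non-empty transcription. Rows could be loaded from the Parquet files with, for example, `pandas.read_parquet(path).to_dict("records")`, where `path` is a placeholder.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def file_transcription_coverage(
    pages: Iterable[Mapping[str, object]],
    file_key: str = "file_id",                # hypothetical column name
    transcription_key: str = "transcription",  # hypothetical column name
) -> float:
    """Return the fraction of case files with at least one transcribed page.

    Assumption: a file counts as transcribed if ANY of its pages carries a
    non-empty transcription string.
    """
    has_transcription = defaultdict(bool)  # file id -> seen a transcription?
    for page in pages:
        text = page.get(transcription_key)
        # None, NaN, and whitespace-only strings are treated as "no transcription".
        transcribed = isinstance(text, str) and text.strip() != ""
        fid = page[file_key]
        has_transcription[fid] = has_transcription[fid] or transcribed
    if not has_transcription:
        return 0.0
    return sum(has_transcription.values()) / len(has_transcription)


# Toy page-level records (invented for illustration, not real dataset rows).
pages = [
    {"file_id": "A", "transcription": "State of Virginia ..."},
    {"file_id": "A", "transcription": None},
    {"file_id": "B", "transcription": ""},
    {"file_id": "C", "transcription": None},
]
print(file_transcription_coverage(pages))  # one of three files -> 0.3333333333333333
```

Missing values such as `None` or a pandas `NaN` are not strings, so the sketch counts them as untranscribed; the result depends entirely on the any-page rule assumed above.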
Provider:
RevolutionCrossroads