
Digging into Early Colonial Mexico: DECM Machine Ready Corpus, 1577-1585

Source: CESSDA | Updated: 2025-06-12 | Indexed: 2024-08-03
Download link:
https://datacatalogue.cessda.eu/detail?lang=en&q=a54764aae50e9395460b05702a6f54e1d48258eb807713af3925c589326e960a
Description:
This digital version of the RGs corpus contains only the historical information produced in the 16th century. All comments and footnotes by René Acuña and Mercedes de la Garza have been removed to provide a clean version of the transcribed documents. This version of the corpus is ready for use in Text Mining, Machine Learning, Natural Language Processing, Corpus Linguistics, and any other computational methodology available for the study and exploration of historical textual sources. The Data Collection is available from an external repository. Access is available via Related Resources.

The 'Colonisation of America' is a fundamental process in the history of the modern world. Along with archaeological remains, the historical writings related to the establishment of the so-called Virreinatos constitute primary sources of information for understanding this period. An extensive compilation of information ordered by the Spanish crown in the 16th century, the Relaciones Geográficas, gathered vast amounts of information about the New World through multiple records and descriptions, in both Spanish and indigenous styles. Traditional research on these documents has relied on the close reading of a handful of texts, which can take a scholar a lifetime to examine. Using a big-data approach, this project will, for the first time, apply ground-breaking computational methodologies to one of the most important sources for the colonial history of America, identifying, extracting, cross-linking, and analysing information of vital importance to historical enquiry.
Our highly interdisciplinary team will combine techniques from Corpus Linguistics, Text Mining, Natural Language Processing, Machine Learning, and Geographic Information Systems to address questions about the recording of information on indigenous cultures, the Spanish exploration of indigenous social and religious concepts, the appropriation and ideas of place and space in the indigenous world, and their attitudes towards politics and economy. In doing so, the project will transform the way modern scholars approach and analyse historical sources and large corpora.

Provider: UK Data Service
Created: 2022-11-22