CEMA
Source: arXiv. Updated 2021-04-21. Indexed 2024-06-21.
Download link:
https://lamop.pantheonsorbonne.fr/axes-recherche/livres-textes-langages/cema
Description:
The CEMA dataset, developed by LaMOP, contains approximately 250,000 medieval diplomatic documents dating from roughly the Merovingian period to the 16th century. Most of these documents record gifts, sales, exchanges, or confirmations of land, goods, and rights, typically made by private individuals to ecclesiastical institutions. The dataset was built by downloading documents from several existing databases, cleaning them, and then lemmatizing and formatting them to support data and text mining. Its application areas include historical semantics, regionalization, and the study of medieval social dynamics, and it aims to address the standardization of medieval texts and their regional variation.
Provider:
LaMOP
Created:
2021-04-21



