impact-es diachronic corpus of historical Spanish

Source: arXiv · Updated: 2013-06-28 · Indexed: 2024-06-21
Download link:
http://www.digitisation.eu/tools/language-resources/impact-es/
Resource description:
impact-es is an open diachronic corpus of historical Spanish containing approximately 8 million words. It covers 107 books published between 1481 and 1748, spanning a range of genres and authors. The corpus was created by a research team at the University of Alicante to support linguistic research. It is divided into two parts, both drawn from the Biblioteca Virtual Miguel de Cervantes digital library: the gt part, with 21 documents, and the bvc part, with 86 documents. The corpus is accompanied by a lexicon linking more than 10,000 entries. Intended applications include the study of language change and the automatic derivation of rules for text modernization. More broadly, the corpus is meant to help with challenges in processing historical texts, such as spelling variation and textual interpretation.
Provider:
Department of Languages and Information Systems, University of Alicante
Created:
2013-06-17