Natas - Python 3 library for processing historical English

Name: Natas - Python 3 library for processing historical English
Creator: https://b2share.eudat.eu
Published: 2024-03-28 10:47:17
License: No description available

DataCite Commons: updated 2024-03-28, indexed 2025-04-09

Download link:

https://b2share.eudat.eu/records/2a9a25be10e442e2a75f8c688a1c82c4

Description:

This library will have methods for processing historical English corpora, especially for studying neologisms. The first functionalities to be released relate to normalization of historical spelling and OCR post-correction.

Cite

If you use the library, please cite one of the following publications, depending on whether you used it for normalization or OCR correction.

Normalization:
Mika Hämäläinen, Tanja Säily, Jack Rueter, Jörg Tiedemann, and Eetu Mäkelä. 2019. Revisiting NMT for Normalization of Early English Letters. In Proceedings of the 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature.

OCR correction:
Mika Hämäläinen and Simon Hengchen. 2019. From the Paft to the Fiiture: a Fully Automatic NMT and Word Embeddings Method for OCR Post-Correction. In Proceedings of Recent Advances in Natural Language Processing.
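To make the OCR post-correction task concrete, here is a minimal, self-contained Python sketch. It is not the Natas API and not the NMT and word-embedding method of the cited paper. It is a simple rule-based baseline under stated assumptions: a hypothetical toy lexicon (`LEXICON`) and a hand-written table of character confusions (`CONFUSIONS`), such as the long s (ſ) being misread as "f" or "l". The helper names `candidates` and `correct` are likewise hypothetical.

```python
# Illustrative rule-based OCR post-correction baseline.
# NOT the Natas library API and NOT the NMT method from the cited paper.

# Hypothetical toy lexicon of modern English word forms (assumption).
LEXICON = {"past", "future", "friendship", "secret", "with", "wise"}

# Hypothetical confusion table (assumption): (OCR output, intended text).
# The long s (ſ) is often misread as "f" or "l"; "ii" may stand for "u".
CONFUSIONS = [("f", "s"), ("l", "s"), ("ii", "u")]


def candidates(token: str) -> set[str]:
    """Return every string reachable by applying zero or more confusion rewrites."""
    token = token.lower()
    results: set[str] = set()

    def expand(i: int, prefix: str) -> None:
        if i == len(token):
            results.add(prefix)
            return
        # Option 1: keep the current character unchanged.
        expand(i + 1, prefix + token[i])
        # Option 2: rewrite any confusion pattern that starts at position i.
        for pattern, replacement in CONFUSIONS:
            if token.startswith(pattern, i):
                expand(i + len(pattern), prefix + replacement)

    expand(0, "")
    return results


def correct(token: str) -> list[str]:
    """Return the candidates found in the lexicon, sorted alphabetically."""
    return sorted(c for c in candidates(token) if c in LEXICON)


if __name__ == "__main__":
    for word in ["Paft", "Fiiture", "friendlhip"]:
        print(word, correct(word))
```

With these assumptions, `correct("Paft")` returns `["past"]` and `correct("Fiiture")` returns `["future"]`. A lookup like this cannot rank competing candidates or handle error patterns missing from the table, which is one motivation for learned approaches.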

Provider:

https://b2share.eudat.eu

Created:

2020-07-01