ACL word segmentation correction

Name: ACL word segmentation correction
Creator: heiDATA
Published: 2025-01-28 12:49:07
License: No description available

Source: DataCite Commons. Updated: 2025-01-28. Indexed: 2025-04-17

Download link:

https://heidata.uni-heidelberg.de/citation?persistentId=doi:10.11588/DATA/VK99LU

Description:

This collection consists of two parallel directories. One ("raw") contains the raw text of 18850 articles from the ACL 2013/02 collection; the other ("re-segmented") contains word-resegmented versions of the same articles, produced with Nematus, a seq2seq neural model used for machine translation. The work was motivated by the observation that spurious spaces appeared to be very common in the text, particularly in older papers whose text was obtained by OCR of scanned copies.
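As a rough illustration of how the two parallel directories might be used together, the sketch below pairs each raw file with its re-segmented counterpart and checks whether the two versions differ only in whitespace. The directory names "raw" and "re-segmented" come from the description; everything else is an assumption, including that paired files share the same relative path and that the helper names `pair_parallel_files` and `differs_only_in_whitespace` are purely hypothetical. The whitespace check is a sanity test, not a guarantee, since a seq2seq model may also change other characters.

```python
from pathlib import Path


def pair_parallel_files(root, raw_name="raw", reseg_name="re-segmented"):
    """Return (raw_path, resegmented_path) pairs for files present in both trees.

    Assumption: a raw file and its re-segmented version share the same
    relative path under their respective directories.
    """
    raw_dir = Path(root) / raw_name
    reseg_dir = Path(root) / reseg_name
    pairs = []
    for raw_path in sorted(p for p in raw_dir.rglob("*") if p.is_file()):
        candidate = reseg_dir / raw_path.relative_to(raw_dir)
        if candidate.is_file():
            pairs.append((raw_path, candidate))
    return pairs


def differs_only_in_whitespace(raw_text, reseg_text):
    """Return True if both strings are identical once all whitespace is removed."""
    return "".join(raw_text.split()) == "".join(reseg_text.split())
```

Pairs for which `differs_only_in_whitespace` returns False would be candidates for manual inspection, since the re-segmentation step may have altered more than the spacing there.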

Provider:

heiDATA

Created:

2019-07-15
