Echoes Dataset: Simulating Textual Transmission with Natural Language Processing Techniques
Source: IEEE; indexed 2026-04-17
Download link:
https://ieee-dataport.org/documents/echoes-dataset-simulating-textual-transmission-natural-language-processing-techniques-0
Description:
Two main objectives of Computational Textual Criticism are the development of algorithms for tree reconstruction and for text reconstruction under conditions of imperfect copying. Despite recent developments in the field, few comparative studies or benchmarks exist, particularly for text reconstruction. Meanwhile, recent advances in Natural Language Processing (NLP) have begun to influence various areas of the humanities and social sciences. In this paper, we use several NLP techniques, including large language models (LLMs), to simulate text transmission and to benchmark different tree and text reconstruction algorithms. For text reconstruction, we also use LLMs to improve the final result. Our results show that the UPGMA/NJ method combined with the Levenshtein metric achieved the best comparative results for tree reconstruction. For text reconstruction, the Simple Majority Rule (SMR), UR, and RHM methods yielded consistent results, and in most cases the incorporation of LLMs improved the final output.
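As a rough illustration of two ideas named in the description, the sketch below computes pairwise Levenshtein distances between copies of a text (the kind of distance matrix a UPGMA or NJ implementation takes as input) and applies a majority vote per word position. It is not the authors' implementation: the witness texts are made up, the function names are hypothetical, and the majority vote assumes that "Simple Majority Rule" means choosing the most frequent reading at each position in copies that are already aligned word by word.

```python
from collections import Counter
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) using two-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def pairwise_distances(witnesses: dict[str, str]) -> dict[tuple[str, str], int]:
    """Levenshtein distance for every unordered pair of witness labels."""
    return {
        (p, q): levenshtein(witnesses[p], witnesses[q])
        for p, q in combinations(sorted(witnesses), 2)
    }


def simple_majority(witnesses: dict[str, str]) -> str:
    """Pick the most frequent word at each position.

    Assumption: witnesses are pre-aligned and have the same number of words;
    zip() silently truncates to the shortest witness otherwise. Ties go to
    the reading seen first.
    """
    columns = zip(*(text.split() for text in witnesses.values()))
    return " ".join(Counter(col).most_common(1)[0][0] for col in columns)


# Hypothetical toy witnesses, each with one copying error or none.
toy = {
    "A": "the quick brown fox jumps",
    "B": "the quick brown fax jumps",
    "C": "the quack brown fox jumps",
}
distances = pairwise_distances(toy)
reconstruction = simple_majority(toy)
```

Real witnesses differ in length and word order, so an alignment step would be needed before any per-position vote; that step is omitted here.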
Provided by:
Fernando Aguilar Canto; Hiram Calvo



