"Echoes Dataset: Simulating Textual Transmission with Natural Language Processing Techniques"

Source: DataCite Commons. Updated: 2025-06-04. Indexed: 2026-05-03.
Download link:
https://ieee-dataport.org/documents/echoes-dataset-simulating-textual-transmission-natural-language-processing-techniques-0
Description:
"Two main objectives in Computational Textual Criticism correspond to the development of algorithms for tree and text reconstruction under conditions of imperfect copying. Despite recent developments in the field, few comparative studies or benchmarks have been performed, particularly in the case of text reconstruction. On the other hand, recent advancements in Natural Language Processing (NLP) have begun to impact various aspects of the humanities and social sciences. In this paper, we incorporate various NLP techniques (including LLMs) to simulate text transmission and benchmark different tree and text reconstruction algorithms. In addition, for text reconstruction, we incorporate LLMs to improve the final result. Our results show that the UPGMA/NJ method combined with the Levenshtein metric achieved superior comparative results for tree reconstruction. Moreover, for text reconstruction, we found that the Simple Majority Rule (SMR), UR, and RHM methods yielded consistent results, and in most cases, the incorporation of LLMs improved the final output."
Provider:
IEEE DataPort
Created:
2025-06-04
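
Illustrative sketch (not part of the dataset or the authors' code): the description mentions the Levenshtein metric for tree reconstruction and a Simple Majority Rule for text reconstruction. The Python below shows, under stated assumptions, what those two building blocks might look like. The function names `levenshtein`, `distance_matrix`, and `majority_reading` are hypothetical, the sample witnesses are invented for illustration, and the majority vote assumes the witnesses are already aligned token by token, which may differ from the SMR variant used in the paper.

```python
from collections import Counter
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def distance_matrix(witnesses):
    """Symmetric pairwise Levenshtein distances between witness texts."""
    n = len(witnesses)
    d = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = levenshtein(witnesses[i], witnesses[j])
    return d


def majority_reading(witnesses):
    """Token-wise majority vote.

    Assumption: every witness has the same number of whitespace-separated
    tokens, i.e. alignment has already been done. Ties go to the reading
    that appears first among the witnesses.
    """
    rows = [w.split() for w in witnesses]
    if len({len(r) for r in rows}) != 1:
        raise ValueError("witnesses must be pre-aligned to equal token counts")
    return " ".join(Counter(col).most_common(1)[0][0] for col in zip(*rows))


# Invented sample copies of one sentence, each with a different copying error.
witnesses = [
    "the quick brown fox",
    "the quick brown fax",
    "the quiet brown fox",
]
matrix = distance_matrix(witnesses)
reconstructed = majority_reading(witnesses)
```

A distance matrix of this kind is the usual input to a distance-based tree builder. For instance, average-linkage hierarchical clustering corresponds to UPGMA, though which implementation the authors used is not stated here.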