Replication Data for: A BLAST-based, Language-agnostic Text Reuse Algorithm with a MARKUS Implementation and Sequence Alignment Optimized for Large Chinese Corpora
Source: NIAID Data Ecosystem. Indexed: 2026-03-11
Download link:
https://doi.org/10.7910/DVN/2YYJ2B
Description:
Code and sample corpus for the article named in the title, which introduces a BLAST-based text reuse algorithm optimized for Chinese corpora. The code in this repository is under active development. It assumes the Anaconda distribution of Python 3.6 or later, with the python-Levenshtein library installed. The sample corpus comes from Christian Wittern's Kanseki repository and is used under the CC-BY-SA 4.0 license (included in the corpus.zip file). It contains material from the "histories (史部)" section. The algorithm itself has been incorporated into the MARKUS online research platform.
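For readers unfamiliar with the approach, the following is a minimal sketch of the general BLAST-style "seed and extend" idea applied to text, not the code distributed in this repository. It indexes short exact character n-grams (seeds) in one text, looks them up in the other, and extends each hit while an edit-distance similarity stays above a threshold. All function names, parameters, and default values here (`find_reuse`, `seed_len`, `step`, `threshold`) are assumptions made for illustration, and a plain standard-library edit distance stands in for the python-Levenshtein library so the example runs without extra dependencies.

```python
from collections import defaultdict


def edit_distance(a: str, b: str) -> int:
    """Plain dynamic-programming Levenshtein distance (stdlib only)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def find_reuse(source, target, seed_len=4, step=2, threshold=0.8):
    """Return (source_start, target_start, length) tuples for likely reuse.

    Hypothetical illustration: parameter names and defaults are assumptions.
    """
    # Index every seed_len-gram of the target by its start positions.
    index = defaultdict(list)
    for j in range(len(target) - seed_len + 1):
        index[target[j:j + seed_len]].append(j)

    matches = []
    covered_until = 0  # source positions before this are already reported
    for i in range(len(source) - seed_len + 1):
        if i < covered_until:
            continue
        best = None
        for j in index.get(source[i:i + seed_len], ()):
            length = seed_len
            # Extend rightwards while the aligned windows stay similar enough.
            while True:
                new_len = length + step
                if i + new_len > len(source) or j + new_len > len(target):
                    break
                window_a = source[i:i + new_len]
                window_b = target[j:j + new_len]
                if similarity(window_a, window_b) < threshold:
                    break
                length = new_len
            if best is None or length > best[2]:
                best = (i, j, length)
        if best is not None:
            matches.append(best)
            covered_until = i + best[2]
    return matches
```

Because the seeds are character n-grams rather than words, a sketch like this needs no tokenization, which is one reason such an approach can work on unsegmented Chinese text. Calling `find_reuse(text_a, text_b)` on two strings returns the aligned spans; lowering `threshold` tolerates more variant characters at the cost of looser matches.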
Created:
2019-03-19



