Replication Data for: A BLAST-based, Language-agnostic Text Reuse Algorithm with a MARKUS Implementation and Sequence Alignment Optimized for Large Chinese Corpora

Indexed by NIAID Data Ecosystem on 2026-03-11

Download link:

https://doi.org/10.7910/DVN/2YYJ2B

Description:

Code and sample corpus for the accompanying article, which introduces a BLAST-based text reuse algorithm optimized for Chinese corpora. The code in this repository is under active development. It assumes the Anaconda distribution of Python 3.6 or later, with the python-Levenshtein library installed. The sample corpus comes from Christian Wittern's Kanseki repository and is used under the CC-BY-SA 4.0 license (included in the corpus.zip file). It contains material from the "histories (史部)" section. The algorithm itself has been incorporated into the MARKUS online research platform.
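As a rough illustration of what "BLAST-based" text reuse detection can mean, the sketch below follows the general seed-and-extend idea: find short exact character n-grams shared by two texts, then grow each match outward while the two passages stay similar. It is not the code from this repository. The function names (`similarity`, `find_seeds`, `extend_seed`, `find_reuse`) and the parameters `n`, `min_ratio`, and `min_len` are hypothetical, and the standard-library `difflib.SequenceMatcher` is used as a stand-in for the python-Levenshtein similarity measure so that the example has no external dependencies.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def similarity(x, y):
    """Similarity in [0, 1]. Assumption: difflib stands in for python-Levenshtein."""
    return SequenceMatcher(None, x, y, autojunk=False).ratio()


def find_seeds(a, b, n=4):
    """Yield (i, j) pairs where a[i:i+n] == b[j:j+n] (exact n-gram seeds)."""
    index = defaultdict(list)
    for i in range(len(a) - n + 1):
        index[a[i:i + n]].append(i)
    for j in range(len(b) - n + 1):
        for i in index.get(b[j:j + n], ()):
            yield i, j


def extend_seed(a, b, i, j, n=4, min_ratio=0.8):
    """Grow a seed one character at a time, right then left,
    for as long as the two windows stay at least min_ratio similar."""
    sa, sb, ea, eb = i, j, i + n, j + n
    while (ea < len(a) and eb < len(b)
           and similarity(a[sa:ea + 1], b[sb:eb + 1]) >= min_ratio):
        ea += 1
        eb += 1
    while (sa > 0 and sb > 0
           and similarity(a[sa - 1:ea], b[sb - 1:eb]) >= min_ratio):
        sa -= 1
        sb -= 1
    return sa, ea, sb, eb


def find_reuse(a, b, n=4, min_ratio=0.8, min_len=6):
    """Return sorted, de-duplicated (start_a, end_a, start_b, end_b) spans."""
    spans = {extend_seed(a, b, i, j, n, min_ratio)
             for i, j in find_seeds(a, b, n)}
    return sorted(s for s in spans if s[1] - s[0] >= min_len)
```

Calling `find_reuse(a, b)` on two strings returns half-open index spans into each text. Because it works on raw characters rather than tokenized words, the same sketch applies to unsegmented Chinese text. It re-scores the whole window at every extension step, so it is only suitable for short inputs; a practical implementation for large corpora would need a cheaper extension step and merging of overlapping spans.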

Created:

2019-03-19
