"A BLAST-based, Language-agnostic Text Reuse Algorithm" data
Source: SSH Open MarketPlace. Updated: 2021-07-22. Indexed: 2024-08-03.
Download link:
https://marketplace.sshopencloud.eu/dataset/soZi7a
Resource description:
Code and sample corpus used for this article, which introduces a BLAST-based text reuse algorithm optimized for Chinese corpora. The code in this repository is under active development. It assumes the Anaconda distribution of Python 3.6 or later, with the python-Levenshtein library installed. The sample corpus comes from Christian Wittern's Kanseki repository and is used under the CC-BY-SA 4.0 license (included in the corpus.zip file). It contains material from the "histories (__)" section. The algorithm itself has been incorporated into the MARKUS online research platform.
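For readers unfamiliar with the BLAST idea, the following is a minimal sketch of generic seed-and-extend matching applied to text, not the repository's code. The function name `find_reuse` and the parameters `k` and `min_len` are hypothetical, and the extension step here is exact-match only; whether and how the actual code uses python-Levenshtein for fuzzy extension is not stated in this description.

```python
from collections import defaultdict


def find_reuse(a, b, k=4, min_len=6):
    """Return sorted (start_a, start_b, length) tuples for exact shared passages.

    Illustrative only: seeds are exact k-character matches, and each seed is
    extended left and right while the characters stay identical.
    """
    # Index every k-character seed in `a` by its start positions.
    seeds = defaultdict(list)
    for i in range(len(a) - k + 1):
        seeds[a[i:i + k]].append(i)

    found = set()
    for j in range(len(b) - k + 1):
        for i in seeds.get(b[j:j + k], ()):
            # Extend the seed to the left while characters agree.
            s_a, s_b = i, j
            while s_a > 0 and s_b > 0 and a[s_a - 1] == b[s_b - 1]:
                s_a -= 1
                s_b -= 1
            # Extend the seed to the right while characters agree.
            e_a, e_b = i + k, j + k
            while e_a < len(a) and e_b < len(b) and a[e_a] == b[e_b]:
                e_a += 1
                e_b += 1
            # Keep only passages long enough to count as reuse.
            if e_a - s_a >= min_len:
                found.add((s_a, s_b, e_a - s_a))
    return sorted(found)


if __name__ == "__main__":
    text_a = "xx天下大勢分久必合yy"
    text_b = "zzz天下大勢分久必合w"
    for start_a, start_b, length in find_reuse(text_a, text_b):
        print(start_a, start_b, text_a[start_a:start_a + length])
```

Because seeds are looked up per character rather than per word, a sketch like this needs no tokenizer, which is one reason character-level seeding suits unsegmented scripts such as Chinese.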
Created:
2021-07-22



