The OpenITI Millionaires
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/12774173
Resource description:
This data set covers the largest works in the OpenITI corpus at or before 1000 AH. It is based on the 2023.1.8 release of the corpus and on the corresponding text reuse data between the books in the corpus, which was generated by running passim on the corpus.
We wanted to understand the extent to which a small number of persons produced a substantial percentage of the OpenITI corpus, on a word-count basis. We call the authors with work(s) over a million words the ‘millionaires’.
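The "millionaire" label is a simple word-count threshold, and the question it serves is what share of the corpus those authors account for. Below is a minimal sketch of that arithmetic, not the KITAB team's method. It assumes per-work word counts and author death years (AH) are available. The `Work` record, its field names, and the toy figures are hypothetical. Two readings are also our own assumptions: that the 1000 AH cutoff applies to the author's death year, and that "work(s) over a million words" means at least one single work above 1,000,000 words.

```python
from dataclasses import dataclass

MILLION = 1_000_000


@dataclass
class Work:
    author: str    # hypothetical author identifier
    death_ah: int  # author's death year in AH (assumed basis for the cutoff)
    words: int     # word count of this work


def millionaires(works, cutoff_ah=1000, threshold=MILLION):
    """Return (millionaire authors, their share of in-scope words).

    Assumption: an author is a 'millionaire' if at least one of their
    works exceeds `threshold` words. Their share counts ALL their words,
    including their smaller works.
    """
    in_scope = [w for w in works if w.death_ah <= cutoff_ah]
    total = sum(w.words for w in in_scope)
    rich = {w.author for w in in_scope if w.words > threshold}
    rich_words = sum(w.words for w in in_scope if w.author in rich)
    share = rich_words / total if total else 0.0
    return rich, share


# Toy data only; these are not figures from the data set.
works = [
    Work("AuthorA", 310, 1_500_000),
    Work("AuthorA", 310, 200_000),
    Work("AuthorB", 450, 800_000),
    Work("AuthorC", 1100, 3_000_000),  # excluded: after the 1000 AH cutoff
    Work("AuthorD", 900, 500_000),
]

rich, share = millionaires(works)
print(sorted(rich), f"{share:.1%}")  # ['AuthorA'] 56.7%
```

Under the alternative reading, where an author's combined output counts toward the threshold, `rich` would instead be built from per-author word totals.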
The data will be analysed in forthcoming publications by the KITAB project team, including a monograph by Sarah Bowen Savant under contract with Edinburgh University Press. KITAB is funded by the European Research Council under the European Union’s Horizon 2020 research and innovation programme, awarded to the KITAB project (Grant Agreement No. 772989, PI Sarah Bowen Savant), hosted at Aga Khan University, London. In addition, it has received funding from the Qatar National Library to aid in the adaptation of the passim algorithm for Arabic.
KITAB’s text reuse data is published on Zenodo, and each version is the output of a separate run. The version number of each release corresponds to the corpus release on which that run was based.
Created:
2024-07-25



