
The OpenITI Millionaires

NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://zenodo.org/record/12774173
Description:
This data set pertains to the largest works in the OpenITI corpus at or prior to 1000 AH. It is based on the 2023.1.8 release of the corpus and on the corresponding text reuse data between the books in the corpus, which is generated by running passim on the corpus. We wanted to understand the extent to which a small number of persons produced a substantial percentage of the OpenITI corpus, on a word-count basis. We call the authors with work(s) over a million words the ‘millionaires’. The data will be analysed in forthcoming publications by the KITAB project team, including a monograph by Sarah Bowen Savant under contract with Edinburgh University Press. KITAB is funded by the European Research Council under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 772989, PI Sarah Bowen Savant) and is hosted at Aga Khan University, London. In addition, it has received funding from the Qatar National Library to aid in the adaptation of the passim algorithm for Arabic. KITAB’s text reuse data is published on Zenodo, and each version is the output of a separate run. The version number of each text reuse release corresponds to that of the corpus release it is based on.
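As a rough illustration of the word-count question described above, the following minimal sketch flags authors who have at least one work over a million words and computes their share of the total word count. The record layout, the author and title names, and the helper functions `millionaires` and `corpus_share` are all hypothetical, not taken from the data set or from KITAB's code, and the sketch assumes one possible reading of the definition (a single work strictly over the threshold).

```python
# Hypothetical (author, title, word_count) records for illustration only;
# these values are NOT taken from the OpenITI corpus or the data set.
WORKS = [
    ("AuthorA", "Work1", 1_500_000),
    ("AuthorA", "Work2", 200_000),
    ("AuthorB", "Work3", 300_000),
    ("AuthorC", "Work4", 1_000_000),
]

# Assumed threshold: "over a million words" read as strictly greater than 1,000,000.
THRESHOLD = 1_000_000


def millionaires(works, threshold=THRESHOLD):
    """Return a sorted list of authors with at least one work over `threshold` words."""
    return sorted({author for author, _title, words in works if words > threshold})


def corpus_share(works, authors):
    """Return the fraction of the total word count contributed by `authors`."""
    total = sum(words for _author, _title, words in works)
    if total == 0:
        return 0.0
    selected = set(authors)
    contributed = sum(words for author, _title, words in works if author in selected)
    return contributed / total


if __name__ == "__main__":
    authors = millionaires(WORKS)
    print(authors)
    print(f"{corpus_share(WORKS, authors):.1%}")
```

Counting all works by a flagged author toward the share, rather than only the works over the threshold, is a design choice made here for illustration; the data set itself may define the measure differently.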
Created:
2024-07-25