
S1 Text - Rank Diversity of Languages: Generic Behavior in Computational Linguistics

Source: Figshare. Updated 2015-12-03; indexed 2026-04-29.
Download link:
https://figshare.com/articles/dataset/_Rank_Diversity_of_Languages_Generic_Behavior_in_Computational_Linguistics_/1369726
Description:
Figure S1. Rank distributions of words according to frequency. [a]: Normalized word frequency fR as a function of the rank k for several languages, for books published in the year 2000. The color code for languages is as follows: light blue for French, green for German, yellow for Italian, orange for English, dark blue for Spanish, and red for Russian. [b]: Word frequency fR as a function of the rank k for English in several years, normalized so that the most frequent element has relative frequency one. The inset shows the unnormalized frequency f.
Figure S2. Comparison between the different models, Equations S1–S5, and the frequency of rank distribution. The data are for the year 2000 and all languages under consideration. The base-10 logarithm of the ratio of the observed values to the model is plotted. Different models fit better in different regions; however, no model fits all languages and all regions much better than the others.
Figure S3. Rank variations in time of twenty words from three different scales for English.
Figure S4. Rank variations in time of twenty words from three different scales for German.
Figure S5. Rank variations in time of twenty words from three different scales for French.
Figure S6. Rank variations in time of twenty words from three different scales for Italian.
Figure S7. Rank variations in time of twenty words from three different scales for Spanish.
Figure S8. Rank variations in time of twenty words from three different scales for Russian.
Figure S9. Rank variations in time of twenty words from three different scales for our simulated language.
Figure S10. Distribution of relative flights for all languages studied. A plot similar to the one presented in Fig. 5 is shown for the other languages, with the same color coding and details.
Figure S11. Correlations for relative frequency changes for different languages. The black line shows the correlations for the simulated language.
(PDF)
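
The two quantities named in the captions for Figures S1 and S2 (frequency normalized so that the top-ranked word has relative frequency one, and the base-10 logarithm of the observed-to-model ratio) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, the toy token list is not from the dataset, and the model is a plain power law f(k) = k^(-alpha) standing in for Equations S1–S5, which are not reproduced in this description.

```python
from collections import Counter
import math


def rank_frequency(tokens):
    """Frequencies ordered by rank (rank 1 = most frequent word),
    normalized so the top-ranked word has relative frequency 1.
    Assumes a non-empty token list."""
    counts = Counter(tokens)
    ordered = sorted(counts.values(), reverse=True)
    top = ordered[0]
    return [c / top for c in ordered]


def log10_ratio_to_power_law(normalized, alpha=1.0):
    """log10(observed / model) at each rank k, where the model is the
    illustrative power law f(k) = k**(-alpha) (an assumption, not one
    of the paper's Equations S1-S5)."""
    return [
        math.log10(f / (k ** -alpha))
        for k, f in enumerate(normalized, start=1)
    ]


# Toy corpus, purely illustrative.
tokens = "the cat and the dog and the bird".split()
freqs = rank_frequency(tokens)
ratios = log10_ratio_to_power_law(freqs)
```

A ratio near zero at a given rank means the stand-in model matches the observed frequency there; positive values mean the model underestimates it.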
Created: 2015-12-03