Replication data for: Slangs go online, or the rise and fall of the Olbanian language

Name: Replication data for: Slangs go online, or the rise and fall of the Olbanian language
Creator: DataverseNO
Published: 2023-09-28 00:00:00
License: No description available

Source: doi.org. Updated: 2023-09-28. Indexed: 2025-01-15.

Download link:

https://doi.org/10.18710/2NKJPG

Description:

All the data were taken from the website udaff.com (the center of the padonki culture and one of the cradles of the Olbanian language), from the section kreativy ('creative stories'), where users upload their own short stories. This is one of the oldest and most important sections on the website, and its name is a symbol of padonki culture. It was chosen as the largest and most diachronically representative collection of texts a) with a large number of erratic spellings; b) written by people who identify themselves as padonki, i.e. "native speakers" of Olbanian.

Texts were selected from 975 webpages covering the period from January 2001 to December 2011. One text was selected randomly from each page (each page contained 50 texts), and a random fragment of 100 words was extracted for analysis. If a text was for some reason not suitable for analysis (e.g. it was shorter than 100 words), another random text was selected. This resulted in 975 100-word fragments produced by 729 authors (156 authors produced more than one text; the largest number of texts per author was nine, and the mean was 1.34). No adjustment was made for the fact that some authors had more than one fragment included in the sample: while this gives their idiolect additional chances to contribute to the observed variation, this must mirror the actual situation.

For every word, the number of deviations from the norm it contained was recorded. All kinds of deviations were counted, and not all of them are strictly Olbanian. However, the analysis of the distribution of deviations across different types shows that the number of indisputably non-Olbanian deviations is relatively small and constant and does not distort the general picture.
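The sampling procedure described above (one random text per page, one random 100-word fragment per text, re-selection when a text is too short) can be illustrated with a minimal Python sketch. It is not the authors' code. The function names `sample_fragment` and `sample_corpus`, the whitespace tokenization, and the choice of a contiguous fragment are all assumptions made for illustration; the description does not specify them, and it names text length only as one example of why a text might be unsuitable.

```python
import random

FRAGMENT_LEN = 100  # words per fragment, as stated in the description


def sample_fragment(page_texts, rng, fragment_len=FRAGMENT_LEN):
    """Pick one random text from a page and cut a random fragment from it.

    Texts shorter than fragment_len are skipped and another random text
    is tried (the re-selection rule from the description).
    Returns a list of words, or None if no text on the page is long enough.
    """
    candidates = list(page_texts)
    rng.shuffle(candidates)  # random order = repeated random re-selection
    for text in candidates:
        words = text.split()  # assumption: words are whitespace-separated
        if len(words) < fragment_len:
            continue  # unsuitable text, try another one
        # assumption: the fragment is a contiguous run of words
        start = rng.randrange(len(words) - fragment_len + 1)
        return words[start:start + fragment_len]
    return None


def sample_corpus(pages, seed=0):
    """Draw one fragment per page; pages is an iterable of lists of texts."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    fragments = []
    for page_texts in pages:
        fragment = sample_fragment(page_texts, rng)
        if fragment is not None:
            fragments.append(fragment)
    return fragments
```

Applied to 975 pages, a procedure of this kind yields at most 975 fragments, one per page. The reported mean number of fragments per author follows directly from the counts given: 975 / 729 ≈ 1.34.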

Provider:

DataverseNO