Data from: On the physical origin of linguistic laws and lognormality in speech
Source: DataCite Commons; updated 2025-06-01; indexed 2025-06-15
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.4ss043q
Description:
Physical manifestations of linguistic units include sources of variability
due to factors of speech production which are by definition excluded from
counts of linguistic symbols. In this work we examine whether linguistic
laws hold with respect to the physical manifestations of linguistic units
in spoken English. The data we analyze comes from a phonetically
transcribed database of acoustic recordings of spontaneous speech known as
the Buckeye Speech corpus. First, we verify with unprecedented accuracy
that acoustically transcribed durations of linguistic units at several
scales comply with a lognormal distribution, and we quantitatively justify
this ‘lognormality law’ using a stochastic generative model. Second, we
explore the four classical linguistic laws (Zipf’s law, Herdan’s law,
Brevity law, and Menzerath-Altmann’s law) in oral communication, both in
physical units and in symbolic units measured in the speech
transcriptions, and find that these laws typically hold more strongly
when using physical units than when using their symbolic counterparts.
Additional results include (i) coining a Herdan’s law in physical units,
(ii) a precise mathematical formulation of Brevity law, which we show to
be connected to optimal compression principles in information theory and
which allows us to formulate and validate yet another law, which we call
the size-rank law, and (iii) a mathematical derivation of
Menzerath-Altmann’s law which also highlights an additional regime where
the law is inverted.
Altogether, these results support the hypothesis that statistical laws in
language have a physical origin.
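As a rough illustration of two quantities mentioned in the description, the Python sketch below fits a lognormal distribution to a list of durations by maximum likelihood (the mean and standard deviation of the log-durations) and estimates a Herdan exponent β in V(N) ∝ N^β by ordinary least squares on log-log points. This is not the authors' code: the function names `fit_lognormal` and `herdan_exponent` are hypothetical, the durations are synthetic rather than taken from the Buckeye corpus, and a plain log-log regression is assumed in place of whatever fitting procedure the paper actually uses.

```python
import math
import random


def fit_lognormal(durations):
    """Maximum-likelihood lognormal fit: returns (mu, sigma) of log(duration)."""
    logs = [math.log(d) for d in durations if d > 0]
    n = len(logs)
    mu = sum(logs) / n
    # The MLE uses the 1/n variance, not the unbiased 1/(n-1) estimator.
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / n)
    return mu, sigma


def herdan_exponent(tokens):
    """Estimate beta in V(N) ~ N**beta by least squares on (log N, log V) points.

    N is the number of tokens read so far and V the number of distinct types
    seen so far. Requires at least two tokens.
    """
    seen = set()
    xs, ys = [], []
    for n, tok in enumerate(tokens, start=1):
        seen.add(tok)
        xs.append(math.log(n))
        ys.append(math.log(len(seen)))
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var  # slope of the log-log regression line


# Usage with synthetic durations in seconds (hypothetical, not corpus data).
rng = random.Random(0)
durations = [rng.lognormvariate(-1.5, 0.5) for _ in range(10000)]
mu, sigma = fit_lognormal(durations)
print(f"mu={mu:.3f}, sigma={sigma:.3f}")
```

For a physical-units variant of Herdan’s law, one could replace the token count N on the x-axis with cumulative elapsed speech time; this is an assumption for illustration, not necessarily the definition used in the paper.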
Provider:
Dryad
Created:
2019-07-30



