Replication Data for: The semantic structuring of minimizing constructions in present-day Netherlandic Dutch: a distribution-based cluster analysis
Source: DataONE. Updated 2025-09-02; indexed 2025-09-06.
Download link:
https://search.dataone.org/view/sha256:0d94f9887a4707a4e3ac6d4c46fc326e9eb7a40e7df80f9bc0ecad02c02f7f0f
Resource summary:
Dataset abstract: This dataset contains the data files used for the cluster analysis of the Dutch minimizing construction described in the publication cited below. In addition to a ReadMe file, it contains three files. A TXT file lists the corpus queries used to retrieve tokens of the minimizing constructions from the Dutch Web 2014 (nlTenTen14) corpus, available via Sketch Engine (for more information about the TenTen corpora, see Jakubíček, M., A. Kilgarriff, V. Kovář, P. Rychlý & V. Suchomel (2013). The TenTen corpus family. In: 7th International Corpus Linguistics Conference CL. Lancaster, 125–127). A CSV file serves as the input for the cluster analysis: it lists 5,863 minimizer-predicate combinations, more specifically the predicates that combine with those minimizers that have a token frequency of at least 10 in my dataset. An R script contains the code to perform the cluster analysis in R.
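The CSV input can be turned into one predicate-frequency profile per minimizer, which is the kind of vector that cosine distances are computed on. The following is a minimal sketch, not the author's R script. It assumes one row per token with hypothetical column names `minimizer` and `predicate` (the actual layout is documented in the dataset's ReadMe), and it reads the threshold of 10 as applying to each minimizer's total token count, which is only one possible interpretation.

```python
import csv
import io
from collections import Counter, defaultdict


def load_profiles(lines, minimizer_col="minimizer", predicate_col="predicate",
                  min_freq=10):
    """Count predicates per minimizer from an iterable of CSV lines.

    ASSUMPTIONS (hypothetical, not taken from the dataset's documentation):
    - one row per token, with columns named `minimizer` and `predicate`;
    - minimizers with fewer than `min_freq` rows in total are dropped.
    """
    profiles = defaultdict(Counter)
    for row in csv.DictReader(lines):
        profiles[row[minimizer_col].strip()][row[predicate_col].strip()] += 1
    # Keep only minimizers whose total token count reaches the threshold.
    return {m: dict(c) for m, c in profiles.items() if sum(c.values()) >= min_freq}


# Tiny invented sample (NOT from the dataset): "meter" reaches the threshold,
# the hypothetical rows for "zier" do not.
sample = ("minimizer,predicate\n"
          + "meter,vertrouwen\n" * 7
          + "meter,snappen\n" * 3
          + "zier,schelen\n" * 2)
print(load_profiles(io.StringIO(sample)))  # → {'meter': {'vertrouwen': 7, 'snappen': 3}}
```

To use it on the real file, pass an open text handle (e.g. `open(path, encoding="utf-8", newline="")`) and adjust the column names to whatever the ReadMe specifies.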
Article abstract: This paper examines the semantic structuring of a paradigm of 89 minimizers, i.e., nouns that reinforce sentential negation in present-day Netherlandic Dutch, such as meter ‘meter’ in voor geen meter vertrouwen ‘not to trust for a meter’. Cosine distances are computed on the basis of the predicates the minimizers combine with in a sample of 100 tokens downloaded from the Dutch Web 2014 corpus (nlTenTen14), and the minimizers are grouped into nine semantic clusters with the Partitioning Around Medoids (PAM) algorithm. The clusters largely correspond to semantic categories such as taboo terms or units of money. This suggests that, in general, minimizers belonging to the same semantic domain combine with a similar (core) set of predicates. Based on the predicates shared within each cluster, we detect signs of analogical attraction, or conversely competition, between minimizers. Crucially, low silhouette widths enable us to identify outliers in their respective clusters, for instance minimizing nouns that show signs of context expansion, as evidenced by their combination with semantically non-harmonious verbs. As such, this paper provides a synchronic snapshot of the semantic processes involved in the (incipient) grammaticalization of minimizing nouns and, more generally, illustrates how distributional semantics offers a heuristic for analyzing the structure of a network of comparable micro-constructions.
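The pipeline named in the article abstract (cosine distances between predicate profiles, PAM clustering, silhouette widths to flag poorly fitting minimizers) can be sketched in a few functions. This is a minimal pure-Python illustration under stated assumptions, not the authors' R code: each minimizer is represented as a dict mapping predicates to counts, the PAM shown here is a simplified variant (the greedy BUILD step followed by a first-improvement SWAP rather than a best-swap search), and a singleton cluster gets a silhouette width of 0. In R, the `cluster` package's `pam()` and `silhouette()` functions cover the same steps, though whether the dataset's script uses them is not stated here.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(c * v.get(p, 0) for p, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    if nu == 0 or nv == 0:
        return 1.0  # treat an empty profile as maximally distant
    return max(0.0, 1.0 - dot / (nu * nv))  # clamp tiny negative rounding errors


def distance_matrix(profiles):
    """Symmetric matrix of cosine distances; rows follow the sorted names."""
    names = sorted(profiles)
    n = len(names)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = cosine_distance(profiles[names[i]], profiles[names[j]])
    return names, D


def total_cost(D, medoids):
    """Sum of each item's distance to its nearest medoid."""
    return sum(min(D[i][m] for m in medoids) for i in range(len(D)))


def pam(D, k):
    """Simplified PAM: greedy BUILD, then first-improvement SWAP."""
    n = len(D)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of items")
    # BUILD: most central item first, then repeatedly add the item that lowers the cost most.
    medoids = [min(range(n), key=lambda i: sum(D[i]))]
    while len(medoids) < k:
        medoids.append(min((i for i in range(n) if i not in medoids),
                           key=lambda i: total_cost(D, medoids + [i])))
    # SWAP: replace a medoid by a non-medoid whenever that strictly lowers the cost.
    current = total_cost(D, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                candidate = medoids[:pos] + [h] + medoids[pos + 1:]
                cost = total_cost(D, candidate)
                if cost < current - 1e-12:
                    medoids, current, improved = candidate, cost, True
    # Assign every item to its nearest medoid (cluster ids 0..k-1).
    labels = [min(range(k), key=lambda c: D[i][medoids[c]]) for i in range(n)]
    return medoids, labels


def silhouette_widths(D, labels):
    """s(i) = (b - a) / max(a, b); low or negative values flag poorly fitting items."""
    n = len(D)
    widths = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        others = {c for c in labels if c != labels[i]}
        if not own or not others:
            widths.append(0.0)  # singleton cluster (or a single cluster): width 0
            continue
        a = sum(D[i][j] for j in own) / len(own)  # mean distance within own cluster
        b = min(sum(D[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
                for c in others)  # mean distance to the nearest other cluster
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return widths


# Toy counts invented for illustration only; NOT taken from the dataset.
toy = {
    "cent": {"waard": 8, "betalen": 5},
    "euro": {"waard": 7, "betalen": 6},
    "meter": {"vertrouwen": 9, "snappen": 2},
    "bal": {"snappen": 8, "vertrouwen": 3},
}
names, D = distance_matrix(toy)
medoids, labels = pam(D, k=2)
for name, lab, s in zip(names, labels, silhouette_widths(D, labels)):
    print(f"{name:6s} cluster={lab} silhouette={s:.2f}")
```

With these invented counts, cent and euro fall into one cluster and meter and bal into the other, and every silhouette width is positive. On real data, a minimizer with a low or negative width would be a candidate outlier in the sense described in the abstract.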
Created: 2025-09-03



