The raw n-grams dataset for Rajeg et al.’s (2022) “The Spatial Construal of TIME in Indonesian: Evidence from Language and Gesture”

Name: The raw n-grams dataset for Rajeg et al.’s (2022) “The Spatial Construal of TIME in Indonesian: Evidence from Language and Gesture”
Creator: figshare
Published: 2024-09-30 23:07:23
License: No description available

DataCite Commons; updated 2024-09-30; indexed 2024-11-06

Download link:

https://figshare.com/articles/dataset/The_raw_n-grams_dataset_for_Rajeg_et_al_s_2022_The_Spatial_Construal_of_TIME_in_Indonesian_Evidence_from_Language_and_Gesture_/27138921

Resource description:

How to cite

Rajeg, Gede Primahadi Wijaya (2024). The raw n-grams dataset for Rajeg et al.’s (2022) “The Spatial Construal of TIME in Indonesian: Evidence from Language and Gesture”. figshare. Dataset. https://doi.org/10.6084/m9.figshare.27138921

Overview

A dataset of non-tabulated (raw) n-grams (from 2-grams up to 5-grams) derived from one corpus file in the Indonesian Leipzig Corpora Collection (ILCC), namely “ind_newscrawl_2016_1M-sentences.txt”, which was the latest addition to the ILCC when the project that generated these n-grams started in 2018. These large datasets were generated in R on MonARCH, one of Monash University’s high-performance computing facilities. The datasets became the basis for the linguistic analyses in the following publication:

Rajeg, Gede Primahadi Wijaya, Poppy Siahaan & Alice Gaby. 2022. The Spatial Construal of TIME in Indonesian: Evidence from Language and Gesture. Linguistik Indonesia 40(1). 1–24. https://doi.org/10.26499/li.v40i1.297.

This repository also includes the R scripts used to create the n-grams. The key R package for producing the n-grams (including the corpus tokenisation) is quanteda (Benoit et al. 2018), supported by the suite of R packages from the tidyverse (Wickham et al. 2019), tidytext (Silge & Robinson 2017), and corplingr (Rajeg 2021). Line 60 onwards in the file R-script-ngram-creation-2-4-grams.R shows how to search/filter and tabulate the n-gram frequency for a given time noun (i.e., tahun ‘year’ in the example).

References

Silge, J., & Robinson, D. (2017). Text mining with R: A tidy approach (First edition). O’Reilly.

Benoit et al. (2018). quanteda: An R package for the quantitative analysis of textual data. Journal of Open Source Software, 3(30), 774. https://doi.org/10.21105/joss.00774 https://quanteda.io

Wickham et al. (2019). Welcome to the Tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Rajeg, G. P. W. (2021). corplingr: Tidy concordances, collocates, and wordlist. Open Science Framework (OSF). https://doi.org/10.17605/OSF.IO/X8CW4 https://github.com/gederajeg/corplingr/
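
The search-and-tabulate step mentioned in the description (filtering the n-grams for a time noun such as tahun ‘year’ and counting them) can be illustrated with a minimal sketch. This is not the repository's code: the actual scripts are written in R and rely on quanteda, whereas the sketch below uses only the Python standard library. The function names (tokenise, ngrams, ngram_counts_with), the simple whitespace tokeniser, and the two toy sentences are assumptions made for illustration and are not taken from the ILCC file.

```python
from collections import Counter
from typing import Iterable, Iterator


def tokenise(sentence: str) -> list[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation.

    A deliberate simplification; the original pipeline tokenises with quanteda.
    """
    stripped = (tok.strip(".,;:!?\"'()") for tok in sentence.lower().split())
    return [tok for tok in stripped if tok]


def ngrams(tokens: list[str], n: int) -> Iterator[tuple[str, ...]]:
    """Yield every contiguous sequence of n tokens."""
    return zip(*(tokens[i:] for i in range(n)))


def ngram_counts_with(
    sentences: Iterable[str],
    target: str,
    sizes: Iterable[int] = range(2, 6),  # 2-grams up to 5-grams
) -> Counter:
    """Count the n-grams that contain the target word."""
    sizes = list(sizes)
    counts: Counter = Counter()
    for sentence in sentences:
        tokens = tokenise(sentence)
        for n in sizes:
            for gram in ngrams(tokens, n):
                if target in gram:
                    counts[" ".join(gram)] += 1
    return counts


# Toy sentences invented for this example (not from the corpus).
sentences = [
    "Tahun depan kami pindah.",
    "Kami pindah tahun depan.",
]
counts = ngram_counts_with(sentences, "tahun")

# Print the frequency table, most frequent n-gram first.
for gram, freq in counts.most_common():
    print(freq, gram)
```

The n-grams are generated within each sentence, so no n-gram spans a sentence boundary. Whether the original R scripts treat sentence boundaries the same way is not stated in the description.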

Provider:

figshare

Created:

2024-09-30