Replication data and code for: Mapping ‘when’-clauses in Latin American and Caribbean languages: an experiment in subtoken-based typology

Name: Replication data and code for: Mapping ‘when’-clauses in Latin American and Caribbean languages: an experiment in subtoken-based typology
Creator: Pedrazzini, Nilo
Published: 2024-04-26 00:00:00
License: No description available

Figshare. Updated: 2024-04-26. Indexed: 2026-04-08.

Download link:

https://figshare.com/articles/dataset/Replication_data_and_code_for_Mapping_when_-clauses_in_Latin_American_and_Caribbean_languages_an_experiment_in_subtoken-based_typology/25431814/1

Description:

This repository contains the outputs, scripts, and data needed to replicate the experiment presented in: Pedrazzini, Nilo. Forthcoming. Mapping 'when'-clauses in Latin American and Caribbean languages: an experiment in subtoken-based typology. To appear in Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP 2024). Association for Computational Linguistics.

More specifically:

1) Code.zip contains:
- when-latamecar.csv: the dataset with all occurrences of English 'when' in the New Testament and their parallels in the Latin American and Caribbean languages from Mayer & Cysouw's (2014) massively parallel corpus;
- when-latamecar-withgrams.csv: the same dataset after n-gram search (i.e. with certain values replaced by their n-gram group label, e.g. ngram_1, ngram_2, etc.);
- the scripts (or notebooks) to produce when-latamecar-withgrams.csv from when-latamecar.csv and to generate semantic maps from it, as described in the paper.
See also the README.md inside the archive.

2) krigingmaps.zip: all the probabilistic semantic maps generated from the parallel dataset. Only a subset of these is presented in the paper.

3) when-ngram-details.txt: a breakdown, by ISO code, of the forms that the n-gram groups for each language actually contain.
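As a rough illustration of what replacing values with an n-gram group label could look like, here is a minimal Python sketch. It is not the code shipped in Code.zip, and the actual n-gram search described in the paper may work differently. The function name, the toy forms, and the group contents are all hypothetical and are not taken from the dataset.

```python
def label_with_ngram_groups(forms, ngram_groups):
    """Replace each form that contains a known character n-gram
    with the label of the first matching group; keep other forms as-is."""
    labelled = []
    for form in forms:
        label = form
        for group_label, ngrams in ngram_groups.items():
            if any(ngram in form for ngram in ngrams):
                label = group_label
                break
        labelled.append(label)
    return labelled


# Made-up forms and groups, for illustration only (not from the dataset).
toy_forms = ["abcka", "xyz", "kabcd", "qrs"]
toy_groups = {"ngram_1": ["abc"], "ngram_2": ["qr"]}

print(label_with_ngram_groups(toy_forms, toy_groups))
```

In this toy setup, any form sharing a substring with a group is collapsed to that group's label, which is one plausible reading of how different surface forms in a language could end up under a single label such as ngram_1.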

Provider:

Pedrazzini, Nilo

Created:

2024-04-26