Replication data and code for: Mapping ‘when’-clauses in Latin American and Caribbean languages: an experiment in subtoken-based typology

Source: Figshare · Updated: 2024-04-26 · Indexed: 2026-04-28
Download link:
https://figshare.com/articles/dataset/Replication_data_and_code_for_Mapping_when_-clauses_in_Latin_American_and_Caribbean_languages_an_experiment_in_subtoken-based_typology/25431814
Description:

This repository contains the outputs, as well as the scripts and data needed to replicate the experiment presented in:

Pedrazzini, Nilo. Forthcoming. Mapping 'when'-clauses in Latin American and Caribbean languages: an experiment in subtoken-based typology. To appear in Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP 2024). Association for Computational Linguistics.

More specifically:

1) Code.zip contains:
- when-latamecar.csv: the dataset with all occurrences of English 'when' in the New Testament and their parallels in the Latin American and Caribbean languages from Mayer & Cysouw's (2014) massively parallel corpus;
- when-latamecar-withgrams.csv: the same dataset after the n-gram search (i.e. with certain values replaced by their n-gram group label, e.g. ngram_1, ngram_2, etc.);
- the scripts (or notebooks) that produce when-latamecar-withgrams.csv from when-latamecar.csv and generate semantic maps from it, as described in the paper;
- a README.md with further details.
2) krigingmaps.zip: all the probabilistic semantic maps generated from the parallel dataset. Only a subset of these is presented in the paper.
3) when-ngram-details.txt: a breakdown, by ISO code, of which forms each language's n-gram groups actually contain.
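The idea of replacing word forms with n-gram group labels (ngram_1, ngram_2, ...) can be illustrated with a toy sketch. This is not the author's script (the actual procedure is in Code.zip and the paper). It assumes, purely for illustration, that a group is any character trigram shared by at least two distinct forms of one language, that groups are ranked by how many distinct forms contain them (ties broken alphabetically), and that each form takes the label of the highest-ranked group it contains. The functions char_ngrams and group_by_ngrams, the parameters n and min_count, and the example forms are all hypothetical.

```python
from collections import Counter


def char_ngrams(word: str, n: int) -> set[str]:
    """Return the set of character n-grams of length n in word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def group_by_ngrams(forms: list[str], n: int = 3, min_count: int = 2) -> list[str]:
    """Hypothetical illustration: replace forms with n-gram group labels.

    An n-gram becomes a group if it occurs in at least `min_count` distinct
    forms. Groups are ranked by that count (descending, ties alphabetical)
    and labelled ngram_1, ngram_2, ... Each form is replaced by the label of
    the highest-ranked group it contains; forms with no group stay unchanged.
    """
    # Count in how many distinct forms each n-gram appears.
    counts = Counter()
    for form in set(forms):
        counts.update(char_ngrams(form, n))

    # Deterministic ranking: most widespread n-grams first.
    frequent = sorted(
        (g for g, c in counts.items() if c >= min_count),
        key=lambda g: (-counts[g], g),
    )
    labels = {g: f"ngram_{i}" for i, g in enumerate(frequent, start=1)}

    result = []
    for form in forms:
        grams = char_ngrams(form, n)
        match = next((g for g in frequent if g in grams), None)
        result.append(labels[match] if match is not None else form)
    return result


if __name__ == "__main__":
    # Invented toy forms for a single language, for illustration only.
    forms = ["cuando", "cuandu", "mientras", "cuando", "ora"]
    print(group_by_ngrams(forms))
    # ['ngram_1', 'ngram_1', 'mientras', 'ngram_1', 'ora']
```

Here "cuando" and "cuandu" share the trigrams "and", "cua" and "uan", so both collapse into one group, while "mientras" and "ora" share nothing and keep their original values. Grouping on subtokens rather than whole words lets minor orthographic or morphological variants of the same marker count as one type.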
Created: 2024-04-26