Replication data and code for: Mapping ‘when’-clauses in Latin American and Caribbean languages: an experiment in subtoken-based typology
Source: Figshare. Updated: 2024-04-26. Indexed: 2026-04-08.
Download link:
https://figshare.com/articles/dataset/Replication_data_and_code_for_Mapping_when_-clauses_in_Latin_American_and_Caribbean_languages_an_experiment_in_subtoken-based_typology/25431814/1
Resource description:
This repository contains the outputs, scripts, and data needed to replicate the experiment presented in:

Pedrazzini, Nilo. Forthcoming. Mapping 'when'-clauses in Latin American and Caribbean languages: an experiment in subtoken-based typology. (To appear in Proceedings of the Workshop on Natural Language Processing for Indigenous Languages of the Americas (AmericasNLP 2024). Association for Computational Linguistics.)

More specifically:
1) Code.zip contains:
- when-latamecar.csv: the dataset with all occurrences of English 'when' in the New Testament and their parallels in the Latin American and Caribbean languages from Mayer & Cysouw's (2014) massively parallel corpus;
- when-latamecar-withgrams.csv: the same dataset after n-gram search (i.e. with certain values replaced by their n-gram group label, e.g. ngram_1, ngram_2, etc.);
- the scripts (or notebooks) that produce when-latamecar-withgrams.csv from when-latamecar.csv and generate semantic maps from it, as described in the paper;
- see also the README.md inside the archive.
2) krigingmaps.zip: all the probabilistic semantic maps generated from the parallel dataset. Only a subset of these was presented in the paper.
3) when-ngram-details.txt: a breakdown, by ISO code, of the forms that each language's n-gram groups actually contain.
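
The substitution of values with n-gram group labels mentioned above can be pictured with a small sketch. This is an illustration only, not the procedure used in the repository's scripts: the function name `label_by_ngram`, the toy word forms, and the single n-gram group are all hypothetical, and the real column layout of when-latamecar.csv is not assumed here.

```python
def label_by_ngram(forms, ngram_groups):
    """Replace each form that contains a listed character n-gram
    with that n-gram's group label; leave other forms unchanged.

    forms        -- list of word forms (hypothetical toy data)
    ngram_groups -- dict mapping a character n-gram to a group label
    """
    labelled = []
    for form in forms:
        label = form  # default: keep the original form
        for ngram, group in ngram_groups.items():
            if ngram in form.lower():
                label = group  # first matching n-gram wins
                break
        labelled.append(label)
    return labelled


# Toy example: two forms share the character sequence "uando".
toy_forms = ["kuando", "cuando", "pero"]
toy_groups = {"uando": "ngram_1"}
print(label_by_ngram(toy_forms, toy_groups))
```

In this sketch the first matching n-gram decides the label, which is one possible design choice; the README.md in Code.zip is the place to check how the actual scripts assign groups.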
Provider:
Pedrazzini, Nilo
Created:
2024-04-26



