Data for: Language Models, Surprisal and Fantasy in Slavic Intercomprehension

Source: Mendeley Data. Updated: 2024-06-25. Indexed: 2024-06-26.
Download link:
https://data.mendeley.com/datasets/ygsyczp8vr
Description:
The file webresults_cloze_publication.xlsx contains two types of data for the same stimuli: a) transcripts of think-aloud protocols and b) responses collected in a web-based intercomprehension experiment.

Part a) Three Polish stimulus sentences were presented to pairs of Czech native speakers in an experimental setting where both participants saw the stimulus sentence on their computer screens. Placed in different rooms, they were asked to communicate over Skype and work together to come up with a good Czech translation of the sentence. The output of the experiment therefore consists of audio recordings of the two participants trying to decode the stimuli and the written translations they entered during the experiment. The transcripts are in sheets 1, 3, and 5 of the .xlsx file.

Part b) Czech readers (n=23) were asked to translate certain words or phrases within Polish sentences (those that turned out to be problematic in part a) into Czech in a web-based translation experiment with a cloze-task design, run on the website http://intercomprehension.coli.uni-saarland.de/en/. The responses of part b) and the corresponding sociodemographic data are in sheets 2, 4, and 6 of the .xlsx file. The responses were checked manually for correctness. Responses with typos were counted as correct, because the main interest was whether respondents had understood the stimuli. The column "Total Time Spent (ms)" gives the time respondents spent entering their response into the gaps of the cloze test, up to the moment they pressed Enter.

The file surprisal_scores_CS_LM.txt contains surprisal scores obtained from a statistical trigram language model with Kneser-Ney smoothing trained on a Czech corpus (the Czech part of InterCorp merged with the Czech part of the Russian National Corpus; size: 175,190 words).
Created:
2024-01-23
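
For readers unfamiliar with surprisal, the following minimal sketch shows how a trigram surprisal score can be computed in principle: the surprisal of a word is the negative log probability of that word given the two preceding words. This is an illustration only and not the pipeline used for the dataset. It uses unsmoothed maximum-likelihood estimates instead of Kneser-Ney smoothing, the log base (2, giving bits) is an assumption because the description does not state the unit, and the toy tokens and function names are hypothetical.

```python
import math
from collections import Counter


def train_trigram_counts(tokens):
    """Count trigrams and the two-word contexts that precede a third word."""
    trigram_list = list(zip(tokens, tokens[1:], tokens[2:]))
    trigrams = Counter(trigram_list)
    # Context counts are derived from the trigrams so that the
    # conditional probabilities for each context sum to 1.
    contexts = Counter((w1, w2) for w1, w2, _ in trigram_list)
    return trigrams, contexts


def surprisal_bits(word, context, trigrams, contexts):
    """Return -log2 P(word | context) under an UNSMOOTHED trigram model.

    Unseen events get infinite surprisal here. A smoothed model such as
    Kneser-Ney (the one named in the description) would instead assign
    them a finite value.
    """
    context_count = contexts[context]
    trigram_count = trigrams[context + (word,)]
    if context_count == 0 or trigram_count == 0:
        return math.inf
    return -math.log2(trigram_count / context_count)


# Hypothetical toy corpus; placeholder tokens, not data from the dataset.
toy_tokens = ["a", "b", "c", "a", "b", "d", "a", "b", "c"]
trigrams, contexts = train_trigram_counts(toy_tokens)

frequent = surprisal_bits("c", ("a", "b"), trigrams, contexts)  # P = 2/3
rare = surprisal_bits("d", ("a", "b"), trigrams, contexts)      # P = 1/3
unseen = surprisal_bits("e", ("a", "b"), trigrams, contexts)    # never observed

print(frequent, rare, unseen)
```

In this toy example the continuation seen twice after the context receives a lower surprisal than the one seen once, which is the property such scores are meant to capture: less predictable words score higher.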