
The database of English words and their Croatian equivalents

Source: Figshare. Updated: 2022-11-04. Indexed: 2026-04-08.
Download link:
https://figshare.com/articles/dataset/The_database_of_English_words_and_their_Croatian_equivalents/20014712/2
Description:
The current database contains English words that appear in Croatian in their original, unadapted form (e.g. show, boxer, zombie, skin). The list of words is based on The Database of English words in Croatian (Bogunović & Kučić 2022; https://repository.pfri.uniri.hr/islandora/object/pfri:2495) and was complemented with words obtained from the corpus hrWaC (Ljubešić & Erjavec 2011; Ljubešić & Klubička 2014) using the platform SketchEngine (Kilgarriff et al. 2004). The same platform was used to check the list of English words against the corpora ENGRI (Bogunović et al. 2021; Bogunović & Kučić 2021) and hrWaC by consulting concordances and using CQL. The tagger Xf was used to filter out all English sentences embedded in Croatian texts. Corpus results were then manually checked using the random sample and filter tools to remove items such as proper nouns, false cognates, and false pairs. The database also lists a Croatian equivalent, with its corresponding frequencies in the corpora, for each English word that has one. The choice of the Croatian equivalent depended greatly on the available corpus data on word frequency as well as on Croatian online dictionaries. Single-word and multi-word English expressions are listed separately in the database to keep it visually transparent and to simplify word search. We would like to stress that the database is by no means a final product or a definitive representation of data on English words in Croatian; it is, however, representative of their current status in the Croatian language. Further efforts will be made to update the database and incorporate new data.
Provided by:
Bogunović, Irena; Jelčić Čolakovac, Jasmina; Borucinsky, Mirjana
Created:
2022-11-04