The database of English words in Croatian.xlsx
收藏DataCite Commons2025-06-01 更新2024-08-18 收录
下载链接:
https://figshare.com/articles/dataset/The_database_of_English_words_in_Croatian_xlsx/20014364/1
下载链接
链接失效反馈官方服务:
资源简介:
To build a dataset to train and test the model, 60,000 words were manually labelled according to language membership by three independent evaluators. N-gram feature representation was used in combination with a linear Support Vector Machine classification algorithm (SVM) (Smola & Schölkopf, 2004) to extract English words from the ENGRI corpus (Bogunović & Kučić, 2021; Kučić, 2021). An F1 score of 0.9669 was achieved on the test set. The database contains 9,453 English words as well as their absolute and relative frequencies.
提供机构:
figshare
创建时间:
2022-06-07



