
The database of English words in Croatian.xlsx

Source: DataCite Commons. Updated: 2022-06-07. Indexed: 2024-08-18.
Download link:
https://figshare.com/articles/dataset/The_database_of_English_words_in_Croatian_xlsx/20014364
Description:
To build a dataset for training and testing the model, three independent evaluators manually labelled 60,000 words according to language membership. An n-gram feature representation was combined with a linear Support Vector Machine (SVM) classifier (Smola & Schölkopf, 2004) to extract English words from the ENGRI corpus (Bogunović & Kučić, 2021; Kučić, 2021). The classifier achieved an F1 score of 0.9669 on the test set. The database contains 9,453 English words together with their absolute and relative frequencies.
Provider:
figshare
Created:
2022-06-07
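
The description mentions an n-gram feature representation and absolute and relative word frequencies without giving details. The sketch below is only an illustration of those two ideas, not the authors' pipeline. It assumes character-level n-grams of length 2 to 3 with boundary markers, and it assumes relative frequency means a word's count divided by the total token count. The function names `char_ngrams` and `frequencies` and the toy token list are hypothetical.

```python
from collections import Counter


def char_ngrams(word, n_min=2, n_max=3):
    """Return character n-grams of a word, padded with boundary markers.

    Assumption: character-level n-grams; the source does not state the
    n-gram type or range that was actually used.
    """
    padded = f"<{word.lower()}>"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def frequencies(tokens):
    """Map each token to (absolute count, count / total number of tokens).

    Assumption: relative frequency is the share of all tokens in the list.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: (count, count / total) for tok, count in counts.items()}


# Hypothetical toy input, not taken from the ENGRI corpus.
tokens = ["cool", "team", "cool", "show"]
print(char_ngrams("cool"))
print(frequencies(tokens))
```

In a setup like the one described, n-gram lists of this kind would typically be turned into count vectors and passed to a linear SVM, which then decides for each word whether it is English.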