Language Dataset

Source: DataCite Commons. Updated 2026-02-05. Indexed 2024-07-13.
Download link:
https://data.ncl.ac.uk/articles/dataset/Language_Dataset/24574729/1
Description:
Dataset containing the images and labels for the Language data used in the CVPR NAS workshop Unseen-data challenge under the codename "LaMelo".

The Language dataset is constructed from words in the aspell dictionaries. It is intended to require machine learning models to perform not only image classification but also linguistic analysis, working out which letter frequencies are associated with each language. For each Language image, we selected four six-letter words that use the standard Latin alphabet, removing any words that contained letters with diacritics (such as é or ü) or included 'y' or 'z'.

We encode these words on a graph with one axis representing the index into the 24-character string (the four words joined together) and the other representing the letter (A to X).

The data is in channels-first format with shape (n, 1, 24, 24), where n is the number of samples in the corresponding set (50,000 for training, 10,000 for validation, and 10,000 for testing).

There are ten classes in the dataset, with 7,000 examples of each (70,000 images in total), distributed evenly across the three subsets.

The ten classes and their corresponding numerical labels are as follows:
English: 0
Dutch: 1
German: 2
Spanish: 3
French: 4
Portuguese: 5
Swahili: 6
Zulu: 7
Finnish: 8
Swedish: 9
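
The encoding described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' generation code: the description does not say which axis holds the letter and which holds the string position, nor what pixel value marks a letter, so the sketch assumes rows are letters, columns are positions, and a marked cell is 1. The function name `encode_words` and the sample words are hypothetical.

```python
# 24 usable letters, A-X: the Latin alphabet without 'y' and 'z'.
ALPHABET = "abcdefghijklmnopqrstuvwx"


def encode_words(words):
    """Encode four six-letter words as a 1 x 24 x 24 channels-first grid.

    Assumption: rows index the letter (a..x), columns index the position
    in the joined 24-character string, and a marked cell has value 1.
    """
    if len(words) != 4 or any(len(word) != 6 for word in words):
        raise ValueError("expected four six-letter words")
    text = "".join(words).lower()

    grid = [[0] * 24 for _ in range(24)]
    for position, letter in enumerate(text):
        row = ALPHABET.find(letter)
        if row < 0:
            raise ValueError(f"letter {letter!r} is outside A-X")
        grid[row][position] = 1

    return [grid]  # leading axis is the single channel


# Hypothetical sample words, chosen only because they avoid 'y' and 'z'.
image = encode_words(["planet", "orange", "window", "market"])
```

Stacking n such grids gives the (n, 1, 24, 24) layout quoted in the description. Each column holds exactly one marked cell, so the row totals of an image amount to a letter histogram of its four words, which is the kind of frequency signal the description says a model needs to pick up.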
Provider:
Newcastle University
Created:
2023-11-30