Language Dataset

Name: Language Dataset
Creator: Newcastle University
Published: 2026-02-05 14:58:12
License: No description available

Source: DataCite Commons (updated 2026-02-05, indexed 2024-07-13)

Download link:

https://data.ncl.ac.uk/articles/dataset/Language_Dataset/24574729/1

Description:

Dataset containing the images and labels for the Language data used in the CVPR NAS workshop Unseen-data challenge under the codename "LaMelo".

The Language dataset is a constructed dataset using words from aspell dictionaries. The intention of this dataset is to require machine learning models to perform not only image classification but also linguistic analysis, working out which letter frequencies are associated with each language. For each Language image we selected four six-letter words using the standard Latin alphabet and removed any words with letters that used diacritics (such as é or ü) or included 'y' or 'z'.

We encode these words on a graph with one axis representing the index into the 24-character string (the four words joined together) and the other representing the letter (A to X).

The data is in a channels-first format with a shape of (n, 1, 24, 24), where n is the number of samples in the corresponding set (50,000 for training, 10,000 for validation, and 10,000 for testing).

There are ten classes in the dataset, with 7,000 examples of each, distributed evenly between the three subsets. The ten classes and corresponding numerical labels are as follows: English: 0, Dutch: 1, German: 2, Spanish: 3, French: 4, Portuguese: 5, Swahili: 6, Zulu: 7, Finnish: 8, Swedish: 9
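
The encoding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the dataset authors' generation code: the description does not say which axis holds the letter and which holds the string position, nor what value marks a cell, so the sketch assumes rows index the letter, columns index the position, and a marked cell is 1. The function name `encode_words` and the sample words are hypothetical.

```python
import string

# Alphabet from the dataset description: the 24 letters A-X (no 'y' or 'z').
ALPHABET = string.ascii_lowercase[:24]

# Label mapping copied from the dataset description.
LABELS = {
    "English": 0, "Dutch": 1, "German": 2, "Spanish": 3, "French": 4,
    "Portuguese": 5, "Swahili": 6, "Zulu": 7, "Finnish": 8, "Swedish": 9,
}


def encode_words(words):
    """Encode four six-letter words as a (1, 24, 24) nested list.

    Assumptions (not confirmed by the source): rows index the letter (A-X),
    columns index the position in the joined string, and a marked cell is 1.
    """
    if len(words) != 4 or any(len(w) != 6 for w in words):
        raise ValueError("expected four six-letter words")
    text = "".join(words).lower()
    grid = [[0] * 24 for _ in range(24)]
    for position, letter in enumerate(text):
        if letter not in ALPHABET:
            raise ValueError(f"letter {letter!r} is outside A-X")
        grid[ALPHABET.index(letter)][position] = 1
    return [grid]  # leading channel axis, i.e. channels-first


# Hypothetical sample; these words are illustrative, not taken from the dataset.
sample = encode_words(["garden", "window", "bridge", "forest"])
```

Stacking n such encodings gives the (n, 1, 24, 24) layout the description mentions. Under these assumptions every column contains exactly one marked cell, so the letter frequencies that distinguish the languages appear as row densities.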

Provider:

Newcastle University

Created:

2023-11-30
