TamilCOCO Dataset
Source: DataCite Commons. Updated: 2025-01-09. Indexed: 2025-04-16.
Download link:
https://ieee-dataport.org/documents/tamilcoco-dataset
Resource description:
TamilCOCO is a novel bilingual image captioning dataset specifically designed for Tamil, a low-resource language. This dataset facilitates research in image captioning, cross-lingual natural language processing, and culturally adapted AI applications.

Dataset Statistics
Total Rows: 305,340
Total Columns: 3
Unique Images: 63,062
Unique English Captions: 303,036
Unique Tamil Captions: N/A (translations are not unique due to possible repetitions)

Column Descriptions
image_id: Unique identifier for each image in the dataset. Represents the visual content associated with the captions.
caption_english: The original English caption describing the image.
raw_caption_tamil: The corresponding Tamil caption, translated and culturally adapted for relevance.

Features
Language Pair: English-Tamil
Data Type: Textual descriptions (image captions)
Multilingual Support: Bilingual captions in English and Tamil, enabling cross-lingual applications.
Cultural Adaptation: Tamil captions incorporate idiomatic expressions and culturally specific terms for enhanced relevance.
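As a rough illustration of how the three columns described above might be used, the sketch below reads rows with Python's standard csv module, recomputes the kind of counts reported in the statistics, and groups caption pairs by image_id. It assumes the data is distributed as a CSV file with a header row, which the listing does not state. The sample rows and the `summarize` helper are hypothetical placeholders, not content from the dataset.

```python
import csv
import io
from collections import defaultdict

# Column names as given in the dataset description.
COLUMNS = ("image_id", "caption_english", "raw_caption_tamil")


def summarize(rows):
    """Count rows and unique values, and group caption pairs by image_id."""
    total_rows = 0
    english_captions = set()
    captions_by_image = defaultdict(list)
    for row in rows:
        total_rows += 1
        english_captions.add(row["caption_english"])
        captions_by_image[row["image_id"]].append(
            (row["caption_english"], row["raw_caption_tamil"])
        )
    stats = {
        "total_rows": total_rows,
        "unique_images": len(captions_by_image),
        "unique_english_captions": len(english_captions),
    }
    return stats, dict(captions_by_image)


# Hypothetical stand-in for the real file: placeholder rows only.
# Assumption: the real data is a CSV with these three headers.
SAMPLE_CSV = """image_id,caption_english,raw_caption_tamil
1,A dog runs on the beach.,<Tamil caption A>
1,A dog playing near the sea.,<Tamil caption B>
2,A dog runs on the beach.,<Tamil caption A>
"""

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
stats, captions_by_image = summarize(reader)
print(stats)
```

With a real file, the in-memory sample would be replaced by something like `open(path, newline="", encoding="utf-8")`, where `path` points to the downloaded data. Note that image_id repeats across rows (305,340 rows for 63,062 images), so grouping by it collects the several caption pairs that belong to one image.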
Provider:
IEEE DataPort
Created:
2025-01-09



