
TamilCOCO Dataset

Source: DataCite Commons. Updated 2025-01-09; indexed 2025-04-16.
Download link:
https://ieee-dataport.org/documents/tamilcoco-dataset
Description:
TamilCOCO is a novel bilingual image captioning dataset specifically designed for Tamil, a low-resource language. This dataset facilitates research in image captioning, cross-lingual natural language processing, and culturally adapted AI applications.

Dataset Statistics
Total Rows: 305,340
Total Columns: 3
Unique Images: 63,062
Unique English Captions: 303,036
Unique Tamil Captions: N/A (translations are not unique due to possible repetitions)

Column Descriptions
image_id: Unique identifier for each image in the dataset. Represents the visual content associated with the captions.
caption_english: The original English caption describing the image.
raw_caption_tamil: The corresponding Tamil caption, translated and culturally adapted for relevance.

Features
Language Pair: English-Tamil
Data Type: Textual descriptions (image captions)
Multilingual Support: Bilingual captions in English and Tamil, enabling cross-lingual applications.
Cultural Adaptation: Tamil captions incorporate idiomatic expressions and culturally specific terms for enhanced relevance.
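
The statistics and column descriptions above can be checked against a downloaded copy with a short script. The following is a minimal sketch, not an official loader: it assumes the data is available as a CSV file with a header row containing the three column names listed above, which the listing does not confirm. The function name `summarize_captions` and the sample rows are hypothetical, and the Tamil captions are replaced by placeholders rather than real translations.

```python
import csv
import io

# Column names taken from the dataset description above.
EXPECTED_COLUMNS = ["image_id", "caption_english", "raw_caption_tamil"]


def summarize_captions(csv_file):
    """Count rows, unique image IDs, and unique English captions.

    `csv_file` is any text stream with a header row (assumed CSV layout).
    """
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    rows = 0
    image_ids = set()
    english_captions = set()
    for record in reader:
        rows += 1
        image_ids.add(record["image_id"])
        english_captions.add(record["caption_english"])

    return {
        "rows": rows,
        "unique_images": len(image_ids),
        "unique_english_captions": len(english_captions),
    }


# Made-up sample for illustration only; Tamil text is a placeholder.
SAMPLE = io.StringIO(
    "image_id,caption_english,raw_caption_tamil\n"
    "1,A dog runs on grass.,<Tamil caption 1>\n"
    "1,A dog is running outside.,<Tamil caption 2>\n"
    "2,A dog runs on grass.,<Tamil caption 3>\n"
)

print(summarize_captions(SAMPLE))
```

For the real file, one would pass `open(path, encoding="utf-8", newline="")` instead of the in-memory sample, where `path` is wherever the download was saved. If the file matches the listing, the counts should come out as 305,340 rows, 63,062 unique images, and 303,036 unique English captions.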
Provider:
IEEE DataPort
Created:
2025-01-09