
LinaAlhuri/ArabicConceptualCaptions3M

Hugging Face · Updated 2023-11-15 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/LinaAlhuri/ArabicConceptualCaptions3M
Resource description:
---
task_categories:
- image-to-text
language:
- ar
pretty_name: ArabicConceptualCaptions3M
size_categories:
- 1M<n<10M
---

# Arabic Translated Conceptual Captions Dataset

## Overview

This dataset consists of captions from the Conceptual Captions dataset, translated into Arabic using the Google Translate API. It serves as a resource for researchers and developers interested in exploring vision-language tasks and the biases introduced during translation.

## Dataset Information

- **Source Dataset**: Conceptual Captions
- **Translation Tool**: Google Translate API
- **Translation Language**: English to Arabic

## Important Notes

1. **Translation Quality**: The translations are machine-generated. They may contain errors and inaccuracies, and may fail to capture cultural nuances. Researchers are encouraged to verify translations for accuracy.
2. **Biases**: The dataset is prone to various types of bias, including but not limited to gender bias. The Google Translate API, like any other machine translation tool, may inadvertently introduce biases present in its training data.
3. **Usage Guidelines**: Please refer to the usage guidelines of the original Conceptual Captions dataset, as they also apply to this translated version. Respect the copyright and licensing agreements associated with the source dataset.

## Description

The dataset contains two files, train and validation. Each has the following columns:

- **`arabic_caption`**: The machine-translated captions in Arabic, generated using the Google Translate API.
- **`caption`**: The original English captions, sourced from the Conceptual Captions dataset.
- **`link`**: Links to the images corresponding to the captions.
Provided by:
LinaAlhuri
Summary of original information

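Arabic Translated Conceptual Captions Dataset: Overview

Dataset information

  • Task categories:
    • Image-to-text
  • Language:
    • Arabic
  • Dataset name:
    • ArabicConceptualCaptions3M
  • Dataset size:
    • 1M<n<10M

Dataset details

  • Source dataset:
    • Conceptual Captions
  • Translation tool:
    • Google Translate API
  • Translation language:
    • English to Arabic

Dataset contents

  • Files:
    • Training set and validation set
  • Columns:
    • arabic_caption: the Arabic machine-translated captions generated with the Google Translate API.
    • caption: the original English captions taken from the Conceptual Captions dataset.
    • link: links to the images corresponding to the captions in the dataset.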
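
Because the captions are machine-translated and the card encourages verifying them, a cheap structural check on each row can help flag obvious failures before any manual review. The sketch below is one possible approach, not part of the dataset or its tooling: the function names `arabic_ratio` and `row_problems`, the 0.5 threshold, and the sample rows are all hypothetical, and only the three column names come from the card. It checks presence of the columns, script, and URL shape; it does not judge translation quality or bias.

```python
import re
from urllib.parse import urlparse

# Basic Arabic Unicode block (U+0600 to U+06FF).
ARABIC_CHAR = re.compile(r"[\u0600-\u06FF]")


def arabic_ratio(text: str) -> float:
    """Fraction of alphabetic characters that fall in the basic Arabic block."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    arabic = sum(1 for ch in letters if ARABIC_CHAR.match(ch))
    return arabic / len(letters)


def row_problems(row: dict, min_ratio: float = 0.5) -> list:
    """Return a list of structural problems found in one row (empty if none).

    min_ratio is an arbitrary illustrative threshold, not a recommended value.
    """
    problems = []
    # The three column names are the ones listed in the dataset card.
    for column in ("arabic_caption", "caption", "link"):
        value = row.get(column)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing {column}")
    if problems:
        return problems

    if arabic_ratio(row["arabic_caption"]) < min_ratio:
        problems.append("arabic_caption is mostly non-Arabic text")

    parsed = urlparse(row["link"])
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("link is not an http(s) URL")
    return problems


# Hypothetical rows for illustration only; they are not taken from the dataset.
sample_rows = [
    {"arabic_caption": "قطة تجلس على أريكة",
     "caption": "a cat sitting on a sofa",
     "link": "https://example.com/cat.jpg"},
    {"arabic_caption": "a dog in the park",
     "caption": "a dog in the park",
     "link": "https://example.com/dog.jpg"},
    {"arabic_caption": "كلب في الحديقة",
     "caption": "a dog in the park",
     "link": "not-a-url"},
]

for row in sample_rows:
    print(row["caption"], "->", row_problems(row) or "ok")
```

Rows that pass this check can still be mistranslated, so the flagged list is best treated as a lower bound on the rows that need attention.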