recastai/coyo-75k-augmented-captions

Name: recastai/coyo-75k-augmented-captions
Creator: recastai
Published: 2023-08-15 05:55:57
License: No description available

Source: Hugging Face. Updated 2023-08-15; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/recastai/coyo-75k-augmented-captions
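For readers who want to pull the data programmatically rather than through the page above, the following is a minimal sketch, not an official recipe. It assumes the third-party `datasets` library is installed and that the mirror host honors the `HF_ENDPOINT` environment variable; `REPO_ID` and `load_captions` are names introduced here for illustration. The card does not document split or column names, so the sketch requests no specific split.

```python
import os

# Assumption: route Hub traffic through the mirror host from the download link.
# HF_ENDPOINT has to be set before any Hugging Face library is imported.
os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")

# Repository ID taken from the page title.
REPO_ID = "recastai/coyo-75k-augmented-captions"


def load_captions(repo_id=REPO_ID):
    """Download the dataset and return whatever splits the repository defines."""
    # Imported lazily so the module can be imported without the library installed.
    from datasets import load_dataset  # third-party: pip install datasets

    return load_dataset(repo_id)


# Usage (requires network access):
#   dataset = load_captions()
#   print(dataset)  # inspect split and column names before relying on them
```

Printing the returned object first is deliberate: the Data Fields and Data Splits sections of the card are empty, so the actual schema has to be read from the download itself.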


Resource description:

---
license: cc-by-4.0
task_categories:
- text-generation
language:
- en
pretty_name: coyo-75k-augmented-captions
size_categories:
- 10K<n<100K
---

# Dataset Card for `coyo-75k-augmented-captions`

## Dataset Description

### Dataset Summary

This dataset has been created by **Re:cast AI**, and consists of ~7.5K image-caption pairs that were expanded to ~75K pairs using a generative model. Each generated caption is the result of CLIP top-k filtering between N candidate captions and the corresponding image.

The dataset is useful for many downstream tasks such as fine-tuning language models, and using further refinement strategies (e.g. RLHF, CLIP guidance, etc.).

More information and resulting models to come soon...

### Languages

[More Information Needed]

## Dataset Structure

### Data Instances

[More Information Needed]

### Data Fields

[More Information Needed]

### Data Splits

[More Information Needed]

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

### Annotations

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

[More Information Needed]

### Citation Information

[More Information Needed]

### Contributions

[More Information Needed]

Provider:

recastai

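Summary of original information

Dataset card for `coyo-75k-augmented-captions`

Dataset description

Dataset summary

The dataset was created by Re:cast AI. It contains about 7.5K image-caption pairs that were expanded to about 75K pairs with a generative model. Each generated caption was selected by CLIP top-k filtering between N candidate captions and the corresponding image.

The dataset suits many downstream tasks, such as fine-tuning language models, and can serve as a starting point for further refinement strategies (e.g. RLHF, CLIP guidance).

More information and the resulting models are to come soon.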
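The CLIP top-k filtering step mentioned in the summary can be illustrated with a small sketch. This is an assumption-laden toy, not the authors' pipeline: the card does not state N, k, the CLIP checkpoint, or how candidates were generated. The hypothetical helpers `cosine` and `top_k_captions` rank captions by cosine similarity to the image embedding, and the 3-dimensional vectors stand in for real CLIP embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k_captions(image_emb, captions, caption_embs, k):
    """Return the k captions whose embeddings best match the image embedding."""
    scored = [
        (cosine(image_emb, emb), cap)
        for cap, emb in zip(captions, caption_embs)
    ]
    # Highest similarity first; sort on the score only.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cap for _, cap in scored[:k]]


# Toy vectors standing in for CLIP image and text embeddings (assumed values).
image_emb = [1.0, 0.0, 0.0]
captions = [
    "a dog on a beach",
    "a red car",
    "a dog running on sand",
    "stock chart",
]
caption_embs = [
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.0],
    [0.8, 0.0, 0.2],
    [0.0, 0.1, 0.9],
]

# Keep the 2 best of N = 4 candidates.
kept = top_k_captions(image_emb, captions, caption_embs, k=2)
print(kept)
```

In a real setup the embeddings would come from a CLIP image encoder and text encoder, and the N candidates from the generative model; only the ranking and truncation logic is shown here.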

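No other section of the card has been filled in; each is marked [More Information Needed]: Languages; Dataset Structure (Data Instances, Data Fields, Data Splits); Dataset Creation (Curation Rationale, Initial Data Collection and Normalization, Annotation Process, Who Are the Annotators?); Considerations for Using the Data (Social Impact of Dataset, Discussion of Biases, Other Known Limitations); and Additional Information (Dataset Curators, Licensing Information, Citation Information, Contributions).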
