recastai/coyo-75k-augmented-captions
Source: Hugging Face. Updated 2023-08-15; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/recastai/coyo-75k-augmented-captions
Resource description:
---
license: cc-by-4.0
task_categories:
- text-generation
language:
- en
pretty_name: coyo-75k-augmented-captions
size_categories:
- 10K<n<100K
---
# Dataset Card for `coyo-75k-augmented-captions`
## Dataset Description
### Dataset Summary
This dataset was created by **Re:cast AI** and consists of ~7.5K image-caption pairs that were expanded to ~75K pairs using a generative model.
Each generated caption was selected by CLIP top-k filtering: N candidate captions are scored against the corresponding image, and the top k are kept.
The dataset is suited to downstream tasks such as fine-tuning language models, and to further refinement strategies (e.g. RLHF, CLIP guidance).
More information and resulting models to come soon...
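
The top-k filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline Re:cast AI used: the card does not state the values of N or k, which CLIP checkpoint was used, or how candidates were generated. The function names `cosine_similarity` and `top_k_captions` are hypothetical, and the short toy vectors stand in for real CLIP image and text embeddings.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_captions(
    image_embedding: Sequence[float],
    captions: Sequence[str],
    caption_embeddings: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Score each candidate caption against the image and keep the k best.

    Hypothetical helper: embeddings are assumed to come from a CLIP-style
    model that maps images and text into the same vector space.
    """
    if len(captions) != len(caption_embeddings):
        raise ValueError("captions and caption_embeddings must have the same length")
    scored = [
        (caption, cosine_similarity(image_embedding, embedding))
        for caption, embedding in zip(captions, caption_embeddings)
    ]
    # Highest similarity first; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy example: N = 3 candidate captions, keep k = 2 (values are made up).
image_vec = [1.0, 0.0]
candidates = ["a dog on grass", "a red car", "a dog running"]
candidate_vecs = [[0.9, 0.1], [0.0, 1.0], [0.8, 0.6]]

for caption, score in top_k_captions(image_vec, candidates, candidate_vecs, k=2):
    print(f"{score:.3f}  {caption}")
```

In a real setting the embeddings would be produced by a CLIP image encoder and text encoder, and the kept captions would then be paired with the image to form the augmented rows.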
### Languages
[More Information Needed]
## Dataset Structure
### Data Instances
[More Information Needed]
### Data Fields
[More Information Needed]
### Data Splits
[More Information Needed]
## Dataset Creation
### Curation Rationale
[More Information Needed]
### Source Data
#### Initial Data Collection and Normalization
[More Information Needed]
### Annotations
#### Annotation process
[More Information Needed]
#### Who are the annotators?
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[More Information Needed]
### Licensing Information
[More Information Needed]
### Citation Information
[More Information Needed]
### Contributions
[More Information Needed]
Provider:
recastai



