align-anything
Source: ModelScope community. Updated 2026-05-16; indexed 2025-02-08.
Download link:
https://modelscope.cn/datasets/PKU-Alignment/align-anything
Resource description:
# Overview: Align-Anything Dataset
<span style="color: red;">A Comprehensive All-Modality Alignment Dataset with Fine-grained Preference Annotations and Language Feedback.</span>
[🏠 Homepage](https://github.com/PKU-Alignment/align-anything) | [🤗 Align-Anything Dataset](https://huggingface.co/datasets/PKU-Alignment/align-anything) | [🤗 T2T_Instruction-tuning Dataset](https://huggingface.co/datasets/PKU-Alignment/Align-Anything-Instruction-100K) | [🤗 TI2T_Instruction-tuning Dataset](https://huggingface.co/datasets/PKU-Alignment/Align-Anything-TI2T-Instruction-100K) | [👍 Our Official Code Repo](https://github.com/PKU-Alignment/align-anything)
Our world is inherently multimodal. Humans perceive the world through multiple senses, and **Language Models** should operate similarly. However, the development of **Current Multi-Modality Foundation Models** is limited by the availability and diversity of data across different modalities. Specifically, the challenges include:
1. **Imbalance in modality data**: While there is abundant data for vision tasks, data for other modalities such as video and audio is relatively scarce, and there is a lack of interconnected data across different modalities.
2. **Limited multi-modality training data**: The majority of existing datasets focus on modality-specific question-answer tasks, while there is a lack of specialized datasets to enhance multi-modality models' **Instruction-Following** capabilities.
To address these challenges, we propose **Align-Anything 200K**, which features:
- **All-modality tasks**: Incorporating tasks that cover all major modalities.
- **Fine-grained preference**: Capturing nuanced user preferences across tasks.
- **Language feedback**: Supporting critique and refinement through natural language.
- **Cross-modality QA pairs**: Enabling richer interactions between different modalities.
Please cite the repo if you find the data or code in this repo useful 😊
```bibtex
@inproceedings{ji2024align,
title={Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback},
author={Jiaming Ji and Jiayi Zhou and Hantao Lou and Boyuan Chen and Donghai Hong and Xuyao Wang and Wenqi Chen and Kaile Wang and Rui Pan and Jiahao Li and Mohan Wang and Josef Dai and Tianyi Qiu and Hua Xu and Dong Li and Weipeng Chen and Jun Song and Bo Zheng and Yaodong Yang},
year={2024},
url={https://arxiv.org/abs/2412.15838}
}
```
## Summary
### Our current open-source datasets
You can click the links in `Modality Type` for more details.
| Modality Type | Dataset Type | Current Open-source Data Volume |
|---------------|----------------------|---------------------------------|
| [Text-to-Text](https://huggingface.co/datasets/PKU-Alignment/align-anything/blob/main/text-to-text/README.md) | Preference | 30K |
| [Text-Image-to-Text](https://huggingface.co/datasets/PKU-Alignment/align-anything/blob/main/text-image-to-text/README.md) | Preference | 40K |
| [Text-Image-to-Text-Image](https://huggingface.co/datasets/PKU-Alignment/align-anything/blob/main/text-image-to-text-image/README.md) | Preference | 27K |
| [Text-to-Image](https://huggingface.co/datasets/PKU-Alignment/align-anything/blob/main/text-to-image/README.md) | Preference | 32K |
| [Text-Audio-to-Text](https://huggingface.co/datasets/PKU-Alignment/align-anything/blob/main/text-audio-to-text/README.md) | Preference | 30K |
| [Text-to-Audio](https://huggingface.co/datasets/PKU-Alignment/align-anything/blob/main/text-to-audio/README.md) | Preference | 12K |
| [Text-to-Video](https://huggingface.co/datasets/PKU-Alignment/align-anything/blob/main/text-to-video/README.md) | Preference | 9K |
| [Text-Video-to-Text](https://huggingface.co/datasets/PKU-Alignment/align-anything/blob/main/text-video-to-text/README.md) | Preference | 10K |
| [Text-Image-to-Text-Instruction](https://huggingface.co/datasets/PKU-Alignment/Align-Anything-TI2T-Instruction-100K) | Instruction-Following | 100K |
| [Text-to-Text-Instruction](https://huggingface.co/datasets/PKU-Alignment/Align-Anything-Instruction-100K) | Instruction-Following | 100K |
### Prompt Distribution
<div align="center">
<img src="new_distribution.png" width="100%"/>
</div>
### Usage
```python
from datasets import load_dataset
# text-to-text
train_dataset = load_dataset('PKU-Alignment/align-anything', name='text-to-text')['train']
val_dataset = load_dataset('PKU-Alignment/align-anything', name='text-to-text')['val']
# text-image-to-text
train_dataset = load_dataset('PKU-Alignment/align-anything', name='text-image-to-text')['train']
val_dataset = load_dataset('PKU-Alignment/align-anything', name='text-image-to-text')['val']
# text-image-to-text-expert
train_dataset = load_dataset('PKU-Alignment/align-anything', name='text-image-to-text-expert')['train']
val_dataset = load_dataset('PKU-Alignment/align-anything', name='text-image-to-text-expert')['val']
# text-to-image
train_dataset = load_dataset('PKU-Alignment/align-anything', name='text-to-image')['train']
val_dataset = load_dataset('PKU-Alignment/align-anything', name='text-to-image')['val']
```
```python
from datasets import load_dataset
# text-audio-to-text
train_dataset = load_dataset('PKU-Alignment/align-anything', name='text-audio-to-text')['train']
val_dataset = load_dataset('PKU-Alignment/align-anything', name='text-audio-to-text')['val']
# text-to-audio
train_dataset = load_dataset('PKU-Alignment/align-anything', name='text-to-audio')['train']
```
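The snippets above call `load_dataset` once per split. A minimal sketch of a helper that loads a subset once and returns only the splits that exist might look like the following. The names `load_splits`, `REPO_ID`, and the `loader` parameter (which lets a stand-in replace the real loader for offline checks) are hypothetical and not part of this repository. That some subsets may lack a `val` split is an assumption, based only on the `text-to-audio` example loading `train` alone.

```python
REPO_ID = "PKU-Alignment/align-anything"


def load_splits(config, splits=("train", "val"), loader=None):
    """Load one subset once and return a dict of the requested splits that exist.

    `loader` is a hypothetical hook for testing; by default the real
    `datasets.load_dataset` is used.
    """
    if loader is None:
        # Imported lazily so the helper can be exercised without `datasets` installed.
        from datasets import load_dataset as loader
    dataset_dict = loader(REPO_ID, name=config)
    # Skip any requested split the subset does not provide (assumed possible).
    return {split: dataset_dict[split] for split in splits if split in dataset_dict}
```

For example, `load_splits("text-to-text")["train"]` would give the training split, and a subset without `val` would simply return a dict with one key.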
```bash
# Because of the nature of video files, we recommend using the `wget` command to download the video-based datasets directly.
# text-video-to-text:
wget https://huggingface.co/datasets/PKU-Alignment/align-anything/resolve/main/text-video-to-text/train_10k.json
wget https://huggingface.co/datasets/PKU-Alignment/align-anything/resolve/main/text-video-to-text/videos.tar.gz
# text-to-video (the video archive is split into three parts):
wget https://huggingface.co/datasets/PKU-Alignment/align-anything/resolve/main/text-to-video/9k_train.json
wget https://huggingface.co/datasets/PKU-Alignment/align-anything/resolve/main/text-to-video/videos.tar.gz0
wget https://huggingface.co/datasets/PKU-Alignment/align-anything/resolve/main/text-to-video/videos.tar.gz1
wget https://huggingface.co/datasets/PKU-Alignment/align-anything/resolve/main/text-to-video/videos.tar.gz2
# Join the parts explicitly: a `videos.tar.gz*` glob would also match the
# text-video-to-text archive if it sits in the same directory.
# The trailing `-` tells tar to read the archive from standard input.
cat videos.tar.gz0 videos.tar.gz1 videos.tar.gz2 | tar -xzvf -
```
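Where `cat` and `tar` are not available, the same reassembly can be sketched with the Python standard library. The function name `extract_split_archive` is hypothetical, and the sketch assumes the numbered files are consecutive byte ranges of a single gzip-compressed tarball, which is what piping `cat` into `tar` implies.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def extract_split_archive(parts, dest):
    """Concatenate split archive parts in order and extract them into `dest`.

    Returns the list of member names found in the archive.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryFile() as joined:
        # Equivalent of `cat part0 part1 part2`: append each part in order.
        for part in parts:
            with open(part, "rb") as chunk:
                shutil.copyfileobj(chunk, joined)
        joined.seek(0)
        with tarfile.open(fileobj=joined, mode="r:gz") as archive:
            names = archive.getnames()
            # Use the safer "data" extraction filter where this Python supports it.
            kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            archive.extractall(dest, **kwargs)
    return names
```

A call such as `extract_split_archive(["videos.tar.gz0", "videos.tar.gz1", "videos.tar.gz2"], "videos")` would mirror the shell command; the part order matters, so the list is passed explicitly rather than globbed.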
```bash
# text-image-to-text-image
# Download the ti2ti dataset with the `wget` command:
wget https://huggingface.co/datasets/PKU-Alignment/align-anything/resolve/main/text-image-to-text-image/train_27k.json
wget https://huggingface.co/datasets/PKU-Alignment/align-anything/resolve/main/text-image-to-text-image/images.tar.gz
```
```python
from datasets import load_dataset
# text-image-to-text instruction-following
train_dataset = load_dataset('PKU-Alignment/Align-Anything-TI2T-Instruction-100K', split='train')
# text-to-text instruction-following
dataset = load_dataset('PKU-Alignment/Align-Anything-Instruction-100K', split='train')
```
## 1. Highlights
Unlike existing datasets, which focus on individual modalities and vary in quality, **Align-Anything** offers consistent, high-quality data that encompasses **any modality (e.g., text, image, video and audio) in mixed inputs and outputs**. It provides detailed human preference annotations along with fine-grained language feedback for critique and refinement, enabling comprehensive evaluation and improvement across modalities.
### 1.1 All-Modality Tasks
**Align-Anything** tasks fall into three groups:
- **Any-to-Any** covers conversion in either direction between any input and output modalities, such as text, video, audio, and images.
- **Any-to-Text** covers converting non-textual inputs, such as image, video, and audio, into textual output.
- **Text-to-Any** covers converting text inputs into any other modality.
### 1.2 Fine-Grained Preference
How do you define a high-quality image? Binary preferences on individual metrics are too coarse to assess the quality of rich multimodal data.
To address this, we have designed **Fine-Grained All-Modality Constitutions** to assist in annotating fine-grained preferences. These constitutions are composed of two main parts:
1. **General fine-grained metrics across modalities**, such as instruction-following, objective rules, clarity & aesthetics, information richness, and safety.
2. **Modality-specific constitutions**: For instance, for the video modality, we designed metrics such as temporal consistency, content coherence, and motion naturalness.
You can explore each modality’s subset dataset to view its fine-grained constitutions and definitions in detail.
Following the **Fine-Grained All-Modality Constitutions**, we used **GPT-4o**, **Gemini-1.5-Pro**, and **human crowdworkers** to annotate the data, producing fine-grained annotations across all modalities.
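The two-part structure of the constitutions can be pictured as a shared list of dimensions plus a per-modality extension. The sketch below is illustrative only: the names `GENERAL_DIMENSIONS`, `MODALITY_DIMENSIONS`, and `dimensions_for` are hypothetical, and only the video-specific metrics named above are filled in. Other modalities are left empty rather than guessed; see each subset's README for the actual definitions.

```python
# General metrics shared by every modality, as listed in this section.
GENERAL_DIMENSIONS = (
    "instruction-following",
    "objective rules",
    "clarity & aesthetics",
    "information richness",
    "safety",
)

# Modality-specific metrics. Only the video example from this section is
# included; entries for other modalities are intentionally omitted.
MODALITY_DIMENSIONS = {
    "video": ("temporal consistency", "content coherence", "motion naturalness"),
}


def dimensions_for(modality):
    """Return the general dimensions followed by any modality-specific ones."""
    return GENERAL_DIMENSIONS + MODALITY_DIMENSIONS.get(modality, ())
```

With this layout, `dimensions_for("video")` yields eight dimensions, while a modality without a listed extension falls back to the five general ones.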
### 1.3 Language Feedback
To guide optimization more effectively, **multimodal data** requires fine-grained annotations. We propose a unified alignment method across all modalities that **uses language feedback**. Specifically, for every data point we provide critique and refinement feedback on each dimension, as well as an overall preference. This feedback can be incorporated into your training process to improve the performance of multimodal models.
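As a rough illustration of how per-dimension feedback and an overall preference might be combined for training, the sketch below uses a made-up record layout. All class and field names (`DimensionFeedback`, `AnnotatedPair`, `response_1`, `overall_preferred`, and so on) are assumptions for illustration and do not reflect the dataset's actual schema, which is documented in each subset's README.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DimensionFeedback:
    """Hypothetical per-dimension annotation for a pair of responses."""
    dimension: str
    preferred: int  # 1 or 2: which response is better on this dimension
    critique: str
    refinement: str


@dataclass
class AnnotatedPair:
    """Hypothetical data point: one prompt, two responses, and their annotations."""
    prompt: str
    response_1: str
    response_2: str
    overall_preferred: int  # 1 or 2
    feedback: List[DimensionFeedback] = field(default_factory=list)


def to_training_example(pair):
    """Turn an annotated pair into a chosen/rejected example plus feedback text."""
    if pair.overall_preferred not in (1, 2):
        raise ValueError("overall_preferred must be 1 or 2")
    if pair.overall_preferred == 1:
        chosen, rejected = pair.response_1, pair.response_2
    else:
        chosen, rejected = pair.response_2, pair.response_1
    # One line of language feedback per annotated dimension.
    feedback_text = "\n".join(
        f"[{item.dimension}] critique: {item.critique} | refinement: {item.refinement}"
        for item in pair.feedback
    )
    return {
        "prompt": pair.prompt,
        "chosen": chosen,
        "rejected": rejected,
        "feedback": feedback_text,
    }
```

The chosen/rejected pair suits preference-based training, while the feedback text could serve as an additional supervision signal; how to weight the two is left open here.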
### 1.4 Cross-Modality QA Pairs
Handling the interactions between different modalities is crucial for **Multimodal Foundation Models**. To address this, we have also labeled **Any-to-Any Cross-Modality** data, which allows for comprehensive interactions across modalities.
This dataset will be available soon...
## 2. Annotation Pipeline
The pipeline refines AI responses to multimodal prompts in several steps. Raw prompts are first refined for the specific modality and task, then used to generate responses from various sources. Finally, closed-source SOTA models and human annotators provide cross-modality fine-grained annotations and language feedback, yielding the final dataset.
### 2.1 Collect Q-A Pairs
We start by designing specialized features tailored to each modality. From the modality-specific tasks and these feature designs, we derive the **Fine-Grained All-Modality Constitutions**. We use them to refine the original prompts, which may initially be suboptimal, into their final versions. We then collect responses from multiple sources: self-constructed methods, open-source and closed-source models, and human-written answers.
### 2.2 Fine-grained Annotation
We annotate the collected question-answer pairs with fine-grained preferences. Annotations come from GPT-4, Gemini-1.5-Pro, and human annotators. They cover a range of dimensions, such as instruction-following, objective rules, aesthetics, information richness, and safety, each with its own preference and scoring criteria.
### 2.3 Language Feedback
Finally, we provide language feedback on the responses. This involves determining the scope of critique, executing the critique, and providing refinement suggestions within the pipeline. This process captures both direct preferences for each modality and language-based feedback, ensuring a comprehensive evaluation and enhancement of the responses.

## 3. Datasets Comparison
> **Note**
> Existing preference datasets are limited in scope and quality, focusing on specific modalities and lacking comprehensive annotations. In contrast, **Align-Anything** offers high-quality data across all modalities, with detailed human preference annotations and language feedback for critique and refinement. This comprehensive approach ensures a consistent evaluation and improvement of responses across modalities.

The **Preference Annotation Methods** column of the comparison table, labeled `Methods (A | S | F)`, has three parts:
- **A** refers to the annotation source, which indicates how preferences are determined within the dataset. "Manual" denotes human annotation or manually constructed preferences, "Synthetic" refers to preferences generated or annotated by models like GPT-4V or other systems, and "Combined" refers to datasets aggregated from multiple sources.
- **S** represents the composition of preference signals, which may include scoring, ranking, and reasoning. In some cases, preferences are constructed by refining, correcting, or corrupting responses to form the desired preference pairs.
- **F** indicates whether the dataset provides fine-grained feedback at a more detailed level within those preference dimensions.
**Dimensions** indicate the primary preference challenges the dataset aims to address.
We compare the existing multimodal preference datasets, as shown in the table above. This comparison highlights the feedback diversity in our **Align-Anything**, which addresses the limitations of existing preference datasets, particularly following the expansion into multiple modalities.
## 4. Human Agreement Analysis
We analyze human agreement on the preference scores, both overall and as a percentage of agreement. Both are high, which indicates that the preference annotations are reliable and consistent.
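This section does not state how the percentage of agreement is computed. One common definition, shown here purely as an assumption, is the share of items on which two sets of preference labels coincide. The function name `percent_agreement` is hypothetical.

```python
def percent_agreement(labels_a, labels_b):
    """Return the percentage of positions where two label sequences match."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both label sequences must have the same length")
    if not labels_a:
        raise ValueError("label sequences must not be empty")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * matches / len(labels_a)


# Two annotators agree on three of four items.
print(percent_agreement([1, 2, 1, 1], [1, 2, 2, 1]))  # 75.0
```

Raw percentage agreement does not correct for agreement expected by chance, so it is often reported alongside a chance-corrected statistic.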
## 5. Citation
Please cite our work if you use the data or model in your paper.
```bibtex
@misc{align_anything,
author = {PKU-Alignment Team},
title = {Align Anything: training all modality models to follow instructions with unified language feedback},
year = {2024},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/PKU-Alignment/align-anything}},
}
```
Provider:
maas
Created:
2025-02-07



