ArtifactAI/arxiv_nlp_intstruct

Name: ArtifactAI/arxiv_nlp_intstruct
Creator: ArtifactAI
Published: 2023-11-28 01:33:23
License: No description provided

Hugging Face; updated 2023-11-28; indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/ArtifactAI/arxiv_nlp_intstruct

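Resource summary:

The arxiv_nlp_intstruct dataset contains question-answer pairs generated from abstracts in the ArXiv cs.CL (Computation and Language) category. The questions and answers were generated with the GPT-3.5-turbo model. The dataset is intended mainly for text generation tasks, specifically language modeling and masked language modeling. It contains no annotations and no personal or sensitive information.

Provided by:

ArtifactAI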

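Summary of the original card

Dataset overview

Dataset name

Name: arxiv_nlp_intstruct

Dataset description

Overview: This dataset contains question-answer pairs derived from abstracts in the ArXiv cs.CL category. The questions and answers were generated with the GPT-3.5-turbo model.

Language

Language: English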

Dataset structure

Data instances

Training set size: 36,894 question-answer pairs
Download size: 38.4 MB

Example

{
  "question": "What is the role of the light-weight cross-attention mechanism in MixEncoder?",
  "answer": "In the MixEncoder, the lightweight cross-attention mechanism plays a crucial role in capturing cross-modal dependencies between different modalities (such as text and image). The purpose of cross-attention is to align features from one modality with features from another modality, allowing the model to understand the correlation between them. Specifically, the lightweight cross-attention mechanism in MixEncoder enables the model to attend to relevant information in the other modality when processing a specific modality. This helps the model to exploit important semantic relationships between different modalities and improve the overall representation learning process. By incorporating cross-attention, the MixEncoder provides a way for different modalities to interact and co-adapt during training, enabling the model to effectively understand and build representations that capture the underlying relationships between text and image inputs. This boosts the performance of multimodal models in tasks such as image captioning, visual question answering, and multimodal sentiment analysis, among others."
}

Data fields

question: string; the question text.
answer: string; the answer text.
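As a minimal sketch, assume each record is stored as a JSON object with exactly the two string fields listed above. One object per line (JSON Lines) is an assumption; the card does not state the on-disk format. Under that assumption, records can be parsed and checked as follows. The names `QARecord`, `parse_record`, and `parse_lines` are hypothetical helpers, not part of any published API for this dataset.

```python
import json
from typing import TypedDict


# One question-answer pair, mirroring the two fields described above.
class QARecord(TypedDict):
    question: str
    answer: str


def parse_record(line: str) -> QARecord:
    """Parse one JSON object and check that `question` and `answer` are strings.

    Hypothetical helper; assumes one JSON object per line (JSON Lines).
    """
    obj = json.loads(line)
    if not isinstance(obj, dict):
        raise ValueError("record must be a JSON object")
    for field in ("question", "answer"):
        if not isinstance(obj.get(field), str):
            raise ValueError(f"field {field!r} is missing or not a string")
    return QARecord(question=obj["question"], answer=obj["answer"])


def parse_lines(lines):
    """Parse every non-blank line into a QARecord, skipping empty lines."""
    return [parse_record(line) for line in lines if line.strip()]


if __name__ == "__main__":
    # Placeholder text, not an actual record from the dataset.
    sample = '{"question": "Example question?", "answer": "Example answer."}'
    rec = parse_record(sample)
    print(rec["question"])  # prints "Example question?"
```

The check rejects records with a missing or non-string field early. That is usually cheaper than discovering a malformed pair later in a training pipeline.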

Dataset source

Source: The question-answer pairs were generated by GPT-3.5-turbo from abstracts in the ArXiv cs.CL category.

License

License: CC0-1.0

Task type

Task category: Text generation
Specific tasks: Language modeling, masked language modeling
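The card lists language modeling and masked language modeling as target tasks but does not specify how a question-answer pair should be turned into training text. The sketch below is for illustration only and makes two assumptions. For causal language modeling it uses a hypothetical `Question:`/`Answer:` template. For masked language modeling it uses a simplified scheme that replaces a fraction of whitespace tokens (15% by default, the rate BERT uses) with a `[MASK]` placeholder. A real pipeline would instead mask subword tokens from the target model's own tokenizer.

```python
import random

# Hypothetical prompt template: the dataset card does not prescribe one.
TEMPLATE = "Question: {question}\nAnswer: {answer}"
MASK = "[MASK]"


def to_lm_text(record: dict) -> str:
    """Join one question-answer pair into a single string for causal language modeling."""
    return TEMPLATE.format(question=record["question"], answer=record["answer"])


def mask_tokens(text: str, rate: float = 0.15, seed: int = 0) -> tuple[str, dict[int, str]]:
    """Replace a random fraction of whitespace-separated tokens with MASK.

    Returns (masked_text, targets), where targets maps each masked token index
    to the original token the model should predict. Simplifications: tokens are
    split on whitespace (so newlines become single spaces), and every selected
    token becomes MASK, unlike BERT's 80/10/10 replacement rule.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    rng = random.Random(seed)  # seeded so the masking is reproducible
    tokens = text.split()
    n_mask = round(rate * len(tokens))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    targets = {i: tokens[i] for i in positions}
    for i in positions:
        tokens[i] = MASK
    return " ".join(tokens), targets


if __name__ == "__main__":
    # Placeholder text, not an actual record from the dataset.
    record = {"question": "Example question?", "answer": "Example answer."}
    text = to_lm_text(record)
    print(text)  # "Question: Example question?" then "Answer: Example answer." on a second line
    masked, targets = mask_tokens(text, rate=0.5, seed=1)
    print(masked, targets)
```

Seeding the random generator per call keeps the masked view of each example reproducible across runs. Training code often re-masks on every epoch instead, which would mean varying the seed.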

Dataset size

Size category: 1M<n<10M

Dataset creator

Creator: Matthew Kenney
Year created: 2023
