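MVPCorpus | Natural language generation dataset | Multi-task learning dataset
MVP Dataset Overview
Basic information
- Name: MVP (Multi-task Supervised Pre-training for Natural Language Generation)
- Architecture: standard Transformer encoder-decoder
- Type: supervised pre-trained natural language generation model
- Feature: includes task-specific soft prompts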
Supported tasks and corresponding datasets
Text summarization
- CNN/Daily Mail (cnndm)
- XSum (xsum)
- SAMSum (samsum)
- WLE (wle)
Open-ended dialogue
- PersonaChat (pc)
- DailyDialog (dd)
- DSTC7-AVSD (da)
- SGD (sgd)
Data-to-text generation
- WebNLG v2.1 (webnlg)
- WebNLG v3.0 (webnlg2)
- WikiBio (wikibio)
- E2E (e2e)
- DART (dart)
- ToTTo (totto)
Question generation
- SQuAD (squadqg)
- CoQA (coqaqg)
Story generation
- ROCStories (roc)
- WritingPrompts (wp)
Question answering
- SQuAD (squad)
- CoQA (coqa)
Task-oriented dialogue
- MultiWOZ 2.0 (multiwoz)
Commonsense generation
- CommonGen (cg)
Text simplification
- WikiAuto + Turk/ASSET (wia)
Paraphrase generation
- Quora (quora)
Text style transfer
- GYAFC-E&M (gyafc_em)
- GYAFC-F&R (gyafc_fr)
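The task list above can be captured as a small lookup table, which may be handy when iterating over the corpus by task. This is only a sketch: the short dataset names are the ones given in parentheses above, while the English task labels used as keys and the helper `task_of` are hypothetical names chosen here for illustration, not identifiers defined by MVPCorpus.

```python
# Task-to-dataset lookup built from the list above.
# Keys are illustrative English task labels (an assumption); values are the
# short dataset names shown in parentheses in the list.
MVP_TASKS = {
    "text summarization": ["cnndm", "xsum", "samsum", "wle"],
    "open-ended dialogue": ["pc", "dd", "da", "sgd"],
    "data-to-text generation": ["webnlg", "webnlg2", "wikibio", "e2e", "dart", "totto"],
    "question generation": ["squadqg", "coqaqg"],
    "story generation": ["roc", "wp"],
    "question answering": ["squad", "coqa"],
    "task-oriented dialogue": ["multiwoz"],
    "commonsense generation": ["cg"],
    "text simplification": ["wia"],
    "paraphrase generation": ["quora"],
    "text style transfer": ["gyafc_em", "gyafc_fr"],
}


def task_of(dataset: str) -> str:
    """Return the task label whose dataset list contains `dataset`."""
    for task, names in MVP_TASKS.items():
        if dataset in names:
            return task
    raise KeyError(f"unknown dataset short name: {dataset}")


if __name__ == "__main__":
    total = sum(len(names) for names in MVP_TASKS.values())
    print(f"{len(MVP_TASKS)} tasks, {total} datasets")
    print(task_of("totto"))
```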
Model access
- Base model: RUCAIBox/mvp
- Task-specific prompt models: RUCAIBox/mvp-[task_name]
- Multi-task pre-trained variant: RUCAIBox/mvp-multi-task
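The naming scheme above can be turned into a small helper that builds a model identifier and, optionally, loads it. This is a minimal sketch, assuming the Hugging Face `transformers` library is installed and that its `MvpTokenizer` and `MvpForConditionalGeneration` classes are used for loading; the helper names `mvp_model_id` and `load_mvp` are hypothetical, and valid values for `task_name` are not listed in this overview, so any value passed in should be checked against the model repository.

```python
from typing import Optional

BASE_MODEL = "RUCAIBox/mvp"  # base model identifier from the list above


def mvp_model_id(task_name: Optional[str] = None) -> str:
    """Build an identifier following the RUCAIBox/mvp-[task_name] pattern.

    With no task name, the base model identifier is returned.
    """
    if task_name is None:
        return BASE_MODEL
    return f"{BASE_MODEL}-{task_name}"


def load_mvp(task_name: Optional[str] = None):
    """Load tokenizer and model for the given variant (downloads weights).

    The import is deferred so that building identifiers does not require
    `transformers` to be installed.
    """
    from transformers import MvpForConditionalGeneration, MvpTokenizer

    model_id = mvp_model_id(task_name)
    tokenizer = MvpTokenizer.from_pretrained(model_id)
    model = MvpForConditionalGeneration.from_pretrained(model_id)
    return tokenizer, model


if __name__ == "__main__":
    print(mvp_model_id())              # base model
    print(mvp_model_id("multi-task"))  # multi-task pre-trained variant
```

Calling `load_mvp("multi-task")` would fetch the multi-task variant named above; other task names are an assumption until confirmed in the model repository.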
Related resources
- Paper: https://arxiv.org/abs/2206.12131
- Model repository: https://huggingface.co/models?filter=mvp
- Dataset download: https://huggingface.co/RUCAIBox

YOLO-dataset
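This dataset is used to train YOLO models, including classification, detection, and pose estimation models. It currently supports v8, with support for more versions planned.
Source: GitHub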
The MaizeGDB
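MaizeGDB (Maize Genetics and Genomics Database) is an online resource that provides data and tools for genomics research on maize (Zea mays). The database contains maize genome sequences, gene annotations, genetic maps, mutant information, and expression data, along with maize-related literature and research tools. MaizeGDB aims to support maize genetics and genomics research by giving scientists an integrated platform for accessing and analyzing maize genetic and genomic data.
Source: www.maizegdb.org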
ImageNet-A
The ImageNet-A dataset consists of real-world, unmodified, and naturally occurring examples that are misclassified by ResNet models.
Source: Papers with Code
MNIST
The MNIST database (Modified National Institute of Standards and Technology database) is a large collection of handwritten digits. It has a training set of 60,000 examples and a test set of 10,000 examples. It is a subset of the larger NIST Special Database 3 (digits written by employees of the United States Census Bureau) and Special Database 1 (digits written by high school students), which contain monochrome images of handwritten digits. The digits have been size-normalized and centered in a fixed-size image. The original black and white (bilevel) images from NIST were size-normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were then centered in a 28x28 image by computing the center of mass of the pixels and translating the image so as to position this point at the center of the 28x28 field.
Source: Papers with Code
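The centering step described for MNIST (compute the center of mass of the pixels, then translate so that this point sits at the center of the 28x28 field) can be sketched as follows. This is an illustrative reconstruction, not the original NIST or MNIST preprocessing code: the function names are hypothetical, images are plain nested lists of grey levels, and the shift is rounded to whole pixels, which is a simplification.

```python
def center_of_mass(img):
    """Return (row, col) of the intensity-weighted center of a 2D grey image."""
    total = sum(sum(row) for row in img)
    if total == 0:
        raise ValueError("image has no non-zero pixels")
    cy = sum(y * v for y, row in enumerate(img) for v in row) / total
    cx = sum(x * v for row in img for x, v in enumerate(row)) / total
    return cy, cx


def center_in_field(img, size=28):
    """Place `img` in a size x size field with its center of mass at the center.

    Assumption: the translation is rounded to an integer number of pixels,
    and pixels shifted outside the field are dropped.
    """
    cy, cx = center_of_mass(img)
    target = (size - 1) / 2  # center of the field in pixel-index coordinates
    dy, dx = round(target - cy), round(target - cx)
    out = [[0] * size for _ in range(size)]
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            ny, nx = y + dy, x + dx
            if 0 <= ny < size and 0 <= nx < size:
                out[ny][nx] = v
    return out


if __name__ == "__main__":
    # A 20x20 box whose only ink is a 2x2 blob in the top-left corner.
    box = [[0] * 20 for _ in range(20)]
    for y in (0, 1):
        for x in (0, 1):
            box[y][x] = 255
    centered = center_in_field(box)
    print(center_of_mass(centered))
```

With integer shifts, the resulting center of mass lands within half a pixel of the field center; a sub-pixel translation would need interpolation, which is left out here.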
WorldClim
WorldClim is a website that contains a database of high spatial resolution global weather and climate data. These data can be used for mapping and spatial modeling, and are provided for use in research and related activities. The website contains three types of data. First, "Historical climate data (WorldClim version 2.1)" contains 19 "bioclimatic" variables related to temperature, precipitation, solar radiation, wind speed, and water vapor pressure. These data are available for the 1970-2000 period at a spatial scale of ~1 km² (30 seconds) per grid cell, and are constructed from multiple data sources. Second, "Historical monthly weather data" contains historical monthly weather data for 1960-2018. These data are downscaled from CRU-TS-4.06 by the Climatic Research Unit, University of East Anglia, using WorldClim 2.1 for bias correction. The variables available are average minimum temperature (°C), average maximum temperature (°C), and total precipitation (mm). The finest spatial resolution at which these data are available is 2.5 minutes (~21 km² at the equator). Third, "Future climate data" contains CMIP6 downscaled future climate projections. The downscaling and calibration (bias correction) was done with WorldClim v2.1 as the baseline climate. Monthly values of minimum temperature, maximum temperature, and precipitation were processed for 23 global climate models (GCMs) and for four Shared Socio-economic Pathways (SSPs): 126, 245, 370, and 585. The monthly values are averages over 20-year periods (2021-2040, 2041-2060, 2061-2080, 2081-2100). The finest spatial resolution at which these data are available is 30 seconds.
Source: DataCite Commons
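The grid sizes quoted for WorldClim (30 seconds as roughly 1 km², 2.5 minutes as roughly 21 km² at the equator) can be checked with a short calculation. The sketch below assumes an equatorial circumference of about 40,075 km and treats each cell as a square at the equator; both are simplifying assumptions made here, and the function names are hypothetical.

```python
EQUATOR_KM = 40075.017  # assumed equatorial circumference of the Earth, in km
ARC_SECONDS_PER_CIRCLE = 360 * 3600


def cell_side_km(arc_seconds: float) -> float:
    """Length along the equator spanned by the given angle, in km."""
    return EQUATOR_KM * arc_seconds / ARC_SECONDS_PER_CIRCLE


def cell_area_km2(arc_seconds: float) -> float:
    """Approximate area of a square grid cell at the equator, in km^2."""
    return cell_side_km(arc_seconds) ** 2


if __name__ == "__main__":
    for label, seconds in [("30 seconds", 30), ("2.5 minutes", 150)]:
        print(f"{label}: side {cell_side_km(seconds):.2f} km, "
              f"area {cell_area_km2(seconds):.2f} km^2")
```

Away from the equator the east-west extent of a cell shrinks with the cosine of the latitude, so these areas are upper bounds for a given angular resolution.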