datadreamer-dev/abstracts_and_tweets
Hugging Face | Updated 2024-02-01 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/datadreamer-dev/abstracts_and_tweets
Resource overview:
---
dataset_info:
  features:
  - name: abstracts
    dtype: string
  - name: prompts
    dtype: string
  - name: tweets
    dtype: string
  splits:
  - name: train
    num_bytes: 3127163
    num_examples: 900
  - name: validation
    num_bytes: 343839
    num_examples: 100
  download_size: 1765300
  dataset_size: 3471002
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
library_name: datadreamer
size_categories:
- 1K<n<10K
tags:
- datadreamer
- datadreamer-0.1.0
- synthetic
- gpt-4
---
# Dataset Card
This is a synthetic dataset of arXiv-style research paper abstracts and tweets summarizing them, used as a demonstration of the [DataDreamer 🤖💤 library](https://datadreamer.dev/docs/latest/). It was used to train an ["Abstract to Tweet" model](https://huggingface.co/datadreamer-dev/abstracts_to_tweet_model).
---
This dataset was produced with [DataDreamer 🤖💤](https://datadreamer.dev). The synthetic dataset card can be found [here](datadreamer.json).
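
As a minimal sketch, the two splits could be loaded with the Hugging Face `datasets` library, assuming it is installed and the Hub is reachable. The helper names `load_abstracts_and_tweets` and `matches_card` are hypothetical and exist only for illustration; the repository ID, split names, row counts, and column names are taken from the card metadata.

```python
"""Sketch: load the dataset and compare it with the figures listed on the card."""

REPO_ID = "datadreamer-dev/abstracts_and_tweets"

# Values copied from the card metadata.
EXPECTED_ROWS = {"train": 900, "validation": 100}
EXPECTED_COLUMNS = ["abstracts", "prompts", "tweets"]


def load_abstracts_and_tweets():
    """Download both splits from the Hub (hypothetical helper, needs network access)."""
    # Imported lazily so the check below works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID)


def matches_card(num_rows, column_names):
    """Return True if per-split row counts and column names agree with the card."""
    rows_ok = dict(num_rows) == EXPECTED_ROWS
    columns_ok = all(
        sorted(cols) == sorted(EXPECTED_COLUMNS) for cols in column_names.values()
    )
    return rows_ok and columns_ok


# Usage (requires network access):
#   ds = load_abstracts_and_tweets()
#   # DatasetDict exposes num_rows and column_names as dicts keyed by split name.
#   print(matches_card(ds.num_rows, ds.column_names))
#   print(ds["train"][0]["tweets"])
```

Keeping the comparison in a separate function means it can be exercised without downloading anything, which is convenient when the mirror or the Hub is unavailable.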
Provider:
datadreamer-dev
Summary of original information
Dataset information
Features
- abstracts: string
- prompts: string
- tweets: string
Splits
- train:
  - Bytes: 3127163
  - Examples: 900
- validation:
  - Bytes: 343839
  - Examples: 100
Data size
- Download size: 1765300 bytes
- Dataset size: 3471002 bytes
Configuration
- config_name: default
- data_files:
  - train: data/train-*
  - validation: data/validation-*
Library name
- library_name: datadreamer
Size category
- 1K < n < 10K
Tags
- datadreamer
- datadreamer-0.1.0
- synthetic
- gpt-4
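
The size figures in the metadata can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `summarize` is hypothetical, and the numbers are copied from the metadata above. It confirms that the per-split byte counts add up to the reported dataset size and derives the download-to-dataset size ratio.

```python
"""Sketch: sanity-check the size figures quoted in the metadata."""

# Figures copied from the metadata above.
SPLITS = {
    "train": {"num_bytes": 3127163, "num_examples": 900},
    "validation": {"num_bytes": 343839, "num_examples": 100},
}
DATASET_SIZE = 3471002
DOWNLOAD_SIZE = 1765300


def summarize(splits, dataset_size, download_size):
    """Aggregate per-split figures and compare them with the reported totals."""
    total_bytes = sum(s["num_bytes"] for s in splits.values())
    total_examples = sum(s["num_examples"] for s in splits.values())
    return {
        "total_examples": total_examples,
        # True when the split byte counts sum to the reported dataset size.
        "bytes_match": total_bytes == dataset_size,
        # Fraction of the dataset size that is actually downloaded.
        "download_ratio": download_size / dataset_size,
        "avg_bytes_per_example": total_bytes / total_examples,
    }


summary = summarize(SPLITS, DATASET_SIZE, DOWNLOAD_SIZE)
print(summary)
```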



