onurkeles/econ_paper_abstracts
收藏Hugging Face2024-03-25 更新2024-06-11 收录
下载链接:
https://hf-mirror.com/datasets/onurkeles/econ_paper_abstracts
下载链接
链接失效反馈官方服务:
资源简介:
---
license: mit
task_categories:
- text-classification
- question-answering
- text-generation
language:
- en
pretty_name: e
size_categories:
- 1K<n<10K
---
# Dataset Card for Economics Paper Dataset
## Dataset Summary
The Economics Research Paper Dataset was designed to support the development of the LLaMA-2-Econ models, with a focus on Title Generation, Abstract Classification, and Question & Answer (Q&A) tasks. It comprises abstracts and titles of economics research papers, along with synthetic Q&A pairs derived from the abstracts, to facilitate training of large language models for economics-specific applications.
## Dataset Description
**Content:** The dataset includes:
- Economics paper abstracts and titles.
**Source:** The data was collected using the arXiv API, with papers selected from the categories Econometrics (ec.EM), General Economics (ec.GN), and Theoretical Economics (ec.TH).
**Volume:**
- Total abstracts and titles: 6362
## Intended Uses
This dataset is intended for training and evaluating language models specialized in:
- Generating titles for economics research papers.
- Classifying abstracts into sub-fields of economics.
- Answering questions based on economics paper abstracts.
## Dataset Creation
### Curation Rationale
The dataset was curated to address the lack of specialized tools and datasets for enhancing research within the economics domain, leveraging the potential of language models like LLaMA-2.
### Source Data
#### Initial Data Collection and Normalization
Data was collected through the arXiv API, targeting papers within specified categories of economics. Titles and abstracts were extracted, and synthetic Q&A pairs were generated using a process that involved the GPT-3.5 Turbo model for contextual dialogue creation.
### Licensing Information
The dataset is derived from arXiv papers. Users are advised to adhere to arXiv's terms of use.
提供机构:
onurkeles
原始信息汇总
Economics Research Paper Dataset 概述
数据集概要
该数据集名为“Economics Research Paper Dataset”,旨在支持LLaMA-2-Econ模型的开发,专注于经济学研究论文的标题生成、摘要分类和问答任务。数据集包含经济学研究论文的摘要和标题,以及从摘要中衍生的合成问答对,用于训练大型语言模型以应用于经济学领域。
数据集描述
内容:
- 包含经济学论文的摘要和标题。
来源:
- 数据通过arXiv API收集,选取了Econometrics (ec.EM)、General Economics (ec.GN)和Theoretical Economics (ec.TH)类别的论文。
容量:
- 总摘要和标题数量:6362
预期用途
该数据集主要用于:
- 生成经济学研究论文的标题。
- 将摘要分类到经济学的子领域。
- 基于经济学论文摘要回答问题。
数据集创建
筛选理由
数据集的筛选旨在解决经济学领域内缺乏专门工具和数据集的问题,利用LLaMA-2等语言模型的潜力提升研究质量。
原始数据
初始数据收集和标准化
- 数据通过arXiv API收集,特别关注经济学相关类别的论文。
- 提取标题和摘要,并使用GPT-3.5 Turbo模型生成合成的问答对。
许可信息
- 数据集源自arXiv论文,用户需遵守arXiv的使用条款。



