onurkeles/econ_paper_abstracts

Name: onurkeles/econ_paper_abstracts
Creator: onurkeles
Published: 2024-03-25 10:39:03
License: No description provided

Hugging Face · Updated 2024-03-25 · Indexed 2024-06-11

Download link:

https://hf-mirror.com/datasets/onurkeles/econ_paper_abstracts

Resource description:

---
license: mit
task_categories:
- text-classification
- question-answering
- text-generation
language:
- en
pretty_name: e
size_categories:
- 1K<n<10K
---

# Dataset Card for Economics Paper Dataset

## Dataset Summary

The Economics Research Paper Dataset was designed to support the development of the LLaMA-2-Econ models, with a focus on Title Generation, Abstract Classification, and Question & Answer (Q&A) tasks. It comprises abstracts and titles of economics research papers, along with synthetic Q&A pairs derived from the abstracts, to facilitate training of large language models for economics-specific applications.

## Dataset Description

**Content:** The dataset includes:

- Economics paper abstracts and titles.

**Source:** The data was collected using the arXiv API, with papers selected from the categories Econometrics (econ.EM), General Economics (econ.GN), and Theoretical Economics (econ.TH).

**Volume:**

- Total abstracts and titles: 6362

## Intended Uses

This dataset is intended for training and evaluating language models specialized in:

- Generating titles for economics research papers.
- Classifying abstracts into sub-fields of economics.
- Answering questions based on economics paper abstracts.

## Dataset Creation

### Curation Rationale

The dataset was curated to address the lack of specialized tools and datasets for research in the economics domain, leveraging the potential of language models like LLaMA-2.

### Source Data

#### Initial Data Collection and Normalization

Data was collected through the arXiv API, targeting papers in the economics categories listed above. Titles and abstracts were extracted, and synthetic Q&A pairs were generated using a process that involved the GPT-3.5 Turbo model for contextual dialogue creation.

### Licensing Information

The dataset is derived from arXiv papers. Users are advised to adhere to arXiv's terms of use.

Provided by:

onurkeles

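Summary of Original Information

Economics Research Paper Dataset Overview

Dataset Summary

The dataset, named "Economics Research Paper Dataset", was designed to support the development of the LLaMA-2-Econ models, focusing on title generation, abstract classification, and question answering for economics research papers. It contains abstracts and titles of economics research papers, together with synthetic Q&A pairs derived from the abstracts, for training large language models for economics applications.

Dataset Description

Content:

Abstracts and titles of economics papers.

Source:

The data was collected through the arXiv API from papers in the categories Econometrics (econ.EM), General Economics (econ.GN), and Theoretical Economics (econ.TH).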
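
As a rough illustration of the collection step, the sketch below builds an arXiv API query URL covering the three categories. It is not the dataset author's script: the function name `build_query_url`, the paging values, and the use of `OR` to combine categories are assumptions made for this example. The endpoint and the `search_query`, `start`, and `max_results` parameters belong to the public arXiv API, which identifies these categories as `econ.EM`, `econ.GN`, and `econ.TH`.

```python
from urllib.parse import urlencode

# Public arXiv API endpoint.
ARXIV_API = "http://export.arxiv.org/api/query"

# arXiv identifiers for Econometrics, General Economics, Theoretical Economics.
CATEGORIES = ["econ.EM", "econ.GN", "econ.TH"]


def build_query_url(categories, start=0, max_results=100):
    """Return a query URL matching papers in any of the given categories.

    Hypothetical helper for illustration; paging defaults are assumptions.
    """
    search_query = " OR ".join(f"cat:{c}" for c in categories)
    params = {
        "search_query": search_query,
        "start": start,              # offset of the first result
        "max_results": max_results,  # page size
    }
    return f"{ARXIV_API}?{urlencode(params)}"


url = build_query_url(CATEGORIES, start=0, max_results=50)
print(url)
```

Fetching the URL returns an Atom feed; paging through all results would mean repeating the request with an increasing `start` value.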

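Volume:

Total abstracts and titles: 6362

Intended Uses

The dataset is mainly intended for:

Generating titles for economics research papers.
Classifying abstracts into sub-fields of economics.
Answering questions based on economics paper abstracts.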
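
The three uses above can be pictured as prompt/target pairs built from a single paper record. The sketch below is only one possible framing. The field names (`title`, `abstract`, `category`, `qa_pairs`), the prompt wording, and the placeholder record are all assumptions for illustration, since this page does not document the dataset's actual column names.

```python
def make_examples(record):
    """Turn one paper record into prompt/target pairs for the three tasks.

    Field names are hypothetical; adapt them to the real dataset columns.
    """
    title = record["title"].strip()
    abstract = record["abstract"].strip()
    examples = [
        {
            "task": "title_generation",
            "prompt": f"Write a title for this economics abstract:\n{abstract}",
            "target": title,
        },
        {
            "task": "abstract_classification",
            "prompt": f"Assign an arXiv economics category to this abstract:\n{abstract}",
            "target": record["category"],
        },
    ]
    # Q&A pairs are treated as optional, since not every record may carry them.
    for qa in record.get("qa_pairs", []):
        examples.append(
            {
                "task": "question_answering",
                "prompt": f"{abstract}\n\nQuestion: {qa['question']}",
                "target": qa["answer"],
            }
        )
    return examples


# Placeholder record, not real data from the dataset.
SAMPLE = {
    "title": "Placeholder title",
    "abstract": "Placeholder abstract text.",
    "category": "econ.EM",
    "qa_pairs": [{"question": "Placeholder question?", "answer": "Placeholder answer."}],
}

examples = make_examples(SAMPLE)
for ex in examples:
    print(ex["task"], "->", ex["target"])
```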

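Dataset Creation

Curation Rationale

The dataset was curated to address the lack of specialized tools and datasets in the economics domain, leveraging the potential of language models such as LLaMA-2 to improve research quality.

Source Data

Initial Data Collection and Normalization

The data was collected through the arXiv API, targeting papers in economics-related categories.
Titles and abstracts were extracted, and synthetic Q&A pairs were generated using the GPT-3.5 Turbo model.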
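
The extraction step can be sketched as follows, assuming the response is the Atom feed returned by the arXiv API, where each `entry` carries a `title` and a `summary` (the abstract). The helper names, the whitespace-collapsing normalization, and the placeholder feed are assumptions for this example; the source does not describe the author's actual normalization, and the Q&A generation step is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom feeds such as those returned by the arXiv API.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def normalize(text):
    """Collapse line breaks and repeated spaces into single spaces."""
    return " ".join(text.split())


def extract_papers(atom_xml):
    """Return a list of {'title', 'abstract'} dicts from an Atom feed string."""
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        summary = entry.findtext("atom:summary", default="", namespaces=ATOM_NS)
        papers.append({"title": normalize(title), "abstract": normalize(summary)})
    return papers


# Placeholder feed for illustration, not a real arXiv response.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Placeholder
      Title</title>
    <summary>  Placeholder abstract
      spanning two lines.  </summary>
  </entry>
</feed>"""

papers = extract_papers(SAMPLE_FEED)
print(papers)
```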

Licensing Information

The dataset is derived from arXiv papers; users must comply with arXiv's terms of use.
