
onurkeles/econ_paper_abstracts

Source: Hugging Face. Updated: 2024-03-25. Indexed: 2024-06-11.
Download link:
https://hf-mirror.com/datasets/onurkeles/econ_paper_abstracts
Resource description:
---
license: mit
task_categories:
- text-classification
- question-answering
- text-generation
language:
- en
pretty_name: e
size_categories:
- 1K<n<10K
---

# Dataset Card for Economics Paper Dataset

## Dataset Summary

The Economics Research Paper Dataset was designed to support the development of the LLaMA-2-Econ models, with a focus on Title Generation, Abstract Classification, and Question & Answer (Q&A) tasks. It comprises abstracts and titles of economics research papers, along with synthetic Q&A pairs derived from the abstracts, to facilitate training of large language models for economics-specific applications.

## Dataset Description

**Content:** The dataset includes:

- Economics paper abstracts and titles.

**Source:** The data was collected using the arXiv API, with papers selected from the categories Econometrics (ec.EM), General Economics (ec.GN), and Theoretical Economics (ec.TH).

**Volume:**

- Total abstracts and titles: 6362

## Intended Uses

This dataset is intended for training and evaluating language models specialized in:

- Generating titles for economics research papers.
- Classifying abstracts into sub-fields of economics.
- Answering questions based on economics paper abstracts.

## Dataset Creation

### Curation Rationale

The dataset was curated to address the lack of specialized tools and datasets for enhancing research within the economics domain, leveraging the potential of language models like LLaMA-2.

### Source Data

#### Initial Data Collection and Normalization

Data was collected through the arXiv API, targeting papers within specified categories of economics. Titles and abstracts were extracted, and synthetic Q&A pairs were generated using a process that involved the GPT-3.5 Turbo model for contextual dialogue creation.

### Licensing Information

The dataset is derived from arXiv papers. Users are advised to adhere to arXiv's terms of use.
Provided by:
onurkeles
Summary of original information

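Economics Research Paper Dataset: Overview

Dataset Summary

The dataset, named "Economics Research Paper Dataset", was designed to support the development of the LLaMA-2-Econ models, focusing on title generation, abstract classification, and question answering for economics research papers. It contains abstracts and titles of economics research papers, along with synthetic question-answer pairs derived from the abstracts, for training large language models for economics applications.

Dataset Description

Content:

  • Abstracts and titles of economics papers.

Source:

  • The data was collected through the arXiv API, with papers selected from the categories Econometrics (ec.EM), General Economics (ec.GN), and Theoretical Economics (ec.TH).

Volume:

  • Total abstracts and titles: 6362

Intended Uses

The dataset is mainly intended for:

  • Generating titles for economics research papers.
  • Classifying abstracts into sub-fields of economics.
  • Answering questions based on economics paper abstracts.

Dataset Creation

Curation Rationale

The dataset was curated to address the lack of specialized tools and datasets in the economics domain, leveraging the potential of language models such as LLaMA-2 to enhance research.

Source Data

Initial Data Collection and Normalization

  • Data was collected through the arXiv API, targeting papers in the specified economics categories.
  • Titles and abstracts were extracted, and synthetic question-answer pairs were generated with the GPT-3.5 Turbo model.

Licensing Information

  • The dataset is derived from arXiv papers; users should comply with arXiv's terms of use.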
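
The dataset card does not publish its collection script, so the following is only a minimal sketch of what the arXiv API step might look like. It builds a query URL for the three categories and extracts titles and abstracts from an Atom response. The function names `build_query_url` and `parse_entries`, the output keys, and the page size are illustrative assumptions, not the authors' code. Note that the card writes the categories as ec.EM, ec.GN, and ec.TH, whereas the arXiv API identifies them as `econ.EM`, `econ.GN`, and `econ.TH`; the sketch uses the latter. The GPT-3.5 Turbo Q&A generation step is not covered here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Public arXiv API query endpoint.
ARXIV_API = "http://export.arxiv.org/api/query"
# arXiv API identifiers for the three categories named in the card.
ECON_CATEGORIES = ("econ.EM", "econ.GN", "econ.TH")
# arXiv responses are Atom feeds, so elements live in the Atom namespace.
ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def build_query_url(categories=ECON_CATEGORIES, start=0, max_results=100):
    """Build a paged arXiv API URL matching any of the given categories.

    Hypothetical helper: the page size and paging scheme are assumptions.
    """
    search_query = " OR ".join(f"cat:{c}" for c in categories)
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


def parse_entries(feed_xml):
    """Extract title/abstract pairs from an Atom feed string.

    Whitespace is collapsed because arXiv wraps long titles and
    abstracts across lines. The output keys are illustrative and may
    not match the column names of the published dataset.
    """
    root = ET.fromstring(feed_xml)
    records = []
    for entry in root.findall("atom:entry", ATOM):
        title = entry.findtext("atom:title", default="", namespaces=ATOM)
        summary = entry.findtext("atom:summary", default="", namespaces=ATOM)
        records.append(
            {
                "title": " ".join(title.split()),
                "abstract": " ".join(summary.split()),
            }
        )
    return records
```

In use, one would fetch each URL from `build_query_url` (for example with `urllib.request.urlopen`), pass the response body to `parse_entries`, and advance `start` by `max_results` until no entries are returned.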