
theoracle/commodore64

Source: Hugging Face. Updated 2024-03-03; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/theoracle/commodore64
Resource description:
---
language:
- en
license: apache-2.0
size_categories:
- 1K<n<10K
task_categories:
- table-question-answering
- text-generation
pretty_name: Commodore 64 Dataset based on Commodore 64 Programmer's Reference Guide
dataset_info:
  features:
  - name: chunks
    dtype: string
  - name: summary
    dtype: string
  - name: question
    dtype: string
  splits:
  - name: train
    num_bytes: 1511321
    num_examples: 1832
  download_size: 719434
  dataset_size: 1511321
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
tags:
- commodore64
- c64
- 8bitcomputers
---

# Dataset Card for Commodore 64 Dataset

## Dataset Details

### Dataset Description

- **Curated by:** [Curator's Name or Institution]
- **Language(s) (NLP):** English
- **License:** Apache-2.0

This dataset is derived from the "Commodore 64 Programmer's Reference Guide," encompassing text chunks from the book, their summarized versions, and questions derived from the summaries. It aims to facilitate research and development in natural language processing tasks such as text summarization, question generation, and question answering, particularly in the context of programming and computer science historical texts.

### Dataset Sources

- **Repository:** [Link to the dataset repository on Hugging Face]

## Uses

### Direct Use

This dataset is intended for:

- Training and evaluating text summarization models.
- Developing and testing question answering systems.
- Generating questions from technical text summaries.

### Out-of-Scope Use

This dataset might not be suitable for:

- Tasks requiring modern computing concepts not covered in the Commodore 64 reference guide.
- Non-English language NLP tasks.

## Dataset Structure

### Features

The dataset comprises three main features:

- `chunks`: Text excerpts from the "Commodore 64 Programmer's Reference Guide."
- `summary`: Summarized versions of the chunks.
- `question`: Questions formulated based on the summaries.

### Splits

Currently, the dataset includes only a `train` split with 1,832 examples.

## Dataset Creation

### Curation Rationale

The dataset was created to provide a unique resource for exploring NLP tasks related to technical text processing, summarization, and question generation/answering, leveraging the historical and technical significance of the Commodore 64 programming domain.

### Source Data

#### Data Collection and Processing

The data was extracted from the "Commodore 64 Programmer's Reference Guide," summarized, and used to formulate questions. [Details on the processing methods, tools used, or any transformations applied should be added here.]

## Bias, Risks, and Limitations

The dataset is based on a specific historical computing context and may contain biases inherent to the original text. Users should be aware of its historical nature and the potential for outdated or context-specific information that may not generalize well to modern computing contexts.

## Citation

**APA:** [Citation in APA format]

**BibTeX:** [Citation in BibTeX format]

## Dataset Card Authors

- [Author Name or Institution]

## Dataset Card Contact

- [Contact Information]
Provider:
theoracle
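Summary of original information

Dataset Card - Commodore 64 Dataset

Dataset Details

Dataset Description

  • Language(s) (NLP): English
  • License: Apache-2.0

The dataset is derived from the "Commodore 64 Programmer's Reference Guide" and contains text chunks from the book, summaries of those chunks, and questions based on the summaries. It is meant to support research and development on natural language processing tasks such as text summarization, question generation, and question answering, particularly for programming and historical computer science texts.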

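Dataset Structure

Features

The dataset has three main features:

  • chunks: Text excerpts from the "Commodore 64 Programmer's Reference Guide."
  • summary: Summarized versions of the chunks.
  • question: Questions formulated from the summaries.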
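
As a rough illustration of how the three columns could feed the tasks named in the card, the sketch below maps one row to (input, target) pairs. The `Record` type, the `to_task_pairs` helper, and the sample row are all hypothetical and invented for this example; only the column names `chunks`, `summary`, and `question` come from the card. The card lists no answer column, so a question answering setup would presumably need answers from elsewhere, for instance by treating the chunk as context.

```python
from typing import TypedDict

# With the Hugging Face `datasets` library installed, real rows would typically
# be loaded with: load_dataset("theoracle/commodore64", split="train").
# This sketch stays offline and uses a hand-written row instead.


class Record(TypedDict):
    """One row, using the three column names listed in the dataset card."""
    chunks: str
    summary: str
    question: str


def to_task_pairs(record: Record) -> dict[str, tuple[str, str]]:
    """Map one row to (input, target) pairs (hypothetical helper)."""
    return {
        # Summarization: book excerpt in, summary out.
        "summarization": (record["chunks"], record["summary"]),
        # Question generation: summary in, question out.
        "question_generation": (record["summary"], record["question"]),
    }


# Hypothetical row, invented for illustration; it is not taken from the dataset.
sample: Record = {
    "chunks": "POKE writes a byte value into a memory location.",
    "summary": "POKE stores a byte in memory.",
    "question": "What does POKE do?",
}

pairs = to_task_pairs(sample)
```

Keeping the pairing in one small function makes it easy to swap the direction of a task (for example, generating questions from chunks rather than summaries) without touching the loading code.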

Splits

The dataset currently has only a train split, with 1,832 examples.
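
Because only a train split is published, evaluation needs a held-out portion carved out by the user. The sketch below shows one possible way to do that over row indices with a fixed seed; the `split_indices` helper, the 10% validation fraction, and the seed are assumptions made for illustration, not part of the dataset.

```python
import random

NUM_EXAMPLES = 1832  # size of the train split, as stated in the dataset card


def split_indices(
    n: int, val_fraction: float = 0.1, seed: int = 0
) -> tuple[list[int], list[int]]:
    """Shuffle row indices with a fixed seed and hold out a validation slice.

    Hypothetical helper: the fraction and seed are illustrative defaults.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # local RNG, so global state is untouched
    n_val = int(n * val_fraction)
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = split_indices(NUM_EXAMPLES)
print(len(train_idx), len(val_idx))  # 1649 183
```

When working with the Hugging Face `datasets` library directly, `Dataset.train_test_split(test_size=0.1, seed=0)` should give a comparable result without manual index handling.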

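Dataset Creation

Curation Rationale

The dataset was created to provide a unique resource for exploring NLP tasks related to technical text processing, summarization, and question generation and answering, drawing on the historical and technical significance of Commodore 64 programming.

Source Data

Data Collection and Processing

The data was extracted from the "Commodore 64 Programmer's Reference Guide," summarized, and used to formulate questions.

Bias, Risks, and Limitations

The dataset is based on a specific historical computing context and may contain biases inherent to the original text. Users should be aware of its historical nature and of outdated or context-specific information that may not generalize well to modern computing contexts.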
