
edarchimbaud/news-stocks

Hugging Face | Updated 2023-11-21 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/edarchimbaud/news-stocks
Resource description:
---
dataset_info:
  features:
  - name: symbol
    dtype: string
  - name: body
    dtype: string
  - name: publisher
    dtype: string
  - name: publish_time
    dtype: timestamp[ns, tz=GMT]
  - name: title
    dtype: string
  - name: url
    dtype: string
  - name: uuid
    dtype: string
  splits:
  - name: train
    num_bytes: 112563283
    num_examples: 22025
  download_size: 55028670
  dataset_size: 112563283
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# Dataset Card for "news-sp500"

## Table of Contents

- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** https://edarchimbaud.substack.com
- **Repository:** https://github.com/edarchimbaud
- **Point of Contact:** contact@edarchimbaud.com

### Dataset Summary

The news-sp500 dataset provides news articles related to companies in the S&P 500 index.

### Supported Tasks and Leaderboards

The dataset can be used for various natural language processing tasks such as text classification, sentiment analysis, information extraction, etc. It does not have a specific leaderboard associated with it.

### Languages

The dataset contains news articles in multiple languages.

## Dataset Structure

### Data Instances

The dataset consists of [1563] data instances.

### Data Fields

- symbol (string): A string representing the ticker symbol or abbreviation used to identify the company.
- body (string): The main content of the news article.
- publisher (string): The name of the publisher or news agency.
- publish_time (timestamp[ns, tz=GMT]): A timestamp indicating the publication time of the news article in GMT timezone.
- title (string): The title or headline of the news article.
- url (string): The URL or link to the original news article.
- uuid (string): A unique identifier for the news article.

### Data Splits

The dataset consists of a single split called train.

## Dataset Creation

### Curation Rationale

The news-sp500 dataset was created to provide a collection of news articles related to companies in the S&P 500 index for research and analysis purposes.

### Source Data

#### Initial Data Collection and Normalization

The data was collected from various online news sources and normalized for consistency.

### Annotations

#### Annotation process

[N/A]

#### Who are the annotators?

[N/A]

### Personal and Sensitive Information

[N/A]

## Considerations for Using the Data

### Social Impact of Dataset

[N/A]

### Discussion of Biases

[N/A]

### Other Known Limitations

[N/A]

## Additional Information

### Dataset Curators

The news-sp500 dataset was collected by https://edarchimbaud.substack.com.

### Licensing Information

The news-sp500 dataset is licensed under the MIT License.

### Citation Information

> https://edarchimbaud.substack.com, news-sp500 dataset, GitHub repository, https://github.com/edarchimbaud

### Contributions

Thanks to [@edarchimbaud](https://github.com/edarchimbaud) for adding this dataset.
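
The single train split described in the card could be loaded along the following lines. This is a minimal sketch, assuming the Hugging Face `datasets` package is installed, network access is available, and the repository is still published under the ID that appears in the download URL. The helper name `load_news_train` is hypothetical and not part of the dataset's own tooling.

```python
REPO_ID = "edarchimbaud/news-stocks"  # repository ID taken from the download URL
SPLIT = "train"  # the only split named in the card


def load_news_train():
    """Load the train split; needs the `datasets` package and network access."""
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=SPLIT)
```

Calling `load_news_train()` should return a dataset whose columns correspond to the fields listed in the card (symbol, body, publisher, publish_time, title, url, uuid).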
Provider:
edarchimbaud
Summary of original information

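Dataset overview

Dataset name

  • Name: news-sp500

Dataset content

  • Description: The dataset contains news articles related to companies in the S&P 500 index.

Dataset usage

  • Tasks: Suitable for natural language processing tasks such as text classification, sentiment analysis, and information extraction.
  • No specific leaderboard.

Languages

  • Languages included: multiple languages.

Dataset structure

Data instances

  • Number of instances: 1563.

Data fields

  • symbol: string, the company's ticker symbol or abbreviation.
  • body: string, the main content of the news article.
  • publisher: string, the name of the publisher or news agency.
  • publish_time: timestamp (nanoseconds, GMT timezone), the publication time of the news article.
  • title: string, the title or headline of the news article.
  • url: string, the link to the original news article.
  • uuid: string, a unique identifier for the news article.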
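
The seven fields listed above could be modeled as a typed record. The following is a minimal Python sketch, not part of the dataset's tooling: the `NewsRecord` class, the `from_row` helper, and the sample row are all hypothetical and only illustrate the schema. Python's `datetime` stores microseconds, which is coarser than the nanosecond resolution declared for publish_time.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Field names taken from the dataset card; everything else here is illustrative.
FIELDS = ("symbol", "body", "publisher", "publish_time", "title", "url", "uuid")


@dataclass(frozen=True)
class NewsRecord:
    symbol: str
    body: str
    publisher: str
    publish_time: datetime
    title: str
    url: str
    uuid: str


def from_row(row: dict) -> NewsRecord:
    """Build a NewsRecord from a dict-like row, checking that all fields exist."""
    missing = [name for name in FIELDS if name not in row]
    if missing:
        raise KeyError(f"missing fields: {missing}")
    ts = row["publish_time"]
    if isinstance(ts, str):
        ts = datetime.fromisoformat(ts)
    if ts.tzinfo is None:
        # Naive values are assumed to already be in GMT.
        ts = ts.replace(tzinfo=timezone.utc)
    else:
        ts = ts.astimezone(timezone.utc)
    values = {name: row[name] for name in FIELDS}
    values["publish_time"] = ts
    return NewsRecord(**values)


# Hypothetical row, made up for illustration; not taken from the dataset.
sample = {
    "symbol": "XYZ",
    "body": "Example article body.",
    "publisher": "Example Wire",
    "publish_time": "2023-11-21T14:30:00+00:00",
    "title": "Example headline",
    "url": "https://example.com/article",
    "uuid": "00000000-0000-0000-0000-000000000000",
}
record = from_row(sample)
```

Freezing the dataclass keeps records hashable and prevents accidental mutation; normalizing every timestamp to a timezone-aware value avoids mixing naive and aware datetimes in later comparisons.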

Data splits

  • Splits: train only.
  • Train split size: 22025 examples, 112563283 bytes in total.
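
As a quick sanity check on these figures, the average size per example and the ratio of download size to dataset size can be worked out directly from the metadata. The short Python sketch below uses only the numbers reported in the dataset metadata (the download size of 55028670 bytes comes from the card's front matter); the variable names are illustrative.

```python
# Figures from the dataset metadata.
NUM_EXAMPLES = 22025
DATASET_SIZE_BYTES = 112563283
DOWNLOAD_SIZE_BYTES = 55028670

# Mean number of bytes stored per news article.
avg_bytes_per_example = DATASET_SIZE_BYTES / NUM_EXAMPLES

# Fraction of the in-memory dataset size that has to be downloaded.
download_ratio = DOWNLOAD_SIZE_BYTES / DATASET_SIZE_BYTES

print(f"average size per example: {avg_bytes_per_example:.0f} bytes")
print(f"download size / dataset size: {download_ratio:.2f}")
```

This works out to roughly 5.1 KB per example, with the download being a little under half the size of the loaded dataset.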

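Dataset creation

Data collection and normalization

  • Source: The data was collected from multiple online news sources and normalized.

Licensing information

  • License: MIT License.

Citation information

  • Citation format: https://edarchimbaud.substack.com, news-sp500 dataset, GitHub repository, https://github.com/edarchimbaud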