edarchimbaud/news-stocks

Name: edarchimbaud/news-stocks
Creator: edarchimbaud
Published: 2023-11-21 05:06:42
License: No description available

Hugging Face, updated 2023-11-21, indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/edarchimbaud/news-stocks
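A minimal loading sketch, assuming the Hugging Face `datasets` library is installed (`pip install datasets`) and that the repository id `edarchimbaud/news-stocks` from the link above resolves on the Hub. The helper names `load_news_stocks` and `articles_for_symbol` are hypothetical, not part of the dataset. When the default Hub endpoint is unreachable, `huggingface_hub` reads the `HF_ENDPOINT` environment variable, which can point at the mirror (https://hf-mirror.com), but only if it is set before `datasets` is first imported.

```python
import os
from typing import Any, Iterable, Iterator, Mapping, Optional

REPO_ID = "edarchimbaud/news-stocks"


def load_news_stocks(split: str = "train", mirror: Optional[str] = None):
    """Hypothetical helper: load one split from the Hub (needs `datasets`)."""
    if mirror is not None:
        # The endpoint is read once when huggingface_hub is imported,
        # so this only takes effect before the first `datasets` import.
        os.environ["HF_ENDPOINT"] = mirror
    # Imported lazily so the filtering helper below works without `datasets`.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split)


def articles_for_symbol(
    rows: Iterable[Mapping[str, Any]], symbol: str
) -> Iterator[Mapping[str, Any]]:
    """Hypothetical helper: yield rows whose `symbol` matches (case-insensitive)."""
    wanted = symbol.upper()
    for row in rows:
        if str(row["symbol"]).upper() == wanted:
            yield row
```

Usage (the ticker is illustrative): `ds = load_news_stocks(mirror="https://hf-mirror.com")`, then `aapl = list(articles_for_symbol(ds, "AAPL"))`. Iterating a `datasets.Dataset` yields one dict per row, so the helper works on it as well as on plain lists; `ds.filter(...)` is the library-native alternative for large splits.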

Resource summary:

---
dataset_info:
  features:
  - name: symbol
    dtype: string
  - name: body
    dtype: string
  - name: publisher
    dtype: string
  - name: publish_time
    dtype: timestamp[ns, tz=GMT]
  - name: title
    dtype: string
  - name: url
    dtype: string
  - name: uuid
    dtype: string
  splits:
  - name: train
    num_bytes: 112563283
    num_examples: 22025
  download_size: 55028670
  dataset_size: 112563283
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# Dataset Card for "news-sp500"

## Table of Contents

- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** https://edarchimbaud.substack.com
- **Repository:** https://github.com/edarchimbaud
- **Point of Contact:** contact@edarchimbaud.com

### Dataset Summary

The news-sp500 dataset provides news articles related to companies in the S&P 500 index.

### Supported Tasks and Leaderboards

The dataset can be used for natural language processing tasks such as text classification, sentiment analysis, and information extraction. No leaderboard is associated with it.

### Languages

The dataset contains news articles in multiple languages.

## Dataset Structure

### Data Instances

This section states 1563 data instances; the split metadata lists 22025 examples in the train split.

### Data Fields

- symbol (string): The ticker symbol or abbreviation that identifies the company.
- body (string): The main content of the news article.
- publisher (string): The name of the publisher or news agency.
- publish_time (timestamp[ns, tz=GMT]): The publication time of the news article, with nanosecond precision, in the GMT time zone.
- title (string): The title or headline of the news article.
- url (string): The URL of the original news article.
- uuid (string): A unique identifier for the news article.

### Data Splits

The dataset has a single split, train.

## Dataset Creation

### Curation Rationale

The news-sp500 dataset was created to provide a collection of news articles related to companies in the S&P 500 index for research and analysis.

### Source Data

#### Initial Data Collection and Normalization

The data was collected from various online news sources and normalized for consistency.

### Annotations

#### Annotation process

[N/A]

#### Who are the annotators?

[N/A]

### Personal and Sensitive Information

[N/A]

## Considerations for Using the Data

### Social Impact of Dataset

[N/A]

### Discussion of Biases

[N/A]

### Other Known Limitations

[N/A]

## Additional Information

### Dataset Curators

The news-sp500 dataset was collected by https://edarchimbaud.substack.com.

### Licensing Information

The news-sp500 dataset is licensed under the MIT License.

### Citation Information

> https://edarchimbaud.substack.com, news-sp500 dataset, GitHub repository, https://github.com/edarchimbaud

### Contributions

Thanks to [@edarchimbaud](https://github.com/edarchimbaud) for adding this dataset.

Provider:

edarchimbaud

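Summary of Original Information

Dataset Overview

Dataset Name

Name: news-sp500

Dataset Contents

Description: The dataset contains news articles related to companies in the S&P 500 index.

Intended Use

Tasks: Suitable for natural language processing tasks such as text classification, sentiment analysis, and information extraction.
No associated leaderboard.

Languages

Languages: Multiple.

Dataset Structure

Data Instances

Number of instances: 1563.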

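Data Fields

symbol: string; the company's ticker symbol or abbreviation.
body: string; the main content of the news article.
publisher: string; the name of the publisher or news agency.
publish_time: timestamp (nanosecond precision, GMT time zone); the publication time of the news article.
title: string; the title or headline of the news article.
url: string; the link to the original news article.
uuid: string; a unique identifier for the news article.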
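A minimal sketch of one record as a typed Python object, assuming rows arrive as dicts keyed by the field names above. Arrow stores `timestamp[ns, tz=GMT]` values as 64-bit counts of nanoseconds since the Unix epoch, so depending on the reader, `publish_time` may come back either as that integer or as an already-decoded `datetime`; the hypothetical `NewsArticle.from_row` accepts both. Python's `datetime` keeps only microseconds, so sub-microsecond digits are truncated, and naive datetimes are assumed to already be in GMT.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Any, Mapping

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def publish_time_from_ns(ns: int) -> datetime:
    """Convert nanoseconds since the Unix epoch (GMT/UTC) to an aware datetime.

    Integer floor division keeps the result exact to the microsecond;
    the remaining nanoseconds are dropped.
    """
    return _EPOCH + timedelta(microseconds=ns // 1_000)


@dataclass(frozen=True)
class NewsArticle:
    """Hypothetical typed view of one row of the train split."""

    symbol: str
    body: str
    publisher: str
    publish_time: datetime  # timezone-aware, normalized to UTC
    title: str
    url: str
    uuid: str

    @classmethod
    def from_row(cls, row: Mapping[str, Any]) -> "NewsArticle":
        raw = row["publish_time"]
        if isinstance(raw, datetime):
            # Assumption: naive datetimes are already expressed in GMT/UTC.
            if raw.tzinfo is None:
                when = raw.replace(tzinfo=timezone.utc)
            else:
                when = raw.astimezone(timezone.utc)
        else:
            # Otherwise treat the value as an integer nanosecond count.
            when = publish_time_from_ns(int(raw))
        return cls(
            symbol=row["symbol"],
            body=row["body"],
            publisher=row["publisher"],
            publish_time=when,
            title=row["title"],
            url=row["url"],
            uuid=row["uuid"],
        )
```

Normalizing every timestamp to an aware UTC `datetime` keeps comparisons and sorting across rows well-defined regardless of how the reader decoded the column.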

Data Splits

Splits: training set (train) only.
Training set size: 22,025 examples, 112,563,283 bytes in total.
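The split sizes above allow a quick back-of-the-envelope check. The sketch below divides the byte count by the example count to get the average decoded size of one record (all seven fields together, not the length of any particular article body), and compares the dataset size with the download_size of 55,028,670 bytes listed in the dataset card's metadata.

```python
TRAIN_BYTES = 112_563_283     # num_bytes of the train split (= dataset_size)
TRAIN_EXAMPLES = 22_025       # num_examples of the train split
DOWNLOAD_BYTES = 55_028_670   # download_size from the card metadata

# Average decoded size of one record across all fields.
avg_bytes_per_example = TRAIN_BYTES / TRAIN_EXAMPLES
# Fraction of the decoded size that actually has to be downloaded.
download_ratio = DOWNLOAD_BYTES / TRAIN_BYTES

print(f"average record size: {avg_bytes_per_example:,.1f} bytes")  # → 5,110.7 bytes
print(f"download / dataset size: {download_ratio:.3f}")             # → 0.489
```

That is roughly 5 KB per record, with the download a little under half the decoded size, consistent with the shards under `data/train-*` being stored compressed.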

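Dataset Creation

Data Collection and Normalization

Source: Data was collected from multiple online news sources and normalized.

License Information

License: MIT License.

Citation Information

Citation: https://edarchimbaud.substack.com, news-sp500 dataset, GitHub repository, https://github.com/edarchimbaud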