edarchimbaud/news-stocks
Hugging Face · updated 2023-11-21 · indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/edarchimbaud/news-stocks
---
dataset_info:
  features:
  - name: symbol
    dtype: string
  - name: body
    dtype: string
  - name: publisher
    dtype: string
  - name: publish_time
    dtype: timestamp[ns, tz=GMT]
  - name: title
    dtype: string
  - name: url
    dtype: string
  - name: uuid
    dtype: string
  splits:
  - name: train
    num_bytes: 112563283
    num_examples: 22025
  download_size: 55028670
  dataset_size: 112563283
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
# Dataset Card for "news-sp500"
## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)
## Dataset Description
- **Homepage:** https://edarchimbaud.substack.com
- **Repository:** https://github.com/edarchimbaud
- **Point of Contact:** contact@edarchimbaud.com
### Dataset Summary
The news-sp500 dataset provides news articles related to companies in the S&P 500 index.
### Supported Tasks and Leaderboards
The dataset can be used for natural language processing tasks such as text classification, sentiment analysis, and information extraction. No leaderboard is associated with it.
### Languages
The dataset contains news articles in multiple languages.
## Dataset Structure
### Data Instances
The dataset consists of 22,025 data instances, matching `num_examples` for the train split in the metadata above.
### Data Fields
- symbol (string): The ticker symbol identifying the company the article relates to.
- body (string): The main content of the news article.
- publisher (string): The name of the publisher or news agency.
- publish_time (timestamp[ns, tz=GMT]): The publication time of the news article, stored with nanosecond precision in the GMT time zone.
- title (string): The title or headline of the news article.
- url (string): The URL of the original news article.
- uuid (string): A unique identifier for the news article.
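The fields above map naturally onto a small typed record. The following is a minimal sketch, assuming rows arrive as plain dicts keyed by the seven column names, with `publish_time` given either as a Python `datetime` or as a raw integer count of nanoseconds since the Unix epoch (Arrow's storage format for `timestamp[ns]`); which form a particular loader yields is left open here. The names `NewsArticle`, `ns_to_utc`, `to_utc`, `parse_article`, and `dedupe_by_uuid`, and the sample row, are hypothetical and not part of the dataset.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Any, Iterable, List, Mapping

# Unix epoch as an aware datetime; GMT and UTC coincide for this purpose.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


@dataclass(frozen=True)
class NewsArticle:
    """Hypothetical typed view of one row; field names follow the card's schema."""
    symbol: str
    body: str
    publisher: str
    publish_time: datetime  # always timezone-aware, normalized to UTC
    title: str
    url: str
    uuid: str


def ns_to_utc(ns: int) -> datetime:
    """Convert integer nanoseconds since the epoch to an aware UTC datetime.

    Python datetimes stop at microseconds, so the last three digits are dropped.
    """
    return EPOCH + timedelta(microseconds=ns // 1_000)


def to_utc(value: Any) -> datetime:
    """Normalize a publish_time value to an aware UTC datetime."""
    if isinstance(value, datetime):
        if value.tzinfo is None:
            # Assumption: naive values are already GMT, as the schema declares tz=GMT.
            return value.replace(tzinfo=timezone.utc)
        return value.astimezone(timezone.utc)
    if isinstance(value, int):
        return ns_to_utc(value)
    raise TypeError(f"unsupported publish_time type: {type(value).__name__}")


def parse_article(row: Mapping[str, Any]) -> NewsArticle:
    """Build a NewsArticle from a dict keyed by the seven column names."""
    return NewsArticle(
        symbol=row["symbol"],
        body=row["body"],
        publisher=row["publisher"],
        publish_time=to_utc(row["publish_time"]),
        title=row["title"],
        url=row["url"],
        uuid=row["uuid"],
    )


def dedupe_by_uuid(articles: Iterable[NewsArticle]) -> List[NewsArticle]:
    """Keep the first article seen for each uuid, preserving input order."""
    seen = set()
    unique = []
    for article in articles:
        if article.uuid not in seen:
            seen.add(article.uuid)
            unique.append(article)
    return unique


if __name__ == "__main__":
    # Made-up row for illustration only; it is not a record from the dataset.
    sample = {
        "symbol": "EXAMPLE",
        "body": "Body text of the article.",
        "publisher": "Example Wire",
        "publish_time": 1_700_000_000_123_456_789,
        "title": "Example headline",
        "url": "https://example.com/article",
        "uuid": "00000000-0000-0000-0000-000000000000",
    }
    article = parse_article(sample)
    print(article.publish_time.isoformat())  # 2023-11-14T22:13:20.123456+00:00
    print(len(dedupe_by_uuid([article, article])))  # 1
```

Deduplicating on `uuid` relies only on the card's statement that it uniquely identifies an article; whether the hosted data actually contains repeated rows is not something the card states.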
### Data Splits
The dataset consists of a single split called train.
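The size figures in the metadata at the top of this card give a quick sense of scale. The sketch below only restates those numbers: dividing `num_bytes` by `num_examples` gives the average stored size per row, and `download_size / dataset_size` shows how much smaller the download is than the prepared dataset. The helper names `split_stats` and `load_train_split` are hypothetical; `load_train_split` uses the `datasets` library's `load_dataset`, needs network access, and is not called by the example.

```python
# Size figures copied from the dataset_info block of this card.
NUM_BYTES = 112_563_283     # splits[train].num_bytes (equal to dataset_size)
NUM_EXAMPLES = 22_025       # splits[train].num_examples
DOWNLOAD_SIZE = 55_028_670  # download_size


def split_stats(num_bytes: int, num_examples: int, download_size: int) -> dict:
    """Derive simple ratios from the card's metadata; no data is read."""
    return {
        "avg_bytes_per_example": num_bytes / num_examples,
        "download_to_dataset_ratio": download_size / num_bytes,
    }


def load_train_split():
    """Download the train split with the Hugging Face `datasets` library.

    Requires `pip install datasets` and network access.
    """
    from datasets import load_dataset  # third-party import, deferred on purpose

    return load_dataset("edarchimbaud/news-stocks", split="train")


if __name__ == "__main__":
    stats = split_stats(NUM_BYTES, NUM_EXAMPLES, DOWNLOAD_SIZE)
    print(f"average bytes per example: {stats['avg_bytes_per_example']:.1f}")  # 5110.7
    print(f"download / dataset size: {stats['download_to_dataset_ratio']:.3f}")  # 0.489
```

The per-example average is only a rough guide to article length, since `num_bytes` covers all seven columns rather than just `body`. Comparing `len(load_train_split())` with `NUM_EXAMPLES` is one way to check which instance count applies to the version you download.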
## Dataset Creation
### Curation Rationale
The news-sp500 dataset was created to provide a collection of news articles related to companies in the S&P 500 index for research and analysis purposes.
### Source Data
#### Initial Data Collection and Normalization
The data was collected from various online news sources and normalized for consistency.
### Annotations
#### Annotation process
[N/A]
#### Who are the annotators?
[N/A]
### Personal and Sensitive Information
[N/A]
## Considerations for Using the Data
### Social Impact of Dataset
[N/A]
### Discussion of Biases
[N/A]
### Other Known Limitations
[N/A]
## Additional Information
### Dataset Curators
The news-sp500 dataset was collected by edarchimbaud (https://edarchimbaud.substack.com).
### Licensing Information
The news-sp500 dataset is licensed under the MIT License.
### Citation Information
> https://edarchimbaud.substack.com, news-sp500 dataset, GitHub repository, https://github.com/edarchimbaud
### Contributions
Thanks to [@edarchimbaud](https://github.com/edarchimbaud) for adding this dataset.
Provided by:
edarchimbaud