rohanmahen/phrase-ticker
Source: Hugging Face | Updated 2024-02-17 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/rohanmahen/phrase-ticker
Resource description:
---
license: mit
---
# phrase-ticker Dataset
## Description
The Phrase Ticker Dataset supports the extraction of stock ticker symbols from natural language queries. It pairs natural language utterances commonly associated with S&P 500 companies with their corresponding ticker symbols, providing a simple resource for understanding how companies are referred to in various contexts.
## Structure
The dataset comprises two columns:
- `phrase`: This column contains natural language phrases that reference or describe companies in ways that are commonly used in financial news, reports, and discussions. These include not only formal company names and products but also informal and colloquial references.
- `ticker`: This column contains the stock ticker symbol of the company mentioned or described in the phrase. Each phrase is associated with exactly one ticker symbol.
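The two-column layout lends itself to a simple phrase-to-ticker lookup table. The following is a minimal sketch, assuming each row is a dict with `phrase` and `ticker` keys; the helper name `build_phrase_index` and the sample rows are hypothetical and written for illustration, not copied from the dataset.

```python
from collections import defaultdict


def build_phrase_index(rows):
    """Map each normalized phrase to its ticker and collect conflicting phrases.

    `rows` is any iterable of dicts with "phrase" and "ticker" keys.
    """
    index = {}
    conflicts = defaultdict(set)
    for row in rows:
        # Lowercase and collapse whitespace so lookups are case-insensitive.
        phrase = " ".join(row["phrase"].lower().split())
        ticker = row["ticker"].strip().upper()
        if phrase in index and index[phrase] != ticker:
            # The same phrase points at two tickers: record it, keep the first.
            conflicts[phrase].update({index[phrase], ticker})
            continue
        index[phrase] = ticker
    return index, dict(conflicts)


# Hypothetical rows for illustration only (not taken from the dataset).
sample_rows = [
    {"phrase": "Apple", "ticker": "AAPL"},
    {"phrase": "the iPhone maker", "ticker": "AAPL"},
    {"phrase": "Microsoft", "ticker": "MSFT"},
]

index, conflicts = build_phrase_index(sample_rows)
```

Several phrases may share a ticker, so the mapping is many-to-one. The `conflicts` dict is a sanity check on the assumption that no phrase maps to more than one ticker.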
## Primary Use Case
**Ticker Extraction from Natural Language Queries**: The main application of this dataset is to train models that identify which company a piece of text refers to and return its stock ticker symbol. This capability matters for automating the analysis of financial news, social media mentions, analyst reports, and any other text that discusses companies without mentioning their ticker symbols directly.
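Before training a model, a dictionary lookup can serve as a baseline for this task. The sketch below is not a trained model and not the author's method. It assumes a phrase-to-ticker mapping with lowercase keys (the sample mapping is hypothetical) and returns the tickers of all phrases that appear in the text as whole words.

```python
import re


def extract_tickers(text, phrase_to_ticker):
    """Return tickers whose phrases occur in `text` as whole words.

    Phrases are tried longest first; the result is ordered by that
    matching order (not by position in the text) and has no duplicates.
    """
    normalized = " ".join(text.lower().split())
    found = []
    for phrase in sorted(phrase_to_ticker, key=len, reverse=True):
        # Lookarounds stop "apple" from matching inside "pineapple".
        pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
        if re.search(pattern, normalized):
            ticker = phrase_to_ticker[phrase]
            if ticker not in found:
                found.append(ticker)
    return found


# Hypothetical mapping for illustration only (keys must be lowercase).
phrase_to_ticker = {
    "apple": "AAPL",
    "the iphone maker": "AAPL",
    "microsoft": "MSFT",
}

tickers = extract_tickers(
    "Shares of the iPhone maker rose while Microsoft slipped.",
    phrase_to_ticker,
)
```

Exact matching of this kind misses paraphrases and misspellings, which is where a model trained on the phrase and ticker pairs would be expected to help.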
## Getting Started
To begin working with the phrase-ticker Dataset in your projects, you can load it using the Hugging Face `datasets` library:
```python
from datasets import load_dataset
dataset = load_dataset("rohanmahen/phrase-ticker")
```
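Called without a `split` argument, `load_dataset` returns a mapping from split names to datasets. The dataset card does not state which splits exist, so the sketch below avoids hard-coding a split name and flattens every split into plain row dicts. The helper `iter_rows` is hypothetical, and the stand-in data only mimics the assumed `phrase`/`ticker` schema.

```python
def iter_rows(splits):
    """Yield every row from a mapping of split name -> iterable of row dicts.

    Works with the object returned by `load_dataset(...)` as well as with a
    plain dict of lists, which is what the stand-in below uses.
    """
    for split_name, split in splits.items():
        for row in split:
            yield {
                "split": split_name,
                "phrase": row["phrase"],
                "ticker": row["ticker"],
            }


# Stand-in for a loaded dataset; the split name and rows are hypothetical.
fake_splits = {
    "train": [
        {"phrase": "Apple", "ticker": "AAPL"},
        {"phrase": "Microsoft", "ticker": "MSFT"},
    ]
}

rows = list(iter_rows(fake_splits))
```

With the `dataset` object from the snippet above, `list(iter_rows(dataset))` would produce the same structure for the real data.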
## Contributions
Contributions to the phrase-ticker Dataset are welcome, including new phrases, refinements of existing data, and suggestions for improvement. Please check out the repository on [GitHub](https://github.com/rohanmahen/phrase-ticker) for more information.
Provider:
rohanmahen