
rohanmahen/phrase-ticker

Source: Hugging Face. Updated 2024-02-17. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/rohanmahen/phrase-ticker
Resource overview:
---
license: mit
---

# phrase-ticker Dataset

## Description

The Phrase Ticker Dataset enables the extraction of stock ticker symbols from natural language queries. The dataset pairs natural language (NL) utterances commonly associated with S&P 500 companies with their corresponding ticker symbols, providing a simple resource for understanding how companies are referred to in various contexts.

## Structure

The dataset comprises two columns:

- `phrase`: This column contains natural language phrases that reference or describe companies in ways that are commonly used in financial news, reports, and discussions. These include not only formal company names and products but also informal and colloquial references.
- `ticker`: Each phrase is associated with a unique stock ticker symbol, identifying the company mentioned or described in the phrase.

## Primary Use Case

**Ticker Extraction from Natural Language Queries**: The main application of this dataset is to train models that can accurately identify and extract stock ticker symbols from text. This capability is crucial for automating the analysis of financial news, social media mentions, analyst reports, and any textual content where companies are discussed without directly mentioning their ticker symbols.

## Getting Started

To begin working with the phrase-ticker Dataset in your projects, you can load it using the Hugging Face `datasets` library:

```python
from datasets import load_dataset

dataset = load_dataset("rohanmahen/phrase-ticker")
```

## Contributions

Contributions to the phrase-ticker Dataset are welcomed, including the addition of new phrases, refinement of existing data, and suggestions for improvement. Please check out the repository on [GitHub](https://github.com/rohanmahen/phrase-ticker) for more info.
Provided by:
rohanmahen
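Summary of Original Information

phrase-ticker Dataset

Description

The Phrase Ticker dataset supports extracting stock ticker symbols from natural language queries. It pairs natural language utterances commonly associated with S&P 500 companies with their corresponding ticker symbols, providing a simple resource for understanding how companies are referred to in various contexts.

Structure

The dataset contains two columns:

  • phrase: natural language phrases that reference or describe companies in ways commonly used in financial news, reports, and discussions. These include formal company names and products as well as informal and colloquial references.
  • ticker: the unique stock ticker symbol associated with each phrase, identifying the company mentioned or described in the phrase.

Primary Use Case

Ticker extraction from natural language queries: the main application of this dataset is to train models that can accurately identify and extract stock ticker symbols from text. This capability is crucial for automating the analysis of financial news, social media mentions, analyst reports, and any other text that discusses companies without directly mentioning their ticker symbols.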
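As a rough illustration of this use case, the sketch below swaps the trained model for a much simpler dictionary baseline: it normalizes each `phrase`, then scans a query for the longest matching phrase at each position. The helper names (`normalize`, `build_lookup`, `extract_tickers`), the `max_words` parameter, and the three sample rows are assumptions made up for this example; the rows are not taken from the dataset.

```python
import re


def normalize(text):
    """Lowercase the text and keep only runs of letters, digits, and '&'."""
    return " ".join(re.findall(r"[a-z0-9&]+", text.lower()))


def build_lookup(rows):
    """Map each normalized phrase to its ticker (later rows overwrite earlier ones)."""
    return {normalize(row["phrase"]): row["ticker"] for row in rows}


def extract_tickers(query, lookup, max_words=6):
    """Return tickers whose phrases occur in the query, in order of first appearance."""
    words = normalize(query).split()
    found = []
    i = 0
    while i < len(words):
        # Try the longest window first so multi-word phrases win over single words.
        for n in range(min(max_words, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in lookup:
                if lookup[candidate] not in found:
                    found.append(lookup[candidate])
                i += n
                break
        else:
            i += 1  # No phrase starts at this word; move on.
    return found


# Hypothetical rows in the dataset's two-column shape (not real dataset content).
sample_rows = [
    {"phrase": "Apple", "ticker": "AAPL"},
    {"phrase": "iPhone maker", "ticker": "AAPL"},
    {"phrase": "Microsoft", "ticker": "MSFT"},
]
lookup = build_lookup(sample_rows)
print(extract_tickers("How did the iPhone maker do versus Microsoft?", lookup))
```

Exact matching like this misses paraphrases and misspellings, which is where a model trained on the phrase and ticker pairs would be expected to help; the lookup is mainly useful as a baseline to compare against.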

Getting Started

To begin working with the phrase-ticker dataset in your projects, you can load it using the Hugging Face datasets library:

    from datasets import load_dataset

    dataset = load_dataset("rohanmahen/phrase-ticker")
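Once loaded, a quick look at the splits and columns can confirm the two-column structure described above. The following is a minimal sketch: the helper names `phrases_per_ticker` and `load_and_summarize` are hypothetical, and the code iterates over whatever splits are returned rather than assuming a particular split name. Only the column names `phrase` and `ticker` come from the dataset description.

```python
from collections import Counter


def phrases_per_ticker(rows):
    """Count how many phrases map to each ticker in an iterable of row dicts."""
    return Counter(row["ticker"] for row in rows)


def load_and_summarize():
    """Download the dataset and print a short summary of every split."""
    # Imported here so the counting helper above works without the library installed.
    from datasets import load_dataset

    dataset = load_dataset("rohanmahen/phrase-ticker")
    for split_name, split in dataset.items():
        counts = phrases_per_ticker(split)
        print(split_name, split.column_names, len(split))
        print("Tickers with the most phrases:", counts.most_common(5))
```

Calling `load_and_summarize()` needs network access and the `datasets` package; `phrases_per_ticker` on its own accepts any iterable of dictionaries with a `ticker` key.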
