somAzzz/yahoo-finance-data
Source: Hugging Face. Updated: 2026-02-28. Indexed: 2026-03-29.
Download link:
https://hf-mirror.com/datasets/somAzzz/yahoo-finance-data
Resource description:
---
license: odc-by
viewer: false
language:
- en
size_categories:
- 100M<n<1B
tags:
- earnings-call-transcripts
- market-data
- stock-data
- finance-data
- finance
- stock-news
- yahoo-news
dataset_info:
- config_name: stock_earning_calendar
  features:
  - name: symbol
    dtype: string
  - name: report_date
    dtype: string
  - name: time
    dtype: string
  - name: name
    dtype: string
  - name: fiscal_quarter_ending
    dtype: string
  splits:
  - name: train
    num_bytes: 3441492
    num_examples: 41692
  - name: test
    num_bytes: 860373
    num_examples: 10423
  download_size: 466771
  dataset_size: 4301865
- config_name: stock_tailing_eps
  features:
  - name: symbol
    dtype: string
  - name: report_date
    dtype: string
  - name: tailing_eps
    dtype: decimal128(38, 2)
  - name: eps
    dtype: decimal128(38, 2)
  - name: update_time
    dtype: string
  splits:
  - name: train
    num_bytes: 14435904
    num_examples: 212944
  - name: test
    num_bytes: 3609044
    num_examples: 53237
  download_size: 1872135
  dataset_size: 18044948
configs:
- config_name: stock_earning_calendar
  data_files:
  - split: train
    path: stock_earning_calendar/train-*
  - split: test
    path: stock_earning_calendar/test-*
- config_name: stock_tailing_eps
  data_files:
  - split: train
    path: stock_tailing_eps/train-*
  - split: test
    path: stock_tailing_eps/test-*
---
# Financial Data from Yahoo!
<table border=1 cellpadding=10><tr><td>
### \*\*\* Key Points to Note \*\*\*
---
**All financial data is sourced from Yahoo!® Finance, Nasdaq®, and the U.S. Department of the Treasury via publicly available APIs and is intended for research and educational purposes.**
I will update the data regularly, and you are welcome to follow this project and use the data.
Each time the data is updated, I will record the update time in [spec.json](https://huggingface.co/datasets/defeatbeta/yahoo-finance-data/blob/main/spec.json).
</td></tr></table>
### Data Usage Instructions
Access the data with [DuckDB](https://shell.duckdb.org/), the [Python API](https://github.com/defeat-beta/defeatbeta-api/), [Claude AI](https://github.com/defeat-beta/defeatbeta-api/tree/main/mcp#use-in-claude-desktop), or [Manus AI](https://github.com/defeat-beta/defeatbeta-api/tree/main/mcp#use-in-manus).
All datasets are publicly accessible and stored in Parquet format.
When using AI, you can also use [skills](https://github.com/defeat-beta/defeatbeta-api/tree/main/skills) for enhanced data analysis capabilities.
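
Judging by the DuckDB queries in this README, each table appears to live at one predictable Parquet URL. The following is a minimal Python sketch under that assumption; `parquet_url` and `symbol_query` are hypothetical helper names for illustration, not part of the defeatbeta Python API. The printed statement can be pasted into the DuckDB shell.

```python
# Base path copied from the query examples in this README (assumed to hold for every table).
BASE_URL = "https://huggingface.co/datasets/defeatbeta/yahoo-finance-data/resolve/main/data"


def parquet_url(table: str) -> str:
    """Return the Parquet URL for a table name such as 'stock_prices'."""
    return f"{BASE_URL}/{table}.parquet"


def symbol_query(table: str, symbol: str) -> str:
    """Build a DuckDB SELECT statement that filters one table by symbol."""
    # Double any single quote so the symbol stays a valid SQL string literal.
    safe_symbol = symbol.replace("'", "''")
    return f"SELECT * FROM '{parquet_url(table)}' WHERE symbol = '{safe_symbol}';"


print(symbol_query("stock_prices", "TSLA"))
```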
---
#### Datasets Overview
1. **stock_profile**
- **Source:** `https://finance.yahoo.com/quote/${symbol}/profile/`
- **Description:** Contains company details such as address, industry, and employee count.
- **Columns:**
| Column Name | Column Type | Description |
|-----------------------|-------------|----------------------------|
| symbol | VARCHAR | Stock ticker symbol |
| address | VARCHAR | Company address |
| city | VARCHAR | City |
| country | VARCHAR | Country |
| phone | VARCHAR | Phone number |
| zip | VARCHAR | Zip code |
| industry | VARCHAR | Industry type |
| sector | VARCHAR | Business sector |
| long_business_summary | VARCHAR | Business summary |
| full_time_employees | INTEGER | Number of full-time staff |
| report_date | VARCHAR | Data reporting date |
2. **stock_officers**
- **Source:** `https://finance.yahoo.com/quote/${symbol}/profile/`
- **Description:** Lists company executives, including their pay and title.
- **Columns:**
| Column Name | Column Type | Description |
|--------------|-------------|--------------------------|
| symbol | VARCHAR | Stock ticker symbol |
| name | VARCHAR | Executive's name |
| title | VARCHAR | Executive's job title |
| age | INTEGER | Executive's age |
| born | INTEGER | Year of birth |
| pay | INTEGER | Wage (USD) |
| exercised | INTEGER | Stock options exercised |
| unexercised | INTEGER | Unexercised stock options|
3. **stock_tailing_eps**
- **Source:** `https://ycharts.com/companies/${symbol}/eps_ttm`
- **Description:** Provides trailing-twelve-month earnings per share (TTM EPS).
- **Columns:**
| Column Name | Column Type | Description |
|--------------|-----------------|--------------------------|
| symbol | VARCHAR | Stock ticker symbol |
| report_date | VARCHAR | Reporting date |
| tailing_eps | DECIMAL(38,2) | Trailing EPS |
| eps | DECIMAL(38,2) | Earnings per share (listed in the dataset metadata) |
| update_time | VARCHAR | Last update time |
4. **stock_earning_calendar**
- **Source:** `https://www.nasdaq.com/market-activity/earnings`
- **Description:** Contains information about companies' earnings reports, including their ticker symbols, reporting dates, names, and fiscal quarter end dates.
- **Columns:**
| Column Name | Column Type | Description |
|-----------------------|-------------|----------------------------------|
| symbol | VARCHAR | Stock ticker symbol |
| report_date | VARCHAR | Reporting date |
| time | VARCHAR | Reporting time (listed in the dataset metadata) |
| name | VARCHAR | Company name |
| fiscal_quarter_ending | VARCHAR | Fiscal quarter end date |
5. **stock_statement**
- **Source:** `https://finance.yahoo.com/quote/${symbol}/financials/`
- **Description:** Contains financial statement details of companies, including ticker symbols, reporting dates, specific financial items, their values, and related statement types and periods.
- **Columns:**
| Column Name | Column Type | Description |
|---------------|-----------------|-----------------------------------------------|
| symbol | VARCHAR | Stock ticker symbol |
| report_date | VARCHAR | Reporting date |
| item_name | VARCHAR | Name of the financial statement item |
| item_value | DECIMAL(38,2) | Value of the financial statement item |
| finance_type | VARCHAR | Type of financial statement (e.g., balance_sheet, income_statement, cash_flow) |
| period_type | VARCHAR | Reporting period type (e.g., annual, quarterly) |
6. **stock_prices**
- **Source:** `https://finance.yahoo.com/quote/${symbol}/chart`
- **Description:** Contains historical stock market data, including ticker symbols, reporting dates, and key trading metrics such as open, close, high, low prices, and trading volume.
- **Columns:**
| Column Name | Column Type | Description |
|---------------|-----------------|-----------------------------------------|
| symbol | VARCHAR | Stock ticker symbol |
| report_date | VARCHAR | Trading date |
| open | DECIMAL(38,2) | Opening price of the stock |
| close | DECIMAL(38,2) | Closing price of the stock |
| high | DECIMAL(38,2) | Highest price |
| low | DECIMAL(38,2) | Lowest price |
| volume | BIGINT | Number of shares traded |
7. **stock_dividend_events**
- **Source:** `https://finance.yahoo.com/quote/${symbol}/chart`
- **Description:** Contains dividend data, including stock tickers, reporting dates, and dividend values.
- **Columns:**
| Column Name | Column Type | Description |
|---------------|-----------------|-----------------------------------------|
| symbol | VARCHAR | Stock ticker symbol |
| report_date | VARCHAR | Reporting date |
| amount | DECIMAL(38,2) | Dividend amount |
8. **stock_split_events**
- **Source:** `https://finance.yahoo.com/quote/${symbol}/chart`
- **Description:** Contains data about stock splits, including the stock ticker, reporting date, and the split factor.
- **Columns:**
| Column Name | Column Type | Description |
|---------------|---------------|----------------------------------|
| symbol | VARCHAR | Stock ticker symbol |
| report_date | VARCHAR | Reporting date |
| split_factor | VARCHAR | The factor by which shares are split |
9. **exchange_rate**
- **Source:** `https://finance.yahoo.com/quote/${symbol}/chart`
- **Description:** Contains currency exchange data for a report date, including opening, closing, highest, and lowest prices.
- **Columns:**
| Column Name | Column Type | Description |
|---------------|---------------|----------------------------------|
| symbol | VARCHAR | Currency pair symbol (e.g., `EUR=X`) |
| report_date | VARCHAR | Reporting date |
| open | DECIMAL(38,2) | Opening price |
| close | DECIMAL(38,2) | Closing price |
| high | DECIMAL(38,2) | Highest price during the day |
| low | DECIMAL(38,2) | Lowest price during the day |
10. **daily_treasury_yield**
- **Source:** `https://home.treasury.gov/`
- **Description:** Contains daily Treasury yields for maturities ranging from 1 month to 30 years.
- **Columns:**
| Column Name | Column Type | Description |
|---------------|---------------|------------------------------------|
| report_date | VARCHAR | Reporting date |
| bc1_month | DECIMAL(38,2) | Treasury yield for 1 month |
| bc2_month | DECIMAL(38,2) | Treasury yield for 2 months |
| bc3_month | DECIMAL(38,2) | Treasury yield for 3 months |
| bc6_month | DECIMAL(38,2) | Treasury yield for 6 months |
| bc1_year | DECIMAL(38,2) | Treasury yield for 1 year |
| bc2_year | DECIMAL(38,2) | Treasury yield for 2 years |
| bc3_year | DECIMAL(38,2) | Treasury yield for 3 years |
| bc5_year | DECIMAL(38,2) | Treasury yield for 5 years |
| bc7_year | DECIMAL(38,2) | Treasury yield for 7 years |
| bc10_year | DECIMAL(38,2) | Treasury yield for 10 years |
| bc30_year | DECIMAL(38,2) | Treasury yield for 30 years |
11. **stock_earning_call_transcripts**
- **Source:** `https://finance.yahoo.com/quote/${symbol}/earnings-calls/`
- **Description:** Contains verbatim transcripts of quarterly earnings calls for publicly traded companies, including speaker information and content segmentation.
- **Columns:**
| Column Name | Data Type | Description |
|-----------------|---------------------------------------------------------------|-----------------------------------------------------------------------------|
| symbol | VARCHAR | The stock ticker symbol of the company |
| fiscal_year | INTEGER | The fiscal year of the earnings call |
| fiscal_quarter | INTEGER | The fiscal quarter (1-4) of the earnings call |
| report_date | VARCHAR | The date when the earnings call was reported (format may vary) |
| transcripts | `STRUCT<paragraph_number: INTEGER, speaker: VARCHAR, content: VARCHAR>[]` | Array of structured transcript segments. `paragraph_number`: sequential number of the paragraph; `speaker`: name and/or title of the speaker; `content`: the spoken text |
| transcripts_id | INTEGER | Unique identifier for the transcript record |
12. **stock_news**
- **Source:** `https://news.yahoo.com/`
- **Description:** Stores financial research reports and news articles, including metadata and content.
- **Columns:**
| Column Name | Data Type | Description |
|-----------------|---------------------------------------------------------------|-----------------------------------------------------------------------------|
| uuid | VARCHAR | Unique identifier for the report/article (nullable) |
| related_symbols | VARCHAR[] | Array of stock symbols or financial instruments related to the content |
| title | VARCHAR | Title of the report/article (nullable) |
| publisher | VARCHAR | Organization or entity that published the report (nullable) |
| report_date | VARCHAR | Date when the report was published (stored as string, nullable) |
| type | VARCHAR | Classification or category of the report/article (nullable) |
| link | VARCHAR | URL or reference link to the original content (nullable) |
| news | `STRUCT<paragraph_number: INTEGER, highlight: VARCHAR, paragraph: VARCHAR>[]` | Array of structured paragraphs containing content with numbering, highlights, and text (nullable) |
13. **stock_revenue_breakdown**
- **Source:**
- `https://stockanalysis.com/stocks/${symbol}/metrics/revenue-by-segment/`
- `https://stockanalysis.com/stocks/${symbol}/metrics/revenue-by-product-group/`
- `https://stockanalysis.com/stocks/${symbol}/metrics/revenue-by-geography/`
- **Description:** Stores revenue broken down by segment, product group, and geography.
- **Columns:**
| Column Name | Data Type | Description |
|----------------|---------------|--------------------------------------------------------------------------|
| symbol | VARCHAR | The stock symbol or company identifier |
| breakdown_type | VARCHAR | Type of financial breakdown (e.g., segment, geography, product) |
| report_date | VARCHAR | Date when the financial report was issued (format may vary) |
| item_name | VARCHAR | Name of the specific financial line item being reported |
| item_value | DECIMAL(38,2) | Numerical value of the financial item, with 2 decimal places precision |
14. **stock_shares_outstanding**
- **Source:** `https://ycharts.com/companies/${symbol}/shares_outstanding`
- **Description:** Provides the number of shares outstanding as of each report date.
- **Columns:**
| Column Name | Column Type | Description |
|--------------|-----------------|--------------------------|
| symbol | VARCHAR | Stock ticker symbol |
| report_date | VARCHAR | Reporting date |
| shares_outstanding | BIGINT | Number of shares outstanding |
15. **stock_sec_filing**
- **Source:** `https://www.sec.gov/cgi-bin/browse-edgar`
- **Description:** Contains SEC (U.S. Securities and Exchange Commission) filing records for publicly traded companies. Supported form types include:
- **US Domestic Company Forms:**
- `10-K`, `10-K/A` - Annual report
- `10-Q`, `10-Q/A` - Quarterly report
- `8-K`, `8-K/A` - Current report (material events)
- `DEF 14A`, `DEFA14A` - Proxy statement (shareholder meetings, executive compensation)
- **Insider Trading Forms:**
- `3`, `3/A` - Initial beneficial ownership
- `4`, `4/A` - Changes in beneficial ownership
- `5`, `5/A` - Annual beneficial ownership
- `144`, `144/A` - Notice of proposed sale of securities
- **Institutional Holdings:**
- `13F-HR`, `13F-HR/A` - Institutional holdings report (quarterly)
- `SC 13G`, `SC 13G/A` - Passive investor holdings (>5%)
- `SC 13D`, `SC 13D/A` - Active investor holdings (>5%, may influence company)
- **Foreign Private Issuer Forms** (e.g., BABA, PDD, JD):
- `20-F`, `20-F/A` - Annual report
- `6-K`, `6-K/A` - Current report (quarterly + material events)
- **Canadian Company Forms** (e.g., SHOP, TD, RY):
- `40-F`, `40-F/A` - Annual report
- **ETF/Investment Company Forms** (e.g., SPY, QQQ, VOO):
- `N-CSR`, `N-CSR/A` - Annual/Semi-annual shareholder report
- `N-CSRS`, `N-CSRS/A` - Semi-annual shareholder report
- `N-30D`, `N-30D/A` - Shareholder report (legacy format)
- `NPORT-P` - Monthly portfolio holdings
- `N-CEN` - Annual report (fund operations)
- `N-Q`, `N-Q/A` - Quarterly portfolio (discontinued, historical data exists)
- **Columns:**
| Column Name | Column Type | Description |
|------------------------|-------------|----------------------------------------------------------------|
| cik | VARCHAR | Central Index Key - unique SEC identifier for the filer |
| symbol | VARCHAR | Stock ticker symbol |
| company_name | VARCHAR | Official company name as registered with SEC |
| form_type | VARCHAR | SEC form type (e.g., 10-K, 10-Q, 8-K, 4, SC 13G) |
| form_type_description | VARCHAR | Human-readable description of the form type |
| accession_number | VARCHAR | Unique identifier for the filing |
| filing_date | VARCHAR | Date when the filing was submitted to SEC |
| report_date | VARCHAR | Period end date covered by the filing (if applicable) |
| acceptance_date_time | VARCHAR | Timestamp when SEC accepted the filing |
| filing_url | VARCHAR | Direct URL to the filing on SEC EDGAR |
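
Most tables above store `report_date` as VARCHAR and monetary values as DECIMAL(38,2), so client code usually converts them after loading. The sketch below assumes dates arrive as ISO `YYYY-MM-DD` strings, which this README does not state (some column descriptions note that the format may vary). `parse_report_date` and `to_decimal` are hypothetical helpers, and the sample row is invented.

```python
from datetime import date
from decimal import Decimal
from typing import Optional


def parse_report_date(value: Optional[str]) -> Optional[date]:
    """Parse an ISO date string; return None when it is missing or not ISO."""
    if not value:
        return None
    try:
        return date.fromisoformat(value)
    except ValueError:
        # Assumption: non-ISO values are left for the caller to handle.
        return None


def to_decimal(value) -> Decimal:
    """Convert a value to a Decimal with two decimal places, like DECIMAL(38,2)."""
    return Decimal(str(value)).quantize(Decimal("0.01"))


# Invented row shaped like stock_prices; the numbers are not real data.
row = {"symbol": "TSLA", "report_date": "2024-01-02", "close": "250.1"}
print(parse_report_date(row["report_date"]), to_decimal(row["close"]))
```

Returning `None` for unparseable dates keeps a bulk conversion from failing on a single odd value; stricter code might raise instead.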
#### Querying Datasets
Use the following SQL queries in [DuckDB](https://shell.duckdb.org/) to retrieve data for a specific stock (e.g., `TSLA`):
1. **stock_profile**
```sql
SELECT * FROM
'https://huggingface.co/datasets/defeatbeta/yahoo-finance-data/resolve/main/data/stock_profile.parquet'
WHERE symbol='TSLA';
```
2. **stock_officers**
```sql
SELECT * FROM
'https://huggingface.co/datasets/defeatbeta/yahoo-finance-data/resolve/main/data/stock_officers.parquet'
WHERE symbol='TSLA';
```
3. **stock_tailing_eps**
```sql
SELECT * FROM
'https://huggingface.co/datasets/defeatbeta/yahoo-finance-data/resolve/main/data/stock_tailing_eps.parquet'
WHERE symbol='TSLA';
```
4. **stock_earning_calendar**
```sql
SELECT * FROM
'https://huggingface.co/datasets/defeatbeta/yahoo-finance-data/resolve/main/data/stock_earning_calendar.parquet'
WHERE symbol='TSLA';
```
5. **stock_statement**
```sql
SELECT * FROM
'https://huggingface.co/datasets/defeatbeta/yahoo-finance-data/resolve/main/data/stock_statement.parquet'
WHERE symbol='TSLA' AND finance_type='income_statement';
```
6. **stock_prices**
```sql
SELECT * FROM
'https://huggingface.co/datasets/defeatbeta/yahoo-finance-data/resolve/main/data/stock_prices.parquet'
WHERE symbol='TSLA';
```
7. **stock_dividend_events**
```sql
SELECT * FROM
'https://huggingface.co/datasets/defeatbeta/yahoo-finance-data/resolve/main/data/stock_dividend_events.parquet'
WHERE symbol='TSLA';
```
8. **stock_split_events**
```sql
SELECT * FROM
'https://huggingface.co/datasets/defeatbeta/yahoo-finance-data/resolve/main/data/stock_split_events.parquet'
WHERE symbol='TSLA';
```
9. **exchange_rate**
```sql
SELECT * FROM
'https://huggingface.co/datasets/defeatbeta/yahoo-finance-data/resolve/main/data/exchange_rate.parquet'
WHERE symbol='EUR=X';
```
10. **daily_treasury_yield**
```sql
SELECT * FROM
'https://huggingface.co/datasets/defeatbeta/yahoo-finance-data/resolve/main/data/daily_treasury_yield.parquet';
```
11. **stock_earning_call_transcripts**
```sql
SELECT symbol,
fiscal_year,
fiscal_quarter,
report_date,
unnest(transcripts).paragraph_number,
unnest(transcripts).speaker,
unnest(transcripts).content
FROM 'https://huggingface.co/datasets/defeatbeta/yahoo-finance-data/resolve/main/data/stock_earning_call_transcripts.parquet'
WHERE symbol='TSLA' AND fiscal_year=2024 AND fiscal_quarter=4;
```
12. **stock_news**
```sql
SELECT related_symbols,
uuid,
title,
publisher,
report_date,
type,
link,
unnest(news).paragraph_number,
unnest(news).highlight,
unnest(news).paragraph
FROM 'https://huggingface.co/datasets/defeatbeta/yahoo-finance-data/resolve/main/data/stock_news.parquet'
WHERE uuid='00094540-141e-3893-a5ca-beb26abc150f';
```
13. **stock_revenue_breakdown**
```sql
SELECT *
FROM 'https://huggingface.co/datasets/defeatbeta/yahoo-finance-data/resolve/main/data/stock_revenue_breakdown.parquet'
WHERE symbol='TSLA';
```
14. **stock_shares_outstanding**
```sql
SELECT * FROM
'https://huggingface.co/datasets/defeatbeta/yahoo-finance-data/resolve/main/data/stock_shares_outstanding.parquet'
WHERE symbol='TSLA';
```
15. **stock_sec_filing**
```sql
SELECT * FROM
'https://huggingface.co/datasets/defeatbeta/yahoo-finance-data/resolve/main/data/stock_sec_filing.parquet'
WHERE symbol='TSLA' AND form_type='10-K';
```
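
The `unnest(...)` calls in the transcript and news queries expand each array element into its own row. The same flattening can be sketched in plain Python, assuming a record has already been loaded as a dict whose `transcripts` field is a list of dicts. `flatten_transcripts` is a hypothetical helper and the sample record is invented for illustration.

```python
from typing import Any, Dict, Iterator, List


def flatten_transcripts(record: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one flat row per transcript paragraph, similar to unnest() in DuckDB."""
    # Copy every top-level field except the nested array itself.
    parent = {k: v for k, v in record.items() if k != "transcripts"}
    # A missing, None, or empty array produces no rows.
    for segment in record.get("transcripts") or []:
        yield {**parent, **segment}


# Invented sample; real rows come from stock_earning_call_transcripts.
sample = {
    "symbol": "TSLA",
    "fiscal_year": 2024,
    "fiscal_quarter": 4,
    "transcripts": [
        {"paragraph_number": 1, "speaker": "Operator", "content": "Welcome."},
        {"paragraph_number": 2, "speaker": "CFO", "content": "Thank you."},
    ],
}

rows: List[Dict[str, Any]] = list(flatten_transcripts(sample))
for row in rows:
    print(row["paragraph_number"], row["speaker"], row["content"])
```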
Provided by:
somAzzz



