
somAzzz/yahoo-finance-data

Source: Hugging Face. Updated 2026-02-28; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/somAzzz/yahoo-finance-data
Description:
---
license: odc-by
viewer: false
language:
- en
size_categories:
- 100M<n<1B
tags:
- earnings-call-transcripts
- market-data
- stock-data
- finance-data
- finance
- stock-news
- yahoo-news
dataset_info:
- config_name: stock_earning_calendar
  features:
  - name: symbol
    dtype: string
  - name: report_date
    dtype: string
  - name: time
    dtype: string
  - name: name
    dtype: string
  - name: fiscal_quarter_ending
    dtype: string
  splits:
  - name: train
    num_bytes: 3441492
    num_examples: 41692
  - name: test
    num_bytes: 860373
    num_examples: 10423
  download_size: 466771
  dataset_size: 4301865
- config_name: stock_tailing_eps
  features:
  - name: symbol
    dtype: string
  - name: report_date
    dtype: string
  - name: tailing_eps
    dtype: decimal128(38, 2)
  - name: eps
    dtype: decimal128(38, 2)
  - name: update_time
    dtype: string
  splits:
  - name: train
    num_bytes: 14435904
    num_examples: 212944
  - name: test
    num_bytes: 3609044
    num_examples: 53237
  download_size: 1872135
  dataset_size: 18044948
configs:
- config_name: stock_earning_calendar
  data_files:
  - split: train
    path: stock_earning_calendar/train-*
  - split: test
    path: stock_earning_calendar/test-*
- config_name: stock_tailing_eps
  data_files:
  - split: train
    path: stock_tailing_eps/train-*
  - split: test
    path: stock_tailing_eps/test-*
---

# Financial Data from Yahoo!

<table border=1 cellpadding=10><tr><td>

### \*\*\* Key Points to Note \*\*\*

---

**All financial data is sourced from Yahoo!® Finance, Nasdaq®, and the U.S. Department of the Treasury via publicly available APIs, and is intended for research and educational purposes.**

I update the data regularly, and you are welcome to follow this project and use the data. Each time the data is updated, I record the update time in [spec.json](https://huggingface.co/datasets/defeatbeta/yahoo-finance-data/blob/main/spec.json).
</td></tr></table>

### Data Usage Instructions

Use [DuckDB](https://shell.duckdb.org/), the [Python API](https://github.com/defeat-beta/defeatbeta-api/), [Claude AI](https://github.com/defeat-beta/defeatbeta-api/tree/main/mcp#use-in-claude-desktop), or [Manus AI](https://github.com/defeat-beta/defeatbeta-api/tree/main/mcp#use-in-manus) to access the data. All datasets are publicly accessible and stored in Parquet format. When working with an AI assistant, you can also use the [skills](https://github.com/defeat-beta/defeatbeta-api/tree/main/skills) for enhanced data analysis capabilities.

---

#### Datasets Overview

1. **stock_profile**
    - **Source:** `https://finance.yahoo.com/quote/${symbol}/profile/`
    - **Description:** Contains company details such as address, industry, and employee count.
    - **Columns:**

      | Column Name | Column Type | Description |
      |---|---|---|
      | symbol | VARCHAR | Stock ticker symbol |
      | address | VARCHAR | Company address |
      | city | VARCHAR | City |
      | country | VARCHAR | Country |
      | phone | VARCHAR | Phone number |
      | zip | VARCHAR | ZIP code |
      | industry | VARCHAR | Industry |
      | sector | VARCHAR | Business sector |
      | long_business_summary | VARCHAR | Business summary |
      | full_time_employees | INTEGER | Number of full-time employees |
      | report_date | VARCHAR | Data reporting date |

2. **stock_officers**
    - **Source:** `https://finance.yahoo.com/quote/${symbol}/profile/`
    - **Description:** Lists company executives, including their pay and title.
    - **Columns:**

      | Column Name | Column Type | Description |
      |---|---|---|
      | symbol | VARCHAR | Stock ticker symbol |
      | name | VARCHAR | Executive's name |
      | title | VARCHAR | Executive's job title |
      | age | INTEGER | Executive's age |
      | born | INTEGER | Year of birth |
      | pay | INTEGER | Pay (USD) |
      | exercised | INTEGER | Stock options exercised |
      | unexercised | INTEGER | Unexercised stock options |

3. **stock_tailing_eps**
    - **Source:** `https://ycharts.com/companies/${symbol}/eps_ttm`
    - **Description:** Provides trailing twelve-month earnings per share (TTM EPS).
    - **Columns:**

      | Column Name | Column Type | Description |
      |---|---|---|
      | symbol | VARCHAR | Stock ticker symbol |
      | report_date | VARCHAR | Reporting date |
      | tailing_eps | DECIMAL(38,2) | Trailing twelve-month EPS |
      | update_time | VARCHAR | Last update time |

4. **stock_earning_calendar**
    - **Source:** `https://www.nasdaq.com/market-activity/earnings`
    - **Description:** Contains companies' earnings report schedules, including ticker symbols, reporting dates, company names, and fiscal quarter end dates.
    - **Columns:**

      | Column Name | Column Type | Description |
      |---|---|---|
      | symbol | VARCHAR | Stock ticker symbol |
      | report_date | VARCHAR | Reporting date |
      | name | VARCHAR | Company short name |
      | fiscal_quarter_ending | VARCHAR | Fiscal quarter end date |

5. **stock_statement**
    - **Source:** `https://finance.yahoo.com/quote/${symbol}/financials/`
    - **Description:** Contains companies' financial statement line items, including ticker symbols, reporting dates, item names and values, and the related statement types and periods.
    - **Columns:**

      | Column Name | Column Type | Description |
      |---|---|---|
      | symbol | VARCHAR | Stock ticker symbol |
      | report_date | VARCHAR | Reporting date |
      | item_name | VARCHAR | Name of the financial statement item |
      | item_value | DECIMAL(38,2) | Value of the financial statement item |
      | finance_type | VARCHAR | Type of financial statement (e.g., balance_sheet, income_statement, cash_flow) |
      | period_type | VARCHAR | Reporting period type (e.g., annual, quarterly) |

6. **stock_prices**
    - **Source:** `https://finance.yahoo.com/quote/${symbol}/chart`
    - **Description:** Contains historical stock market data, including ticker symbols, trading dates, open, close, high, and low prices, and trading volume.
    - **Columns:**

      | Column Name | Column Type | Description |
      |---|---|---|
      | symbol | VARCHAR | Stock ticker symbol |
      | report_date | VARCHAR | Trading date |
      | open | DECIMAL(38,2) | Opening price |
      | close | DECIMAL(38,2) | Closing price |
      | high | DECIMAL(38,2) | Highest price |
      | low | DECIMAL(38,2) | Lowest price |
      | volume | BIGINT | Number of shares traded |

7. **stock_dividend_events**
    - **Source:** `https://finance.yahoo.com/quote/${symbol}/chart`
    - **Description:** Contains dividend data, including stock tickers, reporting dates, and dividend amounts.
    - **Columns:**

      | Column Name | Column Type | Description |
      |---|---|---|
      | symbol | VARCHAR | Stock ticker symbol |
      | report_date | VARCHAR | Reporting date |
      | amount | DECIMAL(38,2) | Dividend amount |

8. **stock_split_events**
    - **Source:** `https://finance.yahoo.com/quote/${symbol}/chart`
    - **Description:** Contains stock split data, including the stock ticker, reporting date, and split factor.
    - **Columns:**

      | Column Name | Column Type | Description |
      |---|---|---|
      | symbol | VARCHAR | Stock ticker symbol |
      | report_date | VARCHAR | Reporting date |
      | split_factor | VARCHAR | The factor by which shares are split |

9. **exchange_rate**
    - **Source:** `https://finance.yahoo.com/quote/${symbol}/chart`
    - **Description:** Contains daily currency exchange-rate data, including the opening, closing, highest, and lowest prices for each reporting date.
    - **Columns:**

      | Column Name | Column Type | Description |
      |---|---|---|
      | symbol | VARCHAR | Currency pair symbol (e.g., `EUR=X`) |
      | report_date | VARCHAR | Reporting date |
      | open | DECIMAL(38,2) | Opening price |
      | close | DECIMAL(38,2) | Closing price |
      | high | DECIMAL(38,2) | Highest price during the day |
      | low | DECIMAL(38,2) | Lowest price during the day |

10. **daily_treasury_yield**
    - **Source:** `https://home.treasury.gov/`
    - **Description:** Contains daily U.S. Treasury yields for maturities ranging from 1 month to 30 years.
    - **Columns:**

      | Column Name | Column Type | Description |
      |---|---|---|
      | report_date | VARCHAR | Reporting date |
      | bc1_month | DECIMAL(38,2) | Treasury yield for 1 month |
      | bc2_month | DECIMAL(38,2) | Treasury yield for 2 months |
      | bc3_month | DECIMAL(38,2) | Treasury yield for 3 months |
      | bc6_month | DECIMAL(38,2) | Treasury yield for 6 months |
      | bc1_year | DECIMAL(38,2) | Treasury yield for 1 year |
      | bc2_year | DECIMAL(38,2) | Treasury yield for 2 years |
      | bc3_year | DECIMAL(38,2) | Treasury yield for 3 years |
      | bc5_year | DECIMAL(38,2) | Treasury yield for 5 years |
      | bc7_year | DECIMAL(38,2) | Treasury yield for 7 years |
      | bc10_year | DECIMAL(38,2) | Treasury yield for 10 years |
      | bc30_year | DECIMAL(38,2) | Treasury yield for 30 years |

11. **stock_earning_call_transcripts**
    - **Source:** `https://finance.yahoo.com/quote/${symbol}/earnings-calls/`
    - **Description:** Contains verbatim transcripts of quarterly earnings calls for publicly traded companies, including speaker information and content segmentation.
    - **Columns:**

      | Column Name | Column Type | Description |
      |---|---|---|
      | symbol | VARCHAR | Stock ticker symbol of the company |
      | fiscal_year | INTEGER | Fiscal year of the earnings call |
      | fiscal_quarter | INTEGER | Fiscal quarter (1-4) of the earnings call |
      | report_date | VARCHAR | Date the earnings call was reported (format may vary) |
      | transcripts | `STRUCT<paragraph_number: INTEGER, speaker: VARCHAR, content: VARCHAR>[]` | Array of transcript segments: `paragraph_number` (sequential paragraph number), `speaker` (name and/or title of the speaker), and `content` (the spoken text) |
      | transcripts_id | INTEGER | Unique identifier for the transcript record |

12. **stock_news**
    - **Source:** `https://news.yahoo.com/`
    - **Description:** Stores financial news articles and research reports, including metadata and content.
    - **Columns:**

      | Column Name | Column Type | Description |
      |---|---|---|
      | uuid | VARCHAR | Unique identifier for the report/article (nullable) |
      | related_symbols | VARCHAR[] | Stock symbols or financial instruments related to the content |
      | title | VARCHAR | Title of the report/article (nullable) |
      | publisher | VARCHAR | Organization or entity that published the report (nullable) |
      | report_date | VARCHAR | Publication date, stored as a string (nullable) |
      | type | VARCHAR | Classification or category of the report/article (nullable) |
      | link | VARCHAR | URL of the original content (nullable) |
      | news | `STRUCT<paragraph_number: INTEGER, highlight: VARCHAR, paragraph: VARCHAR>[]` | Array of paragraphs with numbering, highlights, and text (nullable) |

13. **stock_revenue_breakdown**
    - **Source:**
      - `https://stockanalysis.com/stocks/${symbol}/metrics/revenue-by-segment/`
      - `https://stockanalysis.com/stocks/${symbol}/metrics/revenue-by-product-group/`
      - `https://stockanalysis.com/stocks/${symbol}/metrics/revenue-by-geography/`
    - **Description:** Stores revenue broken down by segment, product group, and geography.
    - **Columns:**

      | Column Name | Column Type | Description |
      |---|---|---|
      | symbol | VARCHAR | Stock ticker symbol or company identifier |
      | breakdown_type | VARCHAR | Type of breakdown (e.g., segment, geography, product) |
      | report_date | VARCHAR | Date the financial report was issued (format may vary) |
      | item_name | VARCHAR | Name of the line item being reported |
      | item_value | DECIMAL(38,2) | Value of the line item, with 2 decimal places |

14. **stock_shares_outstanding**
    - **Source:** `https://ycharts.com/companies/${symbol}/shares_outstanding`
    - **Description:** Provides the number of shares outstanding.
    - **Columns:**

      | Column Name | Column Type | Description |
      |---|---|---|
      | symbol | VARCHAR | Stock ticker symbol |
      | report_date | VARCHAR | Reporting date |
      | shares_outstanding | BIGINT | Number of shares outstanding |

15. **stock_sec_filing**
    - **Source:** `https://www.sec.gov/cgi-bin/browse-edgar`
    - **Description:** Contains SEC (U.S. Securities and Exchange Commission) filing records for publicly traded companies.
      Supported form types include:
      - **US Domestic Company Forms:**
        - `10-K`, `10-K/A` - Annual report
        - `10-Q`, `10-Q/A` - Quarterly report
        - `8-K`, `8-K/A` - Current report (material events)
        - `DEF 14A`, `DEFA14A` - Proxy statement (shareholder meetings, executive compensation)
      - **Insider Trading Forms:**
        - `3`, `3/A` - Initial beneficial ownership
        - `4`, `4/A` - Changes in beneficial ownership
        - `5`, `5/A` - Annual beneficial ownership
        - `144`, `144/A` - Notice of proposed sale of securities
      - **Institutional Holdings:**
        - `13F-HR`, `13F-HR/A` - Institutional holdings report (quarterly)
        - `SC 13G`, `SC 13G/A` - Passive investor holdings (>5%)
        - `SC 13D`, `SC 13D/A` - Active investor holdings (>5%, may influence the company)
      - **Foreign Private Issuer Forms** (e.g., BABA, PDD, JD):
        - `20-F`, `20-F/A` - Annual report
        - `6-K`, `6-K/A` - Current report (quarterly + material events)
      - **Canadian Company Forms** (e.g., SHOP, TD, RY):
        - `40-F`, `40-F/A` - Annual report
      - **ETF/Investment Company Forms** (e.g., SPY, QQQ, VOO):
        - `N-CSR`, `N-CSR/A` - Annual/semi-annual shareholder report
        - `N-CSRS`, `N-CSRS/A` - Semi-annual shareholder report
        - `N-30D`, `N-30D/A` - Shareholder report (legacy format)
        - `NPORT-P` - Monthly portfolio holdings
        - `N-CEN` - Annual report (fund operations)
        - `N-Q`, `N-Q/A` - Quarterly portfolio (discontinued, historical data exists)
    - **Columns:**

      | Column Name | Column Type | Description |
      |---|---|---|
      | cik | VARCHAR | Central Index Key, the unique SEC identifier for the filer |
      | symbol | VARCHAR | Stock ticker symbol |
      | company_name | VARCHAR | Official company name as registered with the SEC |
      | form_type | VARCHAR | SEC form type (e.g., 10-K, 10-Q, 8-K, 4, SC 13G) |
      | form_type_description | VARCHAR | Human-readable description of the form type |
      | accession_number | VARCHAR | Unique identifier for the filing |
      | filing_date | VARCHAR | Date the filing was submitted to the SEC |
      | report_date | VARCHAR | Period end date covered by the filing (if applicable) |
      | acceptance_date_time | VARCHAR | Timestamp when the SEC accepted the filing |
      | filing_url | VARCHAR | Direct URL to the filing on SEC EDGAR |

#### Querying Datasets

Use the following SQL queries in [DuckDB](https://shell.duckdb.org/) to retrieve data for a specific stock (e.g., `TSLA`):

1. **stock_profile**

    ```sql
    SELECT *
    FROM 'https://huggingface.co/datasets/defeatbeta/yahoo-finance-data/resolve/main/data/stock_profile.parquet'
    WHERE symbol = 'TSLA';
    ```

2. **stock_officers**

    ```sql
    SELECT *
    FROM 'https://huggingface.co/datasets/defeatbeta/yahoo-finance-data/resolve/main/data/stock_officers.parquet'
    WHERE symbol = 'TSLA';
    ```

3. **stock_tailing_eps**

    ```sql
    SELECT *
    FROM 'https://huggingface.co/datasets/defeatbeta/yahoo-finance-data/resolve/main/data/stock_tailing_eps.parquet'
    WHERE symbol = 'TSLA';
    ```

4. **stock_earning_calendar**

    ```sql
    SELECT *
    FROM 'https://huggingface.co/datasets/defeatbeta/yahoo-finance-data/resolve/main/data/stock_earning_calendar.parquet'
    WHERE symbol = 'TSLA';
    ```

5. **stock_statement**

    ```sql
    SELECT *
    FROM 'https://huggingface.co/datasets/defeatbeta/yahoo-finance-data/resolve/main/data/stock_statement.parquet'
    WHERE symbol = 'TSLA' AND finance_type = 'income_statement';
    ```

6. **stock_prices**

    ```sql
    SELECT *
    FROM 'https://huggingface.co/datasets/defeatbeta/yahoo-finance-data/resolve/main/data/stock_prices.parquet'
    WHERE symbol = 'TSLA';
    ```

7. **stock_dividend_events**

    ```sql
    SELECT *
    FROM 'https://huggingface.co/datasets/defeatbeta/yahoo-finance-data/resolve/main/data/stock_dividend_events.parquet'
    WHERE symbol = 'TSLA';
    ```

8. **stock_split_events**

    ```sql
    SELECT *
    FROM 'https://huggingface.co/datasets/defeatbeta/yahoo-finance-data/resolve/main/data/stock_split_events.parquet'
    WHERE symbol = 'TSLA';
    ```

9. **exchange_rate**

    ```sql
    SELECT *
    FROM 'https://huggingface.co/datasets/defeatbeta/yahoo-finance-data/resolve/main/data/exchange_rate.parquet'
    WHERE symbol = 'EUR=X';
    ```

10. **daily_treasury_yield**

    ```sql
    SELECT *
    FROM 'https://huggingface.co/datasets/defeatbeta/yahoo-finance-data/resolve/main/data/daily_treasury_yield.parquet';
    ```

11. **stock_earning_call_transcripts**

    ```sql
    SELECT symbol, fiscal_year, fiscal_quarter, report_date,
           unnest(transcripts).paragraph_number,
           unnest(transcripts).speaker,
           unnest(transcripts).content
    FROM 'https://huggingface.co/datasets/defeatbeta/yahoo-finance-data/resolve/main/data/stock_earning_call_transcripts.parquet'
    WHERE symbol = 'TSLA' AND fiscal_year = 2024 AND fiscal_quarter = 4;
    ```

12. **stock_news**

    ```sql
    SELECT related_symbols, uuid, title, publisher, report_date, type, link,
           unnest(news).paragraph_number,
           unnest(news).highlight,
           unnest(news).paragraph
    FROM 'https://huggingface.co/datasets/defeatbeta/yahoo-finance-data/resolve/main/data/stock_news.parquet'
    WHERE uuid = '00094540-141e-3893-a5ca-beb26abc150f';
    ```

13. **stock_revenue_breakdown**

    ```sql
    SELECT *
    FROM 'https://huggingface.co/datasets/defeatbeta/yahoo-finance-data/resolve/main/data/stock_revenue_breakdown.parquet'
    WHERE symbol = 'TSLA';
    ```

14. **stock_shares_outstanding**

    ```sql
    SELECT *
    FROM 'https://huggingface.co/datasets/defeatbeta/yahoo-finance-data/resolve/main/data/stock_shares_outstanding.parquet'
    WHERE symbol = 'TSLA';
    ```

15. **stock_sec_filing**

    ```sql
    SELECT *
    FROM 'https://huggingface.co/datasets/defeatbeta/yahoo-finance-data/resolve/main/data/stock_sec_filing.parquet'
    WHERE symbol = 'TSLA' AND form_type = '10-K';
    ```
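Every per-symbol query in the DuckDB section follows one pattern: a Parquet file at `.../resolve/main/data/<dataset>.parquet` filtered on `symbol`. The Python sketch below builds single-line versions of those queries programmatically. The helper names `parquet_url` and `symbol_query` and the `SYMBOL_DATASETS` set are hypothetical and not part of the defeatbeta Python API. The URL pattern is inferred from the queries listed above.

```python
# Hypothetical helpers (not part of defeatbeta-api): they only build SQL text
# following the URL pattern of the DuckDB queries above; nothing is downloaded.
BASE_URL = (
    "https://huggingface.co/datasets/defeatbeta/yahoo-finance-data"
    "/resolve/main/data"
)

# Datasets documented with a `symbol` column
# (daily_treasury_yield and stock_news have none).
SYMBOL_DATASETS = frozenset(
    {
        "stock_profile",
        "stock_officers",
        "stock_tailing_eps",
        "stock_earning_calendar",
        "stock_statement",
        "stock_prices",
        "stock_dividend_events",
        "stock_split_events",
        "exchange_rate",
        "stock_earning_call_transcripts",
        "stock_revenue_breakdown",
        "stock_shares_outstanding",
        "stock_sec_filing",
    }
)


def parquet_url(dataset: str) -> str:
    """Return the Parquet URL for a dataset name such as "stock_prices"."""
    return f"{BASE_URL}/{dataset}.parquet"


def symbol_query(dataset: str, symbol: str) -> str:
    """Build a single-line DuckDB query that filters one dataset on `symbol`."""
    if dataset not in SYMBOL_DATASETS:
        raise ValueError(f"{dataset!r} has no symbol column")
    # Double any single quotes so the symbol stays a valid SQL string literal.
    literal = symbol.replace("'", "''")
    return f"SELECT * FROM '{parquet_url(dataset)}' WHERE symbol = '{literal}';"


if __name__ == "__main__":
    print(symbol_query("stock_prices", "TSLA"))
    print(symbol_query("exchange_rate", "EUR=X"))
```

If the `duckdb` Python package is installed, you can run a generated string with `duckdb.sql(query).fetchall()`. Reading Parquet over HTTPS goes through DuckDB's `httpfs` extension, which recent DuckDB versions can load automatically.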
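`stock_tailing_eps` holds trailing twelve-month (TTM) EPS. By convention, TTM EPS is the sum of the four most recent quarterly EPS figures. The sketch below shows that rolling sum in Python. It rests on three assumptions. First, the inputs are quarterly EPS values; the card's metadata lists an `eps` column next to `tailing_eps`, but whether that column is quarterly is not stated. Second, `report_date` strings are ISO `YYYY-MM-DD`. Third, no quarters are missing. `rolling_ttm_eps` is a hypothetical helper.

```python
from decimal import Decimal


def rolling_ttm_eps(quarterly: list[tuple[str, Decimal]]) -> list[tuple[str, Decimal]]:
    """Sum each window of four consecutive quarterly EPS values.

    Assumes (report_date, eps) pairs with ISO dates, so string order equals
    date order, and no missing quarters. The first three quarters have no
    complete window and therefore get no TTM value.
    """
    ordered = sorted(quarterly, key=lambda row: row[0])
    result = []
    for i in range(3, len(ordered)):
        window = ordered[i - 3 : i + 1]
        ttm = sum((eps for _, eps in window), Decimal(0))
        result.append((ordered[i][0], ttm))
    return result


if __name__ == "__main__":
    # Hypothetical quarterly EPS values, for illustration only.
    quarters = [
        ("2024-03-31", Decimal("1.00")),
        ("2024-06-30", Decimal("1.10")),
        ("2024-09-30", Decimal("1.20")),
        ("2024-12-31", Decimal("1.30")),
        ("2025-03-31", Decimal("1.40")),
    ]
    print(rolling_ttm_eps(quarters))
```

Using `Decimal` matches the `DECIMAL(38,2)` column type and avoids binary floating-point rounding in the sums.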
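`stock_split_events` stores `split_factor` as a VARCHAR. The sketch below turns those strings into a cumulative split ratio after a given date. It assumes a `"new:old"` string format, for example `"3:1"` for a 3-for-1 split and `"1:10"` for a 1-for-10 reverse split, and ISO `YYYY-MM-DD` dates. Neither format is specified in the column table, so check the actual values first. Both function names are hypothetical.

```python
from fractions import Fraction


def parse_split_factor(split_factor: str) -> Fraction:
    """Parse an assumed "new:old" split string, e.g. "3:1" -> Fraction(3, 1)."""
    new, sep, old = split_factor.partition(":")
    if not sep:
        raise ValueError(f"unexpected split_factor format: {split_factor!r}")
    return Fraction(int(new), int(old))


def cumulative_split_ratio(splits: list[tuple[str, str]], after_date: str) -> Fraction:
    """Multiply the ratios of all splits dated strictly after `after_date`.

    `splits` holds (report_date, split_factor) rows. ISO dates compare
    correctly as strings. The result is how many shares one share held on
    `after_date` has become after all later splits.
    """
    ratio = Fraction(1)
    for report_date, split_factor in splits:
        if report_date > after_date:
            ratio *= parse_split_factor(split_factor)
    return ratio


if __name__ == "__main__":
    # Hypothetical split rows, for illustration only.
    rows = [("2021-01-15", "5:1"), ("2023-06-01", "3:1")]
    print(cumulative_split_ratio(rows, "2020-12-31"))
```

Exact `Fraction` arithmetic keeps reverse splits such as `1:3` from introducing rounding error. Multiplying an old share count by the ratio, or dividing an old per-share value by it, restates that figure on today's share basis. Skip this step for values that are already split-adjusted.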
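The `stock_earning_call_transcripts` query unnests the `transcripts` array, so it returns one row per paragraph. The sketch below regroups those rows into speaker turns. It sorts on `paragraph_number` and merges consecutive paragraphs from the same speaker. It assumes the rows have already been fetched as `(paragraph_number, speaker, content)` tuples. `speaker_turns` is a hypothetical helper.

```python
def speaker_turns(rows: list[tuple[int, str, str]]) -> list[tuple[str, str]]:
    """Merge consecutive paragraphs by the same speaker, in paragraph order."""
    turns: list[tuple[str, str]] = []
    for _, speaker, content in sorted(rows, key=lambda row: row[0]):
        if turns and turns[-1][0] == speaker:
            # Same speaker as the previous paragraph: extend that turn.
            previous_speaker, previous_text = turns[-1]
            turns[-1] = (previous_speaker, previous_text + "\n" + content)
        else:
            turns.append((speaker, content))
    return turns


if __name__ == "__main__":
    # Hypothetical rows, for illustration only.
    rows = [
        (2, "CEO", "Thanks, everyone."),
        (1, "Operator", "Welcome to the call."),
        (3, "CEO", "Let's begin."),
    ]
    for speaker, text in speaker_turns(rows):
        print(f"{speaker}: {text}")
```

Sorting explicitly on `paragraph_number` keeps the result correct even when the query returns rows out of order.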
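For `stock_revenue_breakdown`, one natural derived figure is each item's share of the total within a single `report_date` and `breakdown_type`. The sketch below computes that share from dictionaries keyed like the table's columns. It assumes the items in each group are mutually exclusive and add up to the group total. That may not hold if a source reports overlapping or reconciling items. `revenue_shares` is a hypothetical helper.

```python
from collections import defaultdict
from decimal import Decimal


def revenue_shares(rows: list[dict]) -> list[tuple[str, str, str, Decimal]]:
    """Return (report_date, breakdown_type, item_name, share_of_group_total)."""
    # First pass: total item_value per (report_date, breakdown_type) group.
    totals = defaultdict(Decimal)
    for row in rows:
        totals[(row["report_date"], row["breakdown_type"])] += row["item_value"]

    # Second pass: divide each item by its group total.
    shares = []
    for row in rows:
        total = totals[(row["report_date"], row["breakdown_type"])]
        if total == 0:
            continue  # skip groups whose values sum to zero
        shares.append(
            (row["report_date"], row["breakdown_type"], row["item_name"], row["item_value"] / total)
        )
    return shares


if __name__ == "__main__":
    # Hypothetical rows, for illustration only.
    sample = [
        {"report_date": "2024-12-31", "breakdown_type": "segment", "item_name": "A", "item_value": Decimal("75.00")},
        {"report_date": "2024-12-31", "breakdown_type": "segment", "item_name": "B", "item_value": Decimal("25.00")},
    ]
    print(revenue_shares(sample))
```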
Provided by:
somAzzz