
sogosonnet/SP500-Chart-Dataset

Source: Hugging Face. Updated 2026-03-22. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/sogosonnet/SP500-Chart-Dataset
Description:
---
license: cc-by-4.0
task_categories:
- image-classification
language:
- en
tags:
- finance
- stock-market
- candlestick-chart
- technical-analysis
- sp500
size_categories:
- 1M<n<10M
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
dataset_info:
  features:
  - name: image
    dtype: image
  - name: label
    dtype:
      class_label:
        names:
          '0': down_3plus
          '1': down_2_3
          '2': down_1_2
          '3': up_1_2
          '4': up_2_3
          '5': up_3plus
  - name: ticker
    dtype: string
  - name: end_date
    dtype: string
  - name: pct_return
    dtype: float32
  splits:
  - name: train
    num_bytes: 21467124746
    num_examples: 1070721
  - name: test
    num_bytes: 5947059941
    num_examples: 301890
  download_size: 27126018753
  dataset_size: 27414184687
---

# SP500-Chart-Dataset

A large-scale candlestick chart image dataset for financial image classification research, covering 501 S&P 500 stocks.

## Overview

| Item | Value |
|------|-------|
| **Stocks** | 501 S&P 500 constituents |
| **Total Images** | 1,374,694 |
| **Period** | 2010-01 – 2025-03 |
| **Image Size** | ~480×480 px (4×4 inch, 120 dpi) |
| **Chart Type** | Candlestick (OHLCV) with technical indicators |
| **Labels** | 6-class forward return (±1%/±2%/±3%) |

## Download

Download `sp500_images.zip` (26 GB) from the Files tab and extract:

```bash
unzip sp500_images.zip
```

Directory structure after extraction:

```
cross_sectional_data/images/{TICKER}/{CLASS}/{TICKER}_{INDEX}_{DATE}.png
```

Metadata JSON files are in the `metadata/` folder.

## Chart Specification

Each chart is a 20-trading-day candlestick chart rendered with mplfinance:

- **OHLCV candlestick** bars (Charles style)
- **Volume** bars (bottom panel)
- **MA5** (blue dotted): 5-day moving average
- **MA60** (red dashed): 60-day moving average
- **MA120** (green solid): 120-day moving average
- **Bollinger Bands** (grey shaded area, α=0.15): 20-day ± 2σ

> All technical indicators are guaranteed to be visible in every image.
> The y-axis is automatically scaled to include all indicator values.

Prices are percentage-normalized relative to the first closing price in each window (0% baseline), making visual patterns scale-invariant across stocks.

## Label Definition

Labels are based on the **5-day forward return**:

| Label | Return Range |
|-------|-------------|
| `down_3plus` | r < −3% |
| `down_2_3` | −3% ≤ r < −2% |
| `down_1_2` | −2% ≤ r < −1% |
| `up_1_2` | 1% < r ≤ 2% |
| `up_2_3` | 2% < r ≤ 3% |
| `up_3plus` | r > 3% |

> Returns in [−1%, +1%] are excluded as ambiguous.

## Temporal Split

- **Train**: `end_date` < 2022-12-21
- **Test**: `end_date` >= 2023-01-01
- **Embargo**: 10 calendar days

## Usage

```python
import json
from pathlib import Path
from PIL import Image

with open('metadata/samples_AAPL.json') as f:
    meta = json.load(f)

sample = meta['samples'][0]
img_path = Path('cross_sectional_data/images') / sample['ticker'] / sample['label'] / \
    f"{sample['ticker']}_{sample['index']}_{sample['end_date'].replace('-','')}.png"
img = Image.open(img_path)
```

## Citation

```bibtex
@misc{sp500chart2025,
  title={SP500-Chart-Dataset},
  author={Ahn, Jaehyun},
  year={2025},
  howpublished={\url{https://github.com/JaehyunAhn/SP500-Chart-Dataset}},
  note={Yonsei University}
}
```

## License

CC BY 4.0
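The label thresholds and the temporal split described in the Label Definition and Temporal Split sections can be sketched as two small helpers. This is an illustrative sketch, not the authors' generation code: the function names `label_for_return` and `split_for_end_date` are hypothetical, the return is assumed to be expressed in percent units (so `-2.5` means −2.5%), and `end_date` is assumed to be an ISO `YYYY-MM-DD` string. Dates that fall between the train cutoff and the test start are treated here as belonging to neither split.

```python
from datetime import date
from typing import Optional

# Cutoffs taken from the Temporal Split section of the dataset card.
TRAIN_END = date(2022, 12, 21)   # train: end_date < 2022-12-21
TEST_START = date(2023, 1, 1)    # test:  end_date >= 2023-01-01


def label_for_return(r_pct: float) -> Optional[str]:
    """Map a 5-day forward return (assumed to be in percent) to a class name.

    Returns None for returns in [-1%, +1%], which the card excludes as ambiguous.
    """
    if r_pct < -3.0:
        return "down_3plus"
    if r_pct < -2.0:          # -3% <= r < -2%
        return "down_2_3"
    if r_pct < -1.0:          # -2% <= r < -1%
        return "down_1_2"
    if r_pct <= 1.0:          # -1% <= r <= 1%: excluded
        return None
    if r_pct <= 2.0:          # 1% < r <= 2%
        return "up_1_2"
    if r_pct <= 3.0:          # 2% < r <= 3%
        return "up_2_3"
    return "up_3plus"         # r > 3%


def split_for_end_date(end_date: str) -> Optional[str]:
    """Assign 'train' or 'test' from an ISO date string, or None in the gap."""
    d = date.fromisoformat(end_date)
    if d < TRAIN_END:
        return "train"
    if d >= TEST_START:
        return "test"
    return None


if __name__ == "__main__":
    print(label_for_return(-2.5), split_for_end_date("2021-06-30"))
    print(label_for_return(0.4), split_for_end_date("2022-12-25"))
```

Note that the boundary handling is asymmetric, as in the label table: the down classes include their lower bound, while the up classes include their upper bound.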
Provider:
sogosonnet