
xxparthparekhxx/indian-stock-market-minute-data

Source: Hugging Face. Updated 2026-01-25; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/xxparthparekhxx/indian-stock-market-minute-data
Resource description:
---
license: mit
task_categories:
- time-series-forecasting
- tabular-regression
tags:
- finance
- nse
- india
- stock-market
- quantitative-finance
- upstox
pretty_name: Indian Stock Market Minute & Daily Data
size_categories:
- 10B<n<100B
configs:
- config_name: default
  data_files:
  - split: minute
    path: minute/*.parquet
  - split: day
    path: day/*.parquet
---

# 🇮🇳 Indian Stock Market Data: Minute & Daily (2000 - 2026)

## 📌 Overview

This is a high-performance financial dataset containing the price history of **2,500+ NSE Stocks and Indices**.

The dataset has been **sharded and optimized** for high-speed training. Instead of thousands of tiny files, it is grouped into large ~1.5GB Parquet shards, making it well suited to fast streaming with the Hugging Face `datasets` library.

## 📊 Dataset Stats

- **Total Rows:** ~715 Million
- **Size:** ~10.5 GB (Compressed Snappy Parquet) / ~125 GB (Uncompressed)
- **Coverage:** 99.4% of active/suspended NSE Equities & Indices
- **Granularity:**
  - **Minute:** 1-minute intraday candles (2022-2026)
  - **Day:** Daily candles (2000-2026)
- **Schema:** `symbol`, `timestamp` (UTC), `open`, `high`, `low`, `close`, `volume`, `oi`

## 📂 Directory Structure

The data is partitioned by frequency to allow for efficient loading.

```text
/minute/
    train-00000.parquet  (Stocks A-C)
    train-00001.parquet  (Stocks C-H)
    ...
/day/
    train-00000.parquet  (All Daily Data)
```

> **Note:** The files are sorted by `symbol`, then `timestamp`. This means all data for a specific stock (e.g., `RELIANCE`) is contiguous within a single shard, maximizing compression and read speed.

## 💻 Usage (Python)

### 🚀 Option 1: Using Hugging Face Datasets (Recommended)

This method automatically handles downloading, caching, and iterating over the shards.

```python
from datasets import load_dataset

# 1. Load ALL minute-level data (downloads and caches the ~10.5 GB of shards)
# Use split="minute" to get the high-res intraday data
ds_minute = load_dataset("xxparthparekhxx/indian-stock-market-minute-data", split="minute")

# 2. Filter for a specific stock
# (filter() evaluates the predicate over the cached Arrow data)
reliance = ds_minute.filter(lambda x: x['symbol'] == 'RELIANCE')
print(reliance[0])
```

### ⚡ Option 2: Streaming (No Download)

If you don't want to download the full 10.5 GB to disk, you can stream it on the fly.

```python
from datasets import load_dataset

dataset = load_dataset(
    "xxparthparekhxx/indian-stock-market-minute-data",
    split="minute",
    streaming=True
)

# Iterate through the dataset without downloading everything
# Since data is sorted by symbol, you will see all rows for a stock sequentially
for row in dataset:
    if row['symbol'] == 'TATASTEEL':
        print(row)
        # Stop after finding the first row to prove it works
        break
```

### 📉 Option 3: Load Daily Data Only

If you only need daily timeframe data (2000-2026), you can load just the daily split (~100MB).

```python
from datasets import load_dataset

ds_day = load_dataset("xxparthparekhxx/indian-stock-market-minute-data", split="day")
print(ds_day[0])
```

### 🐼 Option 4: Using Pandas

You can read individual shards directly if you prefer manual control.

```python
import pandas as pd

# Load the first shard of minute data (contains stocks starting with roughly A-C)
df = pd.read_parquet("hf://datasets/xxparthparekhxx/indian-stock-market-minute-data/minute/train-00000.parquet")
print(df.head())
```

## 📝 Schema & Data Types

| Column | Type | Description |
|---|---|---|
| `symbol` | String | NSE Trading Symbol (e.g., `RELIANCE`, `NIFTY_50`) |
| `timestamp` | Datetime (ns) | **UTC Timezone**. (Add 5 hours 30 minutes for IST) |
| `open` | Float32 | Opening Price |
| `high` | Float32 | High Price |
| `low` | Float32 | Low Price |
| `close` | Float32 | Closing Price |
| `volume` | Int64 | Volume Traded |
| `oi` | Int64 | Open Interest (0 if not applicable) |

## ⚠️ Disclaimer

This dataset is intended for **research, educational, and backtesting purposes only**.

- It is not a live feed.
- Do not use this as the primary basis for live financial trading.
- The authors are not responsible for any financial losses incurred from using this data.

## 📄 License

This dataset is released under the **MIT License**.
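
The schema table notes that `timestamp` is stored in UTC and that IST is 5 hours 30 minutes ahead. A minimal sketch of that conversion using only the standard library follows. The helper name `to_ist` and the sample value are hypothetical, and the code assumes that a timestamp without timezone information is already in UTC, which the card implies but does not state outright.

```python
from datetime import datetime, timedelta, timezone

# IST is a fixed UTC+05:30 offset with no daylight saving time.
IST = timezone(timedelta(hours=5, minutes=30), name="IST")


def to_ist(ts: datetime) -> datetime:
    """Convert a UTC timestamp to IST.

    Assumption: naive datetimes (no tzinfo) are treated as UTC.
    """
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(IST)


# Hypothetical sample value, not taken from the dataset.
sample = datetime(2024, 1, 2, 3, 45)
print(to_ist(sample).isoformat())
```

For a pandas DataFrame loaded from a shard, the equivalent would likely be `df["timestamp"].dt.tz_localize("UTC").dt.tz_convert("Asia/Kolkata")`, again assuming the column is timezone-naive.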

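
Because rows are sorted by `symbol` and then `timestamp`, a streaming reader can stop as soon as the run of rows for the wanted symbol ends instead of scanning the rest of the split. The sketch below illustrates that idea on in-memory rows. The helper `rows_for_symbol` and the sample rows are hypothetical, and the early exit is only valid under the assumption that rows arrive in shard order without shuffling.

```python
from typing import Any, Dict, Iterable, Iterator


def rows_for_symbol(
    rows: Iterable[Dict[str, Any]], symbol: str
) -> Iterator[Dict[str, Any]]:
    """Yield the contiguous run of rows for `symbol`, then stop iterating."""
    seen = False
    for row in rows:
        if row["symbol"] == symbol:
            seen = True
            yield row
        elif seen:
            # The run has ended; with sorted input, no later row can match.
            break


# Hypothetical rows standing in for a streamed split (not real prices).
sample_rows = [
    {"symbol": "ACC", "close": 1.0},
    {"symbol": "TATASTEEL", "close": 2.0},
    {"symbol": "TATASTEEL", "close": 3.0},
    {"symbol": "TCS", "close": 4.0},
]

matches = list(rows_for_symbol(sample_rows, "TATASTEEL"))
print(len(matches))
```

With a streamed split, the same helper could be passed the iterable returned by `load_dataset(..., streaming=True)`. Rows before the target symbol are still read, so symbols late in the alphabet benefit less from the early exit.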
Provider:
xxparthparekhxx