xxparthparekhxx/indian-stock-market-minute-data
Source: Hugging Face; updated 2026-01-25; indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/xxparthparekhxx/indian-stock-market-minute-data
---
license: mit
task_categories:
- time-series-forecasting
- tabular-regression
tags:
- finance
- nse
- india
- stock-market
- quantitative-finance
- upstox
pretty_name: Indian Stock Market Minute & Daily Data
size_categories:
- 10B<n<100B
configs:
- config_name: default
  data_files:
  - split: minute
    path: minute/*.parquet
  - split: day
    path: day/*.parquet
---
# 🇮🇳 Indian Stock Market Data: Minute & Daily (2000 - 2026)
## 📌 Overview
This is a financial dataset containing the price history of **2,500+ NSE stocks and indices**.
The dataset has been **sharded and optimized** for high-speed training. Instead of thousands of tiny files, it is grouped into large ~1.5GB Parquet shards, making it ideal for fast streaming with the Hugging Face `datasets` library.
## 📊 Dataset Stats
- **Total Rows:** ~715 Million
- **Size:** ~10.5 GB (Compressed Snappy Parquet) / ~125 GB (Uncompressed)
- **Coverage:** 99.4% of active/suspended NSE Equities & Indices
- **Granularity:**
  - **Minute:** 1-minute intraday candles (2022-2026)
  - **Day:** Daily candles (2000-2026)
- **Schema:** `symbol`, `timestamp` (UTC), `open`, `high`, `low`, `close`, `volume`, `oi`
## 📂 Directory Structure
The data is partitioned by frequency to allow for efficient loading.
```text
/minute/
    train-00000.parquet  (Stocks A-C)
    train-00001.parquet  (Stocks C-H)
    ...
/day/
    train-00000.parquet  (All Daily Data)
```
> **Note:** The files are sorted by `symbol`, then by `timestamp`. This means all data for a specific stock (e.g., `RELIANCE`) is stored contiguously, which improves compression and read speed.
## 💻 Usage (Python)
### 🚀 Option 1: Using Hugging Face Datasets (Recommended)
This method automatically handles downloading, caching, and iterating over the shards.
```python
from datasets import load_dataset
# 1. Load all minute-level data (downloads and caches ~10.5 GB of Parquet shards)
# Use split="minute" to get the 1-minute intraday data
ds_minute = load_dataset("xxparthparekhxx/indian-stock-market-minute-data", split="minute")

# 2. Filter for a specific stock
# (This scans every row of the memory-mapped Arrow table, so it can take a while)
reliance = ds_minute.filter(lambda x: x['symbol'] == 'RELIANCE')
print(reliance[0])
```
### ⚡ Option 2: Streaming (No Download)
If you don't want to download the full 10.5 GB to disk, you can stream it on-the-fly.
```python
from datasets import load_dataset
dataset = load_dataset(
    "xxparthparekhxx/indian-stock-market-minute-data",
    split="minute",
    streaming=True
)

# Iterate through the dataset without downloading everything
# Since data is sorted by symbol, all rows for a stock arrive consecutively
for row in dataset:
    if row['symbol'] == 'TATASTEEL':
        print(row)
        # Stop after the first matching row to prove it works
        break
```
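Because rows are sorted by symbol, a streaming loop does not need to read past the end of a symbol's run. The following is a minimal sketch of that early-stop pattern. `rows_for_symbol` is a hypothetical helper (not part of the `datasets` library), and it assumes that the sort order is preserved when streaming and that each symbol forms one contiguous run. The `sample` list is an illustrative stand-in for the streamed dataset, not real data.

```python
from typing import Dict, Iterable, Iterator


def rows_for_symbol(rows: Iterable[Dict], symbol: str) -> Iterator[Dict]:
    """Yield rows for `symbol`, stopping once its contiguous run has ended.

    Assumes `rows` is sorted by symbol, so all matching rows are adjacent.
    """
    seen = False
    for row in rows:
        if row["symbol"] == symbol:
            seen = True
            yield row
        elif seen:
            # The symbol changed after we started matching: the run is over.
            break


# Tiny in-memory stand-in for the streamed dataset (illustrative values only).
sample = [
    {"symbol": "SBIN", "close": 1.0},
    {"symbol": "TATASTEEL", "close": 2.0},
    {"symbol": "TATASTEEL", "close": 3.0},
    {"symbol": "TCS", "close": 4.0},
]
matched = list(rows_for_symbol(sample, "TATASTEEL"))
print(len(matched))
```

With the streaming dataset above, the analogous call would be `rows_for_symbol(dataset, "TATASTEEL")`. Rows that precede the target symbol still have to be read, so symbols late in the alphabet remain slow to reach.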
### 📉 Option 3: Load Daily Data Only
If you only need daily timeframe data (2000-2026), you can load just the daily split (~100MB).
```python
from datasets import load_dataset
ds_day = load_dataset("xxparthparekhxx/indian-stock-market-minute-data", split="day")
print(ds_day[0])
```
### 🐼 Option 4: Using Pandas
You can read individual shards directly if you prefer manual control.
```python
import pandas as pd
# Load the first shard of minute data (the alphabetically earliest symbols)
df = pd.read_parquet("hf://datasets/xxparthparekhxx/indian-stock-market-minute-data/minute/train-00000.parquet")
print(df.head())
```
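To read a shard other than the first, the path can be built from the `train-NNNNN.parquet` naming pattern shown in the directory structure. This is a small sketch: `shard_path` is a hypothetical helper, and the number of shards per split is not stated in this card, so an index past the last shard would only fail when the file is actually read.

```python
REPO = "xxparthparekhxx/indian-stock-market-minute-data"


def shard_path(split: str, index: int) -> str:
    """Build an hf:// path for one shard, following the train-NNNNN.parquet pattern."""
    if split not in ("minute", "day"):
        raise ValueError(f"unknown split: {split!r}")
    if index < 0:
        raise ValueError("index must be non-negative")
    # Shard indices are zero-padded to five digits, e.g. train-00001.parquet.
    return f"hf://datasets/{REPO}/{split}/train-{index:05d}.parquet"


print(shard_path("minute", 1))
```

The resulting string can be passed to `pd.read_parquet` in the same way as the hard-coded path above.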
## 📝 Schema & Data Types
| Column | Type | Description |
|---|---|---|
| `symbol` | String | NSE Trading Symbol (e.g., `RELIANCE`, `NIFTY_50`) |
| `timestamp` | Datetime (ns) | **UTC Timezone**. (Add +5:30 for IST) |
| `open` | Float32 | Opening Price |
| `high` | Float32 | High Price |
| `low` | Float32 | Low Price |
| `close` | Float32 | Closing Price |
| `volume` | Int64 | Volume Traded |
| `oi` | Int64 | Open Interest (0 if not applicable) |
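Since timestamps are stored in UTC, converting to IST means applying a fixed +05:30 offset. The sketch below uses only the standard library. `utc_to_ist` is a hypothetical helper, and treating timezone-naive values as UTC is an assumption based on the schema description above.

```python
from datetime import datetime, timedelta, timezone

# IST is a fixed UTC+05:30 offset with no daylight saving time.
IST = timezone(timedelta(hours=5, minutes=30), name="IST")


def utc_to_ist(ts: datetime) -> datetime:
    """Convert a UTC timestamp to IST; naive inputs are treated as UTC."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(IST)


# 03:45 UTC corresponds to 09:15 IST.
example = utc_to_ist(datetime(2024, 1, 1, 3, 45))
print(example.isoformat())
```

In pandas, if the `timestamp` column loads as timezone-naive, the equivalent would likely be `df["timestamp"].dt.tz_localize("UTC").dt.tz_convert("Asia/Kolkata")`. If the column is already timezone-aware, the `tz_localize` step should be dropped.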
## ⚠️ Disclaimer
This dataset is intended for **research, educational, and backtesting purposes only**.
- It is not a live feed.
- Do not use this as the primary basis for live financial trading.
- The authors are not responsible for any financial losses incurred from using this data.
## 📄 License
This dataset is released under the **MIT License**.
Provider:
xxparthparekhxx



