rokaijano/extracel_waveforms
Hugging Face · Updated 2026-02-02 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/rokaijano/extracel_waveforms
---
license: cc
---
# Spike waveform shards (derived from IBL dandiset 000409)
A curated collection of per-spike multichannel waveform windows extracted from NWB assets in the DANDI dandiset 000409 (International Brain Laboratory, IBL). This repository contains the extractor and uploader used to produce Parquet shards; this README documents the derived dataset, its provenance, format, and how to reproduce it.
---
## Short description
Parquet shards of spike waveform windows (channels × timesteps). Each row contains a fixed-shape multichannel waveform centered on a spike and related metadata (recording URI, unit identifier, spike time, channel indices, and extraction metadata). The dataset is intended for ML and analysis workflows operating on per-spike waveforms.
## Source / provenance
- Primary source: DANDI Archive, Dandiset 000409 (International Brain Laboratory). See: https://dandiarchive.org/dandiset/000409
- The list of candidate NWB asset URIs used during extraction is stored here: [feature_extractor/dandi_downloader/dandi_list.txt](feature_extractor/dandi_downloader/dandi_list.txt)
- Note: not every URI in that list was necessarily processed — the Hugging Face dataset contains the subset of assets that were successfully downloaded and extracted.
Original dandiset summary (selected fields):
- Title: IBL - Brain Wide Map [deprecated]
- ID: 000409
- License: spdx:CC-BY-4.0 (verify per-asset license/metadata)
- Related resource: https://doi.org/10.1101/2023.07.04.547681
Refer to the original DANDI page for the full contributor list, asset-level metadata, and licensing details.
## What the dataset contains
- File type: Parquet (.parquet) shards (row-wise shards; default shard_rows=10000 in the extractor).
- Typical per-row fields (inspect shards to confirm exact schema):
  - `recording_uri`: original NWB/DANDI URI
  - `unit_id`: identifier for the unit within the NWB file
  - `spike_time`: spike time in seconds
  - `waveform`: array/tensor (channels × timesteps)
  - `peak_channel` / `channel_indices`: channels included in the window
  - extraction metadata: sampling rate, sample index, extractor params (when present)
Notes:
- The extractor performs minimal filtering by default (no normalization, no SNR filtering) unless explicitly configured.
- Check a sample shard with `pyarrow` or `pandas` to confirm exact field names and dtypes.
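Because the exact storage of the `waveform` field should be confirmed per shard, a small coercion helper can make downstream code robust. The sketch below is an illustration, not part of the extractor: the function name `to_waveform_array` is hypothetical, and it assumes a cell arrives either as nested per-channel sequences or as a flat sequence of `channels * timesteps` values.

```python
import numpy as np

def to_waveform_array(value, channels=25, timesteps=300):
    """Coerce one `waveform` cell into a float32 array of shape (channels, timesteps)."""
    # pandas often returns nested Parquet lists as an object array of
    # per-channel arrays; stacking turns that into a regular 2-D array.
    if isinstance(value, np.ndarray) and value.dtype == object:
        arr = np.stack([np.asarray(row, dtype=np.float32) for row in value])
    else:
        arr = np.asarray(value, dtype=np.float32)
    # Assumption: a flat cell stores the window in row-major (channel-first) order.
    if arr.ndim == 1 and arr.size == channels * timesteps:
        arr = arr.reshape(channels, timesteps)
    if arr.shape != (channels, timesteps):
        raise ValueError(f"unexpected waveform shape {arr.shape}")
    return arr
```

With a DataFrame `df` loaded from a shard, `to_waveform_array(df.iloc[0]["waveform"])` would return a single window, and the `ValueError` flags shards written with different `channels` or `timesteps` overrides.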
## How the dataset was produced
- Workflow overview:
  1. Download the NWB asset (via the `dandi` CLI, S3, or direct HTTP after resolving dandi:// URIs).
  2. Open the NWB/HDF5 file and locate the `ElectricalSeries` and the unit spike times/metadata.
  3. For each spike, compute the corresponding sample index (or use timestamps) and extract a fixed-size waveform window centered on the spike at the unit's peak channel.
  4. Pad/truncate windows at edges to ensure a fixed `(channels, timesteps)` shape.
  5. Aggregate rows and write Parquet shards.
  6. Optionally upload shards using the uploader helper.
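Steps 3 and 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the extractor's actual code: the function names are hypothetical, the raw data is assumed to be laid out as `(n_samples, n_channels)`, the channel window is assumed to be a contiguous index range around the peak channel, and out-of-range regions are assumed to be zero-padded.

```python
import numpy as np

def spike_sample_index(spike_time, sampling_rate, start_time=0.0):
    """Convert a spike time in seconds to the nearest sample index."""
    return int(round((spike_time - start_time) * sampling_rate))

def extract_window(data, sample_index, peak_channel, channels=25, timesteps=300):
    """Cut a (channels, timesteps) window from `data` of shape (n_samples, n_channels).

    Parts of the window that fall outside the recording stay zero
    (an assumption; the real extractor may pad differently).
    """
    n_samples, n_channels = data.shape
    t0 = sample_index - timesteps // 2  # first sample of the requested window
    c0 = peak_channel - channels // 2   # first channel of the requested window
    window = np.zeros((channels, timesteps), dtype=data.dtype)

    # Clip the requested ranges to what actually exists in the recording.
    t_lo, t_hi = max(t0, 0), min(t0 + timesteps, n_samples)
    c_lo, c_hi = max(c0, 0), min(c0 + channels, n_channels)
    if t_lo < t_hi and c_lo < c_hi:
        # Source is (time, channel); transpose to (channel, time).
        window[c_lo - c0:c_hi - c0, t_lo - t0:t_hi - t0] = data[t_lo:t_hi, c_lo:c_hi].T
    return window
```

With the defaults, the spike sample lands at column 150 and the peak channel at row 12 of the returned window, including for spikes near the start or end of a recording.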
## Typical extraction parameters (defaults)
- `channels`: 25
- `timesteps`: 300
- `shard_rows`: 10000
- `compression`: zstd
- `max_spikes_per_unit`: 500
Actual dataset shards may have been produced with different parameter overrides; inspect the shard metadata or the extractor run logs for precise parameters used.
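To get a feel for what these defaults imply, the following sketch does the arithmetic for one full shard. The `ExtractionDefaults` class and the helper are hypothetical names used only for illustration, and the 4 bytes per value (float32) is an assumption; check the actual dtype in a shard.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExtractionDefaults:
    channels: int = 25
    timesteps: int = 300
    shard_rows: int = 10_000
    compression: str = "zstd"
    max_spikes_per_unit: int = 500

def uncompressed_shard_bytes(cfg, bytes_per_value=4):
    """Waveform bytes in one full shard, before compression and metadata columns."""
    return cfg.channels * cfg.timesteps * bytes_per_value * cfg.shard_rows

cfg = ExtractionDefaults()
values_per_row = cfg.channels * cfg.timesteps  # 25 * 300 = 7500 values per spike
shard_bytes = uncompressed_shard_bytes(cfg)    # 7500 * 4 * 10000 bytes
# With at most 500 spikes kept per unit, a full shard spans at least this many units.
min_units_per_full_shard = -(-cfg.shard_rows // cfg.max_spikes_per_unit)
print(values_per_row, shard_bytes, min_units_per_full_shard)  # 7500 300000000 20
```

Under these assumptions a full shard holds roughly 300 MB of raw waveform values before zstd compression, which is worth keeping in mind when loading several shards at once.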
## Data licensing and attribution
- This dataset is a derived product of NWB assets hosted on DANDI. Users must comply with the license and attribution requirements of the original assets. The dandiset lists `spdx:CC-BY-4.0`, but please verify per-asset metadata on DANDI for any variations.
- When using or publishing results based on these shards, cite the original DANDI dandiset and the related IBL publication.
## Citation
Please cite:
- International Brain Laboratory — DANDI Dandiset 000409. https://dandiarchive.org/dandiset/000409
- A Brain-Wide Map of Neural Activity during Complex Behaviour (preprint): https://doi.org/10.1101/2023.07.04.547681
If you use the Hugging Face dataset derived from these shards, also cite the HF dataset page (include the HF dataset identifier and URL).
## Inspecting shards / example
Load a shard with pandas / pyarrow:
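```python
import pyarrow.parquet as pq

# Read one shard into an Arrow table (replace the path with a real shard).
tbl = pq.read_table('/path/to/shard_00000.parquet')
print(tbl.schema)  # exact field names and Arrow dtypes

# Convert to pandas (pandas must be installed) for row-level inspection.
df = tbl.to_pandas()
print(df.columns)
print(df.iloc[0])
```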
Last updated: 2026-02-02
Provided by: rokaijano



