
rokaijano/extracel_waveforms

Source: Hugging Face | Updated: 2026-02-02 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/rokaijano/extracel_waveforms
Resource description:
---
license: cc
---

# Spike waveform shards (derived from IBL dandiset 000409)

A curated collection of per-spike multichannel waveform windows extracted from NWB assets in the DANDI dandiset 000409 (International Brain Laboratory, IBL). This repository contains the extractor and uploader used to produce Parquet shards; this README documents the derived dataset, its provenance, format, and how to reproduce it.

---

## Short description

Parquet shards of spike waveform windows (channels × timesteps). Each row contains a fixed-shape multichannel waveform centered on a spike and related metadata (recording URI, unit identifier, spike time, channel indices, and extraction metadata). The dataset is intended for ML and analysis workflows operating on per-spike waveforms.

## Source / provenance

- Primary source: DANDI Archive, Dandiset 000409 (International Brain Laboratory). See: https://dandiarchive.org/dandiset/000409
- The list of candidate NWB asset URIs used during extraction is stored here: [feature_extractor/dandi_downloader/dandi_list.txt](feature_extractor/dandi_downloader/dandi_list.txt)
- Note: not every URI in that list was necessarily processed. The Hugging Face dataset contains the subset of assets that were successfully downloaded and extracted.

Original dandiset summary (selected fields):

- Title: IBL - Brain Wide Map [deprecated]
- ID: 000409
- License: spdx:CC-BY-4.0 (verify per-asset license/metadata)
- Related resource: https://doi.org/10.1101/2023.07.04.547681

Refer to the original DANDI page for the full contributor list, asset-level metadata, and licensing details.

## What the dataset contains

- File type: Parquet (.parquet) shards (row-wise shards; default shard_rows=10000 in the extractor).
- Typical per-row fields (inspect shards to confirm exact schema):
  - `recording_uri`: original NWB/DANDI URI
  - `unit_id`: identifier for the unit within the NWB
  - `spike_time`: spike time in seconds
  - `waveform`: array/tensor (channels × timesteps)
  - `peak_channel` / `channel_indices`: channels included in the window
  - extraction metadata: sampling rate, sample index, extractor params (when present)

Notes:

- The extractor performs minimal filtering by default (no normalization, no SNR filtering) unless explicitly configured.
- Check a sample shard with `pyarrow` or `pandas` to confirm exact field names and dtypes.

## How the dataset was produced

Workflow overview:

1. Download NWB asset (via `dandi` CLI, S3, or direct HTTP after resolving dandi:// URIs).
2. Open NWB/HDF5, locate `ElectricalSeries` and unit spike times/metadata.
3. For each spike, compute corresponding sample index (or use timestamps) and extract a fixed-size waveform window centered on the spike at the unit's peak channel.
4. Pad/truncate windows at edges to ensure a fixed `(channels, timesteps)` shape.
5. Aggregate rows and write Parquet shards.
6. Optionally upload shards using the uploader helper.

## Typical extraction parameters (defaults)

- `channels`: 25
- `timesteps`: 300
- `shard_rows`: 10000
- `compression`: zstd
- `max_spikes_per_unit`: 500

Actual dataset shards may have been produced with different parameter overrides; inspect the shard metadata or the extractor run logs for precise parameters used.

## Data licensing and attribution

- This dataset is a derived product of NWB assets hosted on DANDI. Users must comply with the license and attribution requirements of the original assets. The dandiset lists `spdx:CC-BY-4.0`, but please verify per-asset metadata on DANDI for any variations.
- When using or publishing results based on these shards, cite the original DANDI dandiset and the related IBL publication.
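
## Example: window extraction sketch

To make steps 3 and 4 of the workflow under "How the dataset was produced" more concrete, here is a minimal sketch of cutting a fixed-shape window around one spike. It is not the extractor's actual code. The function name `extract_window` and its arguments are hypothetical, and it assumes a `(samples, channels)` trace array that starts at time 0 with a uniform sampling rate, neighbouring channels taken by array index around the peak channel, and zero padding at the edges. The real extractor may select channels and pad differently.

```python
import numpy as np


def extract_window(traces, spike_time, sampling_rate, peak_channel,
                   channels=25, timesteps=300):
    """Return a zero-padded (channels, timesteps) window centered on a spike.

    Hypothetical helper: `traces` is a (n_samples, n_channels) array.
    """
    n_samples, n_channels = traces.shape

    # Spike time in seconds -> nearest sample index (assumes recording starts at t=0).
    center = int(round(spike_time * sampling_rate))
    t0 = center - timesteps // 2      # first requested sample
    c0 = peak_channel - channels // 2  # first requested channel

    # Start from zeros so anything outside the recording stays zero-padded.
    window = np.zeros((channels, timesteps), dtype=traces.dtype)

    # Clip the requested ranges to what actually exists in the recording.
    ts, te = max(t0, 0), min(t0 + timesteps, n_samples)
    cs, ce = max(c0, 0), min(c0 + channels, n_channels)

    if ts < te and cs < ce:
        # Transpose so the result is (channels, timesteps).
        window[cs - c0:ce - c0, ts - t0:te - t0] = traces[ts:te, cs:ce].T
    return window


# Synthetic data only; a spike near the start of the recording and the edge of the probe.
rng = np.random.default_rng(0)
traces = rng.standard_normal((1000, 32)).astype(np.float32)
w = extract_window(traces, spike_time=0.002, sampling_rate=30000.0, peak_channel=3)
print(w.shape)  # (25, 300)
```

With the defaults listed above (`channels=25`, `timesteps=300`), the spike sample lands at row 12, column 150 of the window, and any rows or columns that fall outside the recording are left as zeros.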

## Citation

Please cite:

- International Brain Laboratory, DANDI Dandiset 000409. https://dandiarchive.org/dandiset/000409
- A Brain-Wide Map of Neural Activity during Complex Behaviour (preprint): https://doi.org/10.1101/2023.07.04.547681

If you use the Hugging Face dataset derived from these shards, also cite the HF dataset page (include the HF dataset identifier and URL).

## Inspecting shards / example

Load a shard with pandas / pyarrow:

```python
import pyarrow.parquet as pq

# Read one shard into an Arrow table, then convert it to a pandas DataFrame
# (the conversion requires pandas to be installed).
tbl = pq.read_table('/path/to/shard_00000.parquet')
df = tbl.to_pandas()

print(df.columns)  # confirm the exact field names
print(df.iloc[0])  # inspect the first row
```

Last updated: 2026-02-02
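
Once a shard is loaded, the `waveform` cell of a row may not arrive as a ready-made 2-D array. This README does not say how the array is serialized, so the sketch below is only an illustration: the helper `as_window` is hypothetical, and it assumes the cell is either a nested sequence (one sequence per channel) or a flat sequence in row-major (channel-first) order. Check a real shard before relying on either assumption.

```python
import numpy as np


def as_window(cell, channels=25, timesteps=300):
    """Convert one `waveform` cell to a (channels, timesteps) float32 array.

    Hypothetical helper; accepts a nested or a flat (row-major) sequence.
    """
    if np.ndim(cell[0]) == 0:
        # Flat layout: a single sequence of channels * timesteps values.
        arr = np.asarray(cell, dtype=np.float32)
        if arr.size != channels * timesteps:
            raise ValueError(f"expected {channels * timesteps} values, got {arr.size}")
        return arr.reshape(channels, timesteps)

    # Nested layout: one sequence per channel.
    arr = np.stack([np.asarray(row, dtype=np.float32) for row in cell])
    if arr.shape != (channels, timesteps):
        raise ValueError(f"expected {(channels, timesteps)}, got {arr.shape}")
    return arr


# Tiny synthetic cells (2 channels x 3 timesteps) in both layouts.
nested = [[0, 1, 2], [3, 4, 5]]
flat = [0, 1, 2, 3, 4, 5]
print(as_window(nested, channels=2, timesteps=3).shape)  # (2, 3)
print(as_window(flat, channels=2, timesteps=3).shape)    # (2, 3)
```

Raising on a shape mismatch, rather than silently reshaping, makes it obvious when a shard was produced with parameter overrides that differ from the defaults.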
Provider:
rokaijano