Google's Audioset: Reformatted
Source: NIAID Data Ecosystem, indexed 2026-03-14
Download link:
https://zenodo.org/record/7096701
Description:
Google's AudioSet consistently reformatted
While working with Google's AudioSet (https://research.google.com/audioset/index.html),
I ran into problems because the Weak (https://research.google.com/audioset/download.html) and
Strong (https://research.google.com/audioset/download_strong.html) versions of the dataset use different csv formatting for the data. The labels used in the two datasets also differ (https://github.com/audioset/ontology/issues/9) and are provided in files with different formatting.
This reformatting unifies the formats of the two datasets so that they can be
analysed in the same pipelines, and makes the dataset files compatible
with the psds_eval, dcase_util and sed_eval Python packages used in audio processing.
For better-formatted documentation and the source code of the reformatting, refer to https://github.com/bakhtos/GoogleAudioSetReformatted
-Changes in dataset
All files are converted to tab-separated `*.tsv` files (i.e. `csv` files with `\t`
as a separator). All files have a header as the first line.
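Because every file is tab-separated with a header line, it can be read with nothing more than Python's standard `csv` module. The snippet below is a minimal sketch; the helper name `read_tsv` and the sample row are made up for illustration and are not taken from the dataset or the repository.

```python
import csv
import io


def read_tsv(handle):
    """Read a tab-separated file with a header line into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Illustrative in-memory sample; a real file would be opened with
# open("audioset_weak_eval.tsv", newline="") instead.
sample_tsv = "filename\tevent_label\nabc_30000\t/m/09x0r\n"
rows = read_tsv(io.StringIO(sample_tsv))

for row in rows:
    print(row["filename"], row["event_label"])
```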
-New fields and filenames
Fields are renamed according to the following table, to be compatible with psds_eval:
Old field -> New field
YTID -> filename
segment_id -> filename
start_seconds -> onset
start_time_seconds -> onset
end_seconds -> offset
end_time_seconds -> offset
positive_labels -> event_label
label -> event_label
present -> present
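The renaming table above can be written down as a plain lookup. This is only a sketch of the mapping, not the code used in the repository; the names `FIELD_RENAMES` and `rename_header` are hypothetical.

```python
# Old field name -> new field name, copied from the table above.
FIELD_RENAMES = {
    "YTID": "filename",
    "segment_id": "filename",
    "start_seconds": "onset",
    "start_time_seconds": "onset",
    "end_seconds": "offset",
    "end_time_seconds": "offset",
    "positive_labels": "event_label",
    "label": "event_label",
    "present": "present",
}


def rename_header(old_header):
    """Map each old column name to its new name; unknown names pass through."""
    return [FIELD_RENAMES.get(name, name) for name in old_header]


# Column names taken from the table; their order here is illustrative only.
strong_header = rename_header(
    ["segment_id", "start_time_seconds", "end_time_seconds", "label"]
)
print(strong_header)
```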
For class label files, `id` is now the name of the field holding the `mid` label (e.g. `/m/09x0r`)
and `label` holds the human-readable label (e.g. `Speech`). The label index given
for Weak dataset labels (`index` field in `class_labels_indices.csv`) is not used.
Files are renamed according to the following table to ensure consistent naming
of the form `audioset_[weak|strong]_[train|eval]_[balanced|unbalanced|posneg]*.tsv`:
Old name -> New name
balanced_train_segments.csv -> audioset_weak_train_balanced.tsv
unbalanced_train_segments.csv -> audioset_weak_train_unbalanced.tsv
eval_segments.csv -> audioset_weak_eval.tsv
audioset_train_strong.tsv -> audioset_strong_train.tsv
audioset_eval_strong.tsv -> audioset_strong_eval.tsv
audioset_eval_strong_framed_posneg.tsv -> audioset_strong_eval_posneg.tsv
class_labels_indices.csv -> class_labels.tsv (merged with mid_to_display_name.tsv)
mid_to_display_name.tsv -> class_labels.tsv (merged with class_labels_indices.csv)
-Strong dataset changes
The only changes to the Strong dataset are the renaming of fields and the reordering of columns,
so that both the Weak and Strong versions have `filename` and `event_label` as the first
two columns.
-Weak dataset changes
-- Labels are given one per line, instead of as a comma-separated, quoted list
-- To make sure that the `filename` format is the same as in the Strong version, the following
format change is made:
The value of the `start_seconds` field is converted to milliseconds and appended to the `filename` with an underscore. Since all files in the dataset are assumed to be 10 seconds long, this unifies the format of `filename` with the Strong version and also makes `end_seconds` redundant.
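The two Weak dataset changes described above can be sketched as follows. This is an illustration under stated assumptions, not the repository's implementation: the function name `weak_row_to_records` is hypothetical, and the sample line only imitates the layout of the original Weak csv (comma plus space between fields, labels in one quoted list).

```python
import csv
import io


def weak_row_to_records(ytid, start_seconds, positive_labels):
    """Turn one original Weak row into one (filename, event_label) pair per label."""
    # Seconds -> whole milliseconds, appended to the id with an underscore.
    onset_ms = int(round(float(start_seconds) * 1000))
    filename = f"{ytid}_{onset_ms}"
    return [(filename, label) for label in positive_labels.split(",")]


# Illustrative row in the original layout: YTID, start_seconds, end_seconds, positive_labels
sample = '--PJHxphWEs, 30.000, 40.000, "/m/09x0r,/t/dd00088"\n'

records = []
# skipinitialspace=True lets the csv module handle the space after each comma.
for ytid, start, _end, labels in csv.reader(io.StringIO(sample), skipinitialspace=True):
    records.extend(weak_row_to_records(ytid, start, labels))

for filename, event_label in records:
    print(filename, event_label, sep="\t")
```

The `end_seconds` value is read but discarded, which mirrors the point that it becomes redundant once every clip is assumed to be 10 seconds long.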
-Class labels changes
Class labels from both datasets are merged into one file and given in alphabetical order of `id`s. Since the same `id`s are present in both datasets, but sometimes with different human-readable labels, labels from the Strong dataset overwrite those from the Weak one. It is possible to regenerate `class_labels.tsv` while giving priority to the Weak version of the labels by calling `convert_labels(False)` from convert.py in the GitHub repository.
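The merge rule described above (one entry per `id`, sorted by `id`, with one dataset's label winning on conflict) could look roughly like this. It is a sketch only and does not reproduce `convert_labels` from the repository; the function name `merge_class_labels`, the `strong_priority` parameter and the ids and labels in the sample are all invented for illustration.

```python
def merge_class_labels(weak, strong, strong_priority=True):
    """Merge two {id: label} mappings and return (id, label) pairs sorted by id."""
    if strong_priority:
        merged = {**weak, **strong}  # later mapping wins, so Strong overwrites Weak
    else:
        merged = {**strong, **weak}  # Weak overwrites Strong
    return sorted(merged.items())


# Made-up ids and labels, chosen only to show a conflict on one id.
weak_labels = {"/m/bbb": "Weak name B", "/m/aaa": "Name A"}
strong_labels = {"/m/bbb": "Strong name B", "/m/ccc": "Name C"}

strong_first = merge_class_labels(weak_labels, strong_labels)
weak_first = merge_class_labels(weak_labels, strong_labels, strong_priority=False)

for class_id, label in strong_first:
    print(class_id, label, sep="\t")
```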
-License
Google's AudioSet was published in two stages - first the weakly labelled data (Gemmeke, Jort F., et al. "Audio set: An ontology and human-labeled dataset for audio events." 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, 2017.), then the strongly labelled data (Hershey, Shawn, et al. "The benefit of temporally-strong labels in audio event classification." ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021.).
Both the original dataset and this reworked version are licensed under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/).
Class labels come from the AudioSet Ontology, which is licensed under CC BY-SA 4.0.
Created: 2022-09-21



