mb23/GraySpectrogram
Dataset Overview
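Dataset Information
- Image size: 1025px × 216px (frequency bins × time frames)
- caption: a description of the music
- data_idx: which source recording the sample was generated from
- number: the position of this segment among the 5-second segments cut from the source recording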
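The image size follows from librosa's defaults. With `sr=22050`, `n_fft=2048`, `hop_length=512` and `center=True`, a 5-second segment has 110250 samples. That gives 1 + 2048/2 = 1025 frequency bins and 1 + ⌊110250/512⌋ = 216 time frames. The sketch below checks this arithmetic and shows one way to cut a waveform into numbered 5-second segments. The helper names `stft_shape` and `split_into_segments` are hypothetical. Two further points are assumptions, because the card does not say how leftovers or numbering are handled: the trailing partial segment is dropped, and numbering starts at 0.

```python
import numpy as np

def stft_shape(n_samples, n_fft=2048, hop_length=512):
    """Shape of librosa.stft output for a 1-D signal with center=True."""
    n_freq = 1 + n_fft // 2                 # one-sided spectrum: bins 0..n_fft/2
    n_frames = 1 + n_samples // hop_length  # centered framing pads n_fft // 2 on both ends
    return n_freq, n_frames

def split_into_segments(y, sr, seconds=5.0):
    """Hypothetical helper: cut y into consecutive fixed-length segments.

    Returns (number, segment) pairs numbered from 0; a trailing partial
    segment is dropped (an assumption, not documented by the dataset card).
    """
    seg_len = int(seconds * sr)
    n_full = len(y) // seg_len
    return [(i, y[i * seg_len:(i + 1) * seg_len]) for i in range(n_full)]

sr = 22050                               # librosa.load's default sampling rate
y = np.zeros(12 * sr, dtype=np.float32)  # stand-in for a 12-second recording
segments = split_into_segments(y, sr)
print(len(segments))                     # 2
print(stft_shape(len(segments[0][1])))   # (1025, 216)
```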
How the dataset was made
- Code: Google Colab link
- Reference Kaggle Notebook: Kaggle Notebook link
Steps
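- Load the wav file:

```python
import librosa

# y: waveform as a float32 array; sr: sampling rate (librosa's default is 22050 Hz)
y, sr = librosa.load("your_audio.wav")
```

- Apply the (short-time) Fourier transform to obtain the frequency components, convert them to dB, and save the result as a grayscale image:

```python
import numpy as np
import librosa
from PIL import Image

# Magnitude spectrogram in dB relative to the loudest bin
D = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)

# D lies in [-80, 0] dB (default top_db=80); NumPy's cast of negative floats to uint8 is platform-dependent
image = Image.fromarray(np.uint8(D), mode="L")

i = 0  # index of the current segment (placeholder)
image.save("spectrogram_{}.png".format(i))
```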
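The dB conversion determines which values end up in the images. The sketch below reimplements, in plain NumPy, what `librosa.amplitude_to_db(S, ref=np.max)` computes with its default `amin=1e-5` and `top_db=80`. Each magnitude is expressed in dB relative to the loudest bin, and everything more than 80 dB below that peak is floored, so every value of `D` lies in [-80, 0]. The function name `amplitude_to_db_refmax` is hypothetical, and the random matrix stands in for a real STFT.

```python
import numpy as np

def amplitude_to_db_refmax(S, amin=1e-5, top_db=80.0):
    """Sketch of librosa.amplitude_to_db(S, ref=np.max) with its default amin and top_db."""
    S = np.abs(S)
    ref = np.max(S)  # ref=np.max: the loudest bin becomes 0 dB
    # 20 * log10 of each magnitude relative to the reference; amin avoids log10(0)
    db = 20.0 * np.log10(np.maximum(amin, S) / np.maximum(amin, ref))
    # Floor everything more than top_db below the peak
    return np.maximum(db, db.max() - top_db)

rng = np.random.default_rng(0)
S = rng.random((1025, 216))  # stand-in for np.abs(librosa.stft(y)) of one 5-second segment
S[0, 0] = 0.0                # a silent bin ends up at the -80 dB floor
D = amplitude_to_db_refmax(S)
print(D.shape, D.max(), D.min())  # (1025, 216) 0.0 -80.0
```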
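Recovering the music (waveform) from a spectrogram

```python
import numpy as np
import librosa
from PIL import Image
from IPython.display import Audio, display

sr = 22050  # sampling rate used when loading the audio (librosa.load's default)

im = Image.open("spectrogram_0.png")
db_ud = np.uint8(np.array(im))         # pixel values, interpreted as dB
amp = librosa.db_to_amplitude(db_ud)   # dB -> linear magnitude
y_inv = librosa.griffinlim(amp * 200)  # Griffin-Lim estimates the phase that was not stored
display(Audio(y_inv, rate=sr))
```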
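`librosa.db_to_amplitude` (with its default `ref=1.0`) inverts the dB mapping as `10 ** (D / 20)`. The NumPy sketch below shows what that inversion recovers when `D` was computed with `ref=np.max`. You get magnitudes relative to the loudest bin, not their absolute level, and only for bins above the -80 dB floor. The STFT phase is not stored at all, which is why the restoration step relies on `librosa.griffinlim` to estimate it. The random magnitudes and their overall scale are stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
S = 3.0 * rng.random((1025, 216))  # stand-in magnitudes with an arbitrary overall scale
S[0, 0] = 0.0                      # a silent bin

# dB relative to the peak, floored 80 dB below it (what amplitude_to_db(S, ref=np.max) yields by default)
D = np.maximum(20.0 * np.log10(np.maximum(1e-5, S) / S.max()), -80.0)

# Same formula as librosa.db_to_amplitude(D) with its default ref=1.0
amp = 10.0 ** (D / 20.0)

rel = S / S.max()
above_floor = rel > 1e-4                                # bins within 80 dB of the peak
print(np.allclose(amp[above_floor], rel[above_floor]))  # True: only relative magnitudes return
print(np.isclose(amp[0, 0], 1e-4))                      # True: the silent bin comes back at the floor
```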
Usage Examples
Getting the dataset
```python
import datasets
from datasets import load_dataset

subset_name_list = [
    "data 0-200", "data 200-600", "data 600-1000", "data 1000-1300",
    "data 1600-2000", "data 2000-2200", "data 2200-2400", "data 2400-2600",
    "data 2600-2800", "data 3000-3200", "data 3200-3400", "data 3600-3800",
    "data 3800-4000", "data 4000-4200", "data 4200-4400", "data 4400-4600",
    "data 4600-4800", "data 4800-5000", "data 5000-5200", "data 5200-5520",
]

# Start from the first subset and append the rest (skip index 0 so it is not added twice)
data = load_dataset("mb23/GraySpectrogram", subset_name_list[0])
for subset in subset_name_list[1:]:
    new_ds = load_dataset("mb23/GraySpectrogram", subset)
    data["train"] = datasets.concatenate_datasets([data["train"], new_ds["train"]])
    data["test"] = datasets.concatenate_datasets([data["test"], new_ds["test"]])
```
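The subset list above does not cover every index range. The helper below is hypothetical and is not part of the dataset or the `datasets` library. It reports the gaps between names of the form `"data START-END"`. Applied to the full `subset_name_list`, it returns 1300-1600, 2800-3000 and 3400-3600. The card does not say whether those ranges exist as separate subsets.

```python
def find_gaps(names):
    """Return (start, end) index ranges missing between subset names of the form "data START-END"."""
    ranges = sorted(
        tuple(int(x) for x in name.split(" ", 1)[1].split("-"))
        for name in names
    )
    # A gap exists wherever the next range starts after the previous one ends
    return [(prev_end, start)
            for (_, prev_end), (start, _) in zip(ranges, ranges[1:])
            if start > prev_end]

example = ["data 0-200", "data 200-600", "data 1600-2000", "data 1000-1300", "data 600-1000"]
print(find_gaps(example))  # [(1300, 1600)]
```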
Loading the dataset and wrapping it in DataLoaders
```python
import datasets
from datasets import load_dataset
from torchvision import transforms
from torch.utils.data import DataLoader

IMG_SIZE = 64     # example value: side length of the resized square images
BATCH_SIZE = 32   # example value

def load_datasets():
    data_transform = transforms.Compose([
        transforms.Resize((IMG_SIZE, IMG_SIZE)),
        transforms.ToTensor(),                     # grayscale PIL image -> float tensor in [0, 1], shape (1, H, W)
        transforms.Lambda(lambda t: (t * 2) - 1),  # rescale to [-1, 1]
    ])

    # Load every subset (subset_name_list is defined in the previous example) and concatenate the splits
    parts = [load_dataset("mb23/GraySpectrogram", name) for name in subset_name_list]
    train = datasets.concatenate_datasets([p["train"] for p in parts])
    test = datasets.concatenate_datasets([p["test"] for p in parts])

    # Keep only the image and caption columns
    keep = ("image", "caption")
    train = train.remove_columns([c for c in train.column_names if c not in keep])
    test = test.remove_columns([c for c in test.column_names if c not in keep])

    # Apply the image transform on the fly whenever examples are accessed
    def apply_transform(batch):
        batch["image"] = [data_transform(img) for img in batch["image"]]
        return batch

    train = train.with_transform(apply_transform)
    test = test.with_transform(apply_transform)

    train_loader = DataLoader(train, batch_size=BATCH_SIZE, shuffle=True, drop_last=True)
    test_loader = DataLoader(test, batch_size=BATCH_SIZE, shuffle=True, drop_last=True)
    return train_loader, test_loader
```
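The two value transforms map 8-bit pixels to [-1, 1]. `ToTensor()` divides uint8 pixels by 255, and `transforms.Lambda(lambda t: (t * 2) - 1)` then rescales [0, 1] to [-1, 1]. To turn arrays in that range back into 8-bit spectrogram images, for example before the restoration step, the mapping has to be inverted. The NumPy sketch below shows both directions on the pixel values only; it ignores the `Resize` interpolation. The functions `to_model_range` and `to_pixels` are hypothetical names.

```python
import numpy as np

def to_model_range(pixels):
    """uint8 pixels -> [-1, 1], matching ToTensor() (divide by 255) followed by t * 2 - 1."""
    return pixels.astype(np.float32) / 255.0 * 2.0 - 1.0

def to_pixels(t):
    """Hypothetical inverse: values in [-1, 1] -> uint8 pixels, e.g. for saving generated spectrograms."""
    return np.clip(np.round((t + 1.0) / 2.0 * 255.0), 0, 255).astype(np.uint8)

pixels = np.arange(256, dtype=np.uint8).reshape(16, 16)  # every possible pixel value once
t = to_model_range(pixels)
print(t.min(), t.max())                      # -1.0 1.0
print(np.array_equal(to_pixels(t), pixels))  # True
```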
Using the DataLoaders

```python
train_loader, test_loader = load_datasets()
```



