novaia/world-heightmaps-360px
World Heightmaps 360px
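Dataset Overview
- Name: World Heightmaps 360 V1
- Size: 100K<n<1M
- License: Apache-2.0
- Task categories:
  - Unconditional image generation
  - Image classification
  - Text-to-image
- Number of samples: 573,995
- Image size: 360x360
- Source: Earth heightmaps generated from SRTM 1 Arc-Second Global
- Labels: each heightmap is labeled with its latitude and longitude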
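Data Processing Method
- Convert GeoTIFF to PNG:
  - Use Python and Rasterio to convert the GeoTIFF files to PNG format.

```python
import os

import matplotlib.pyplot as plt
import rasterio

input_directory = ...
output_directory = ...
file_list = os.listdir(input_directory)

for file_name in file_list:
    # Read the first band of each GeoTIFF and save it as a grayscale PNG.
    with rasterio.open(os.path.join(input_directory, file_name)) as image:
        # file_name[0:-4] drops the last four characters (e.g. a ".tif" extension).
        output_path = os.path.join(output_directory, file_name[0:-4] + ".png")
        plt.imsave(output_path, image.read(1), cmap="gray")
```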
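One detail of the conversion above is worth spelling out: when `plt.imsave` is called without `vmin` and `vmax`, matplotlib scales the colormap to the array's own minimum and maximum, so the gray levels in each PNG are relative to that tile rather than absolute elevations. The sketch below imitates that scaling in plain Python on tiny made-up grids. The helper name `to_gray_levels` and the sample values are illustrative assumptions, and the rounding only approximates matplotlib's colormap lookup.

```python
def to_gray_levels(elevations):
    """Min-max scale a 2D list of elevations to integers in 0..255."""
    flat = [value for row in elevations for value in row]
    low, high = min(flat), max(flat)
    span = high - low
    if span == 0:
        # A perfectly flat tile has no contrast; this sketch maps it to 0.
        return [[0 for _ in row] for row in elevations]
    return [[round((value - low) / span * 255) for value in row] for row in elevations]


# Two hypothetical tiles with the same relief but different absolute heights.
lowland = [[0, 50], [100, 200]]
highland = [[3000, 3050], [3100, 3200]]

lowland_gray = to_gray_levels(lowland)
highland_gray = to_gray_levels(highland)
print(lowland_gray)
print(highland_gray)
```

Both tiles produce the same gray levels here, which is why pixel values from different patches should not be compared as if they were elevations.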
- Split the PNG images:
  - Use Split Image to split each PNG image into 100 patches.

```python
import os

from split_image import split_image

input_directory = ...
output_directory = ...
file_list = os.listdir(input_directory)

for file_name in file_list:
    # Split each image into a 10 x 10 grid and keep the original file.
    split_image(
        os.path.join(input_directory, file_name),
        10,
        10,
        should_square=True,
        should_cleanup=False,
        output_dir=output_directory,
    )
```
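The 10 × 10 split is where the 360-pixel patch size comes from, assuming each source tile is 3601 × 3601 samples (the nominal size of an SRTM 1 arc-second tile; the source does not state it). The sketch below works through the arithmetic with integer division. `patch_boxes` is a hypothetical helper written for illustration; it is not the Split Image implementation, which may handle leftover pixels differently.

```python
def patch_boxes(width, height, rows, cols):
    """Return (left, upper, right, lower) crop boxes for a rows x cols grid."""
    patch_width = width // cols
    patch_height = height // rows
    boxes = []
    for row in range(rows):
        for col in range(cols):
            left = col * patch_width
            upper = row * patch_height
            boxes.append((left, upper, left + patch_width, upper + patch_height))
    return boxes


# Assumed tile size: 3601 x 3601 samples.
boxes = patch_boxes(3601, 3601, rows=10, cols=10)
first = boxes[0]
print(len(boxes))
print(first[2] - first[0], first[3] - first[1])
print(boxes[-1])
```

Under this assumption the grid yields 100 patches of 360 × 360 pixels and leaves one row and one column of pixels unused.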
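- Filter the dataset:
  - Manually pick out corrupted and uncorrupted heightmaps, then train a discriminator to filter the whole dataset automatically.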
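The source does not describe the discriminator itself, so the following is only a sketch of the filtering pass that would follow training. `score_patch` stands in for the trained model (hypothetical, here a stub that looks up made-up scores), and the 0.5 threshold is an arbitrary assumption.

```python
def filter_patches(file_names, score_patch, threshold=0.5):
    """Split file names into (kept, rejected) lists by discriminator score."""
    kept, rejected = [], []
    for file_name in file_names:
        # score_patch is assumed to return the probability that a patch is clean.
        if score_patch(file_name) >= threshold:
            kept.append(file_name)
        else:
            rejected.append(file_name)
    return kept, rejected


# Stub scores standing in for a trained discriminator's output (made up).
stub_scores = {"a_0.png": 0.97, "a_1.png": 0.08, "a_2.png": 0.61}
kept, rejected = filter_patches(list(stub_scores), stub_scores.get)
print(kept)
print(rejected)
```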
- Compile the images into Parquet files:
  - Compile the images into the Parquet file format.

```python
import io
import json
import os

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
from PIL import Image

samples_per_file = 6_000

root_dir = "data/datasets/world-heightmaps-360px-png"
df = pd.read_csv(os.path.join(root_dir, "metadata.csv"))
# Shuffle the rows so each Parquet file holds a random mix of locations.
df = df.sample(frac=1).reset_index(drop=True)


def save_table(image_data, table_number):
    print(f"Entries in table {table_number}: {len(image_data)}")
    schema = pa.schema(
        fields=[
            ("heightmap", pa.struct([("bytes", pa.binary()), ("path", pa.string())])),
            ("latitude", pa.string()),
            ("longitude", pa.string()),
        ],
        # Schema metadata that tells Hugging Face Datasets how to decode each column.
        metadata={
            b"huggingface": json.dumps({
                "info": {
                    "features": {
                        "heightmap": {"_type": "Image"},
                        "latitude": {"_type": "Value", "dtype": "string"},
                        "longitude": {"_type": "Value", "dtype": "string"},
                    }
                }
            }).encode("utf-8")
        },
    )

    table = pa.Table.from_pylist(image_data, schema=schema)
    pq.write_table(table, f"data/world-heightmaps-360px-parquet/{str(table_number).zfill(4)}.parquet")


image_data = []
samples_in_current_file = 0
current_file_number = 0
for i, row in df.iterrows():
    # Flush the current batch once it reaches samples_per_file rows.
    if samples_in_current_file >= samples_per_file:
        save_table(image_data, current_file_number)
        image_data = []
        samples_in_current_file = 0
        current_file_number += 1
    samples_in_current_file += 1
    image_path = row["file_name"]
    with Image.open(os.path.join(root_dir, image_path)) as image:
        # Re-encode the image as PNG bytes for storage in the table.
        image_bytes = io.BytesIO()
        image.save(image_bytes, format="PNG")
        image_dict = {
            "heightmap": {
                "bytes": image_bytes.getvalue(),
                "path": image_path,
            },
            "latitude": str(row["latitude"]),
            "longitude": str(row["longitude"]),
        }
        image_data.append(image_dict)

# Write whatever remains in the final, possibly partial, batch.
save_table(image_data, current_file_number)
```
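The loop above starts a new Parquet file every `samples_per_file` rows and writes whatever remains at the end. Assuming `metadata.csv` has one row per sample in the final dataset (573,995 per the overview; the source does not confirm the row count), the resulting file layout can be estimated with a few lines of arithmetic. `shard_sizes` is a hypothetical helper, not part of the original script.

```python
def shard_sizes(total_samples, samples_per_file):
    """Return the number of rows in each output file, in write order."""
    full_files, remainder = divmod(total_samples, samples_per_file)
    sizes = [samples_per_file] * full_files
    if remainder:
        sizes.append(remainder)
    return sizes


# Assumed row count: 573,995, taken from the dataset overview.
sizes = shard_sizes(573_995, 6_000)
file_names = [f"{str(n).zfill(4)}.parquet" for n in range(len(sizes))]
print(len(sizes), sizes[-1])
print(file_names[0], file_names[-1])
```

Under that assumption the script would produce 96 files, `0000.parquet` through `0095.parquet`, with the last one holding the 3,995 leftover rows.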



