wahdan2003/LUPerson-NL

Name: wahdan2003/LUPerson-NL
Creator: wahdan2003
Published: 2025-11-22 12:22:44
License: No description available

Hugging Face, updated 2025-11-22, indexed 2025-12-20

Download link:

https://hf-mirror.com/datasets/wahdan2003/LUPerson-NL


Resource description:

---
language:
- en
license: mit
tags:
- person-reid
- surveillance
- pedestrian-detection
size_categories:
- 1M<n<10M
dataset_info:
  features:
  - name: image
    dtype: image
  - name: person_id
    dtype: string
  - name: original_key
    dtype: string
---

## Dataset Summary

**IMPORTANT NOTE:** This dataset belongs to https://github.com/DengpanFu/LUPerson-NL. The effort here is limited to building it from source and making it available.

**LUPerson-NL-Full** is a large-scale dataset designed for **Person Re-Identification (ReID)** and computer vision tasks involving pedestrian analysis. It contains cropped images of individuals captured under various conditions. It is typically used to pre-train deep learning models that recognize the same individual across different camera views or timeframes, and it can also be used to pre-train an encoder to learn high-quality features of humans.

- "NL" stands for noisy labels. There are two types of noise in this dataset:
  - Different IDs can belong to the same person (IDs 1, 2, and 3 can all belong to the same person).
  - Not all images assigned to an ID belong to it (for example, some images of ID 1 may actually belong to another person with ID 10).

## Supported Tasks and Leaderboards

* **Person Re-Identification (ReID):** The primary use case. Matching the `person_id` across different images.
* **Pedestrian Detection/Attribute Recognition:** Analyzing features of pedestrians (explicit attribute labels are not included in this specific subset, but the images can be used for self-supervised learning).
* **Image Classification:** Classifying images based on identity.

## Dataset Structure

### Data Instances

The dataset is large (1M to 10M images) and is best loaded with `streaming=True`. Each instance represents a single cropped image of a person.

### Data Fields

* `image`: A PIL.Image object containing the cropped pedestrian image.
* `person_id`: A string representing the identity ID of the person. Images with the same `person_id` are labeled as the same individual, subject to the label noise described in the summary.
* `original_key`: A unique identifier or filename string from the original source dataset, often encoding video/frame information (e.g., `00004_0017_0008741_00`).

## Usage Example

Because this dataset is large, it is highly recommended to use **Streaming** mode (lazy loading) to avoid downloading the entire dataset to disk.

```python
import datasets
import torchvision.transforms as T

# 1. Load the dataset in streaming mode
# This establishes a connection without downloading the full data
dataset = datasets.load_dataset("wahdan2003/LUPerson-NL-Full", streaming=True)

# 2. Get the first item using an iterator
first_item = next(iter(dataset['train']))

# 3. Access metadata
print(f"Person ID: {first_item['person_id']}")
print(f"Original Key: {first_item['original_key']}")
print(f"Image Size: {first_item['image'].size}")  # Output: (width, height)

# 4. Transform for PyTorch (example)
# Define a transform pipeline (convert to tensor)
transforms = T.Compose([
    T.ToTensor()
])

# Apply transform
image_tensor = transforms(first_item['image'])
print(f"Image Tensor Shape: {image_tensor.shape}")  # Output: torch.Size([3, H, W])
print(f"Image Tensor Type: {image_tensor.dtype}")   # Output: torch.float32

# 5. Save the image to inspect it
# Note: .show() might fail in some headless Linux environments, so saving is safer.
first_item['image'].save("test_image.jpg")
```
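Since the `person_id` labels are noisy, it can be useful to check how many images each ID has in a small streamed sample before training. The following is a minimal sketch, not part of the original card: the helper name `count_images_per_id` is hypothetical, and the sample records are made up to mimic the `person_id` and `original_key` fields so the snippet runs without network access.

```python
from collections import Counter
from itertools import islice


def count_images_per_id(records, limit=1000):
    """Count how many of the first `limit` records carry each person_id.

    `records` can be any iterable of dicts with a "person_id" key,
    so a streamed split can be passed without loading it fully.
    """
    counts = Counter()
    for record in islice(records, limit):
        counts[record["person_id"]] += 1
    return counts


# Made-up records standing in for streamed dataset items (assumption:
# only the field names match the card; the values are illustrative).
sample_records = [
    {"person_id": "1", "original_key": "key_a"},
    {"person_id": "1", "original_key": "key_b"},
    {"person_id": "2", "original_key": "key_c"},
    {"person_id": "10", "original_key": "key_d"},
]

# Only the first three records are inspected because limit=3.
counts = count_images_per_id(sample_records, limit=3)
print(counts.most_common())
```

With the real data, the same helper could presumably be given the streamed `train` split in place of `sample_records`. The counts only reflect the assigned labels, so they inherit both kinds of label noise described in the summary.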

Provided by:

wahdan2003
