wahdan2003/LUPerson-NL
Source: Hugging Face. Updated 2025-11-22; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/wahdan2003/LUPerson-NL
Resource description:
---
language:
- en
license: mit
tags:
- person-reid
- surveillance
- pedestrian-detection
size_categories:
- 1M<n<10M
dataset_info:
  features:
  - name: image
    dtype: image
  - name: person_id
    dtype: string
  - name: original_key
    dtype: string
---
## Dataset Summary
**IMPORTANT NOTE:** This dataset belongs to https://github.com/DengpanFu/LUPerson-NL. The effort here is limited to building it from source and making it available.

**LUPerson-NL-Full** is a large-scale dataset designed for **Person Re-Identification (ReID)** and computer vision tasks involving pedestrian analysis. It contains cropped images of individuals captured under various conditions. It is typically used to pre-train deep learning models that recognize the same individual across different camera views or timeframes, and it can also be used to pre-train an encoder that learns high-quality features of humans.
- "NL" stands for noisy labels. There are two types of noise in this dataset:
  - Different IDs can belong to the same person (for example, IDs 1, 2, and 3 may all belong to one person).
  - Not all images assigned to an ID belong to that person (for example, some images under ID 1 may actually show the person with ID 10).
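The two noise types can be made concrete with a toy example. The sketch below uses made-up labels and a hypothetical ground-truth column (the "true person"), which the real dataset does not provide; it only illustrates what each kind of noise looks like.

```python
from collections import Counter, defaultdict

# Toy records: (assigned person_id, true person). The "true person" column is
# hypothetical; the dataset itself ships only the noisy person_id.
records = [
    ("1", "alice"), ("1", "alice"), ("1", "bob"),  # one image under ID 1 is really bob
    ("2", "alice"), ("2", "alice"),                # ID 2 is a second ID for alice
    ("10", "bob"), ("10", "bob"),
]

def majority_person(records):
    """Map each assigned ID to the true person that appears most often under it."""
    counts = defaultdict(Counter)
    for pid, person in records:
        counts[pid][person] += 1
    return {pid: c.most_common(1)[0][0] for pid, c in counts.items()}

def split_identities(records):
    """Noise type 1: true persons whose images are spread over several IDs."""
    ids_per_person = defaultdict(set)
    for pid, person in majority_person(records).items():
        ids_per_person[person].add(pid)
    return {p: sorted(ids) for p, ids in ids_per_person.items() if len(ids) > 1}

def mislabeled_images(records):
    """Noise type 2: images whose true person differs from their ID's majority."""
    majority = majority_person(records)
    return [(pid, person) for pid, person in records if majority[pid] != person]

print(split_identities(records))   # {'alice': ['1', '2']}
print(mislabeled_images(records))  # [('1', 'bob')]
```

In practice no ground truth is available, so training methods have to cope with both kinds of noise without being able to measure them this directly.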
## Supported Tasks and Leaderboards
* **Person Re-Identification (ReID):** The primary use case: matching images that share a `person_id`.
* **Pedestrian Detection/Attribute Recognition:** Analyzing features of pedestrians (though explicit attribute labels are not included in this specific subset, the images can be used for self-supervised learning).
* **Image Classification:** Classifying images based on identity.
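ReID is usually framed as retrieval: a query image's embedding is compared against a gallery of embeddings, and the gallery is ranked by similarity. A minimal sketch of that ranking step, assuming embeddings already exist. The vectors and IDs below are made up, and the encoder that would produce real embeddings is out of scope here.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_gallery(query, gallery):
    """Return gallery person_ids sorted from most to least similar to the query."""
    scored = [(cosine_similarity(query, emb), pid) for pid, emb in gallery]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [pid for _, pid in scored]

# Hypothetical 3-dimensional embeddings; real ones come from a trained encoder.
gallery = [
    ("7", [1.0, 0.0, 0.0]),
    ("3", [0.0, 1.0, 0.0]),
    ("7", [0.9, 0.1, 0.0]),
]
query = [1.0, 0.05, 0.0]

ranking = rank_gallery(query, gallery)
print(ranking)  # ['7', '7', '3']
```

A correct match at the top of the ranking is what rank-1 accuracy counts; with noisy labels, some apparent mismatches may in fact be the same person under a different ID.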
## Dataset Structure
### Data Instances
The dataset is large (between 1M and 10M examples according to the `size_categories` metadata above), so it is best loaded with `streaming=True`. Each instance represents a single cropped image of a person.
### Data Fields
* `image`: A PIL.Image object containing the cropped pedestrian image.
* `person_id`: A string representing the identity ID assigned to the person. Images with the same `person_id` are labeled as the same individual, subject to the label noise described in the Dataset Summary.
* `original_key`: A unique identifier or filename string from the original source dataset, often encoding video/frame information (e.g., `00004_0017_0008741_00`).
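Since each record carries a `person_id`, a common first step is to group records by identity. Below is a minimal sketch that works on any iterable of dict-like records with the fields above. The helper name `group_keys_by_person` and the sample records are illustrative placeholders, not part of the dataset or the `datasets` library.

```python
from collections import defaultdict
from itertools import islice

def group_keys_by_person(records, limit=None):
    """Collect original_key values per person_id from the first `limit` records.

    With limit=None, every record in the iterable is consumed.
    """
    groups = defaultdict(list)
    for record in islice(records, limit):
        groups[record["person_id"]].append(record["original_key"])
    return dict(groups)

# Made-up records mimicking the schema (the image field is omitted here,
# and the ID and key values are placeholders).
sample = [
    {"person_id": "p1", "original_key": "key_a"},
    {"person_id": "p1", "original_key": "key_b"},
    {"person_id": "p2", "original_key": "key_c"},
]

groups = group_keys_by_person(sample, limit=2)
print(groups)  # {'p1': ['key_a', 'key_b']}
```

With a streaming dataset, passing the split iterator as `records` together with a small `limit` should avoid iterating over the whole dataset.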
## Usage Example
Because this dataset is large, it is highly recommended to use **Streaming** mode (Lazy Loading) to avoid downloading the entire dataset to disk.
```python
import datasets
import torchvision.transforms as T
from PIL import Image
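# 1. Load the dataset in streaming mode
# This establishes a connection without downloading the full data.
# The repository ID is kept as written in this card; note that the download
# link at the top of the page uses "wahdan2003/LUPerson-NL".
dataset = datasets.load_dataset("wahdan2003/LUPerson-NL-Full", streaming=True)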
# 2. Get the first item using an iterator
first_item = next(iter(dataset['train']))
# 3. Access Metadata
print(f"Person ID: {first_item['person_id']}")
print(f"Original Key: {first_item['original_key']}")
print(f"Image Size: {first_item['image'].size}") # Output: (width, height)
# 4. Transform for PyTorch (Example)
# Define a transform pipeline (Convert to Tensor)
transforms = T.Compose([
    T.ToTensor()
])
# Apply transform
image_tensor = transforms(first_item['image'])
print(f"Image Tensor Shape: {image_tensor.shape}") # Output: torch.Size([3, H, W])
print(f"Image Tensor Type: {image_tensor.dtype}") # Output: torch.float32
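# 5. Save the image to inspect it
# Note: .show() might fail in some headless Linux environments, so saving is safer.
# Converting to RGB first avoids errors if an image is not in a JPEG-compatible mode.
first_item['image'].convert("RGB").save("test_image.jpg")
```
Provided by: wahdan2003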



