nyuuzyou/stickers
Source: Hugging Face. Updated 2024-01-15; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/nyuuzyou/stickers
Description:
---
task_categories:
- image-classification
license: wtfpl
---
# Telegram Stickers Image Classification Dataset
This dataset is a collection of Telegram stickers converted to PNG images for image classification.
## Dataset Details
- Image Size: 512x512 pixels
- Number of Classes: 1276
- Total Number of Images: 672,911
The dataset was created by extracting stickers from 23,681 Telegram sticker sets. Animated and video stickers were removed, and sets in which every sticker was assigned the same emoji were excluded. Stickers whose dimensions were not exactly 512x512 were padded with empty pixels to that size, and all stickers were converted to .png format for consistency.
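The creation steps above can be sketched in Python. This is a minimal sketch, not the author's pipeline. The `Sticker` record borrows field names from the Telegram Bot API `Sticker` object (`file_id`, `emoji`, `is_animated`, `is_video`, `width`, `height`). The helper names `keep_static`, `set_is_usable`, and `pad_offsets` are hypothetical. Centering the sticker on the 512x512 canvas is also an assumption, because the card only says stickers were padded.

```python
from dataclasses import dataclass

TARGET = 512  # canvas size stated in the dataset card


@dataclass
class Sticker:
    # Field names mirror the Telegram Bot API Sticker object;
    # the records used with this class are hypothetical.
    file_id: str
    emoji: str
    is_animated: bool
    is_video: bool
    width: int
    height: int


def keep_static(stickers):
    """Drop animated and video stickers, keeping static images only."""
    return [s for s in stickers if not (s.is_animated or s.is_video)]


def set_is_usable(stickers):
    """Reject a set when every sticker in it carries the same emoji."""
    return len({s.emoji for s in stickers}) > 1


def pad_offsets(width, height, target=TARGET):
    """Return (left, top, right, bottom) padding that places a
    width x height sticker on a target x target canvas.
    ASSUMPTION: the sticker is centered; the card only says "padded"."""
    if width > target or height > target:
        raise ValueError("sticker is larger than the target canvas")
    left = (target - width) // 2
    top = (target - height) // 2
    return left, top, target - width - left, target - height - top
```

For a 512x380 sticker, `pad_offsets(512, 380)` gives `(0, 66, 0, 66)`, which means 66 empty rows above and 66 below. If "empty" means fully transparent, you could use the left and top offsets as the paste position on a new transparent 512x512 RGBA canvas in an imaging library such as Pillow.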
Class names are based on the Unicode emoji that the sticker's author assigned to each sticker. For example, code point U+1F917 is the 🤗 emoji. Each sticker is labeled with the corresponding Unicode code point as its class.
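A minimal sketch of the emoji-to-label mapping follows. It assumes labels are written as `U+` followed by the uppercase hexadecimal code point, as in the `U+1F917` example. The card does not say how the dataset labels emoji that span several code points, such as a base character plus the U+FE0F variation selector. Joining them with `_` here is a hypothetical convention, and the function names are hypothetical as well.

```python
def emoji_to_label(emoji):
    """Map an emoji string to a class label such as "U+1F917".
    HYPOTHETICAL: multi-code-point emoji are joined with "_"."""
    return "_".join(f"U+{ord(ch):04X}" for ch in emoji)


def label_to_emoji(label):
    """Inverse of emoji_to_label: parse each "U+XXXX" part back to a character."""
    return "".join(chr(int(part[2:], 16)) for part in label.split("_"))


def build_class_index(labels):
    """Assign a stable integer index to each distinct label, in sorted order,
    e.g. for use as classifier targets."""
    return {label: i for i, label in enumerate(sorted(set(labels)))}
```

For example, `emoji_to_label("🤗")` returns `"U+1F917"`, and `label_to_emoji("U+1F917")` returns `"🤗"`. Applied to every label in the dataset, `build_class_index` would yield one index per class.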
Each image's filename is the sticker's Telegram file ID. You can use this unique identifier to reference the original sticker on the Telegram platform.
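The sketch below recovers the file ID from an image path and builds a Telegram Bot API `getFile` request URL for it. The directory layout in the example path and the function names are hypothetical. Note that the Bot API documentation describes file IDs as unique to each bot. An ID from this dataset may therefore not resolve when you use a different bot's token.

```python
from pathlib import PurePosixPath
from urllib.parse import urlencode


def file_id_from_path(path):
    """Strip the directory and the .png extension to get the file ID."""
    return PurePosixPath(path).stem


def get_file_url(bot_token, file_id):
    """Build a Bot API getFile request URL for the given file ID."""
    query = urlencode({"file_id": file_id})
    return f"https://api.telegram.org/bot{bot_token}/getFile?{query}"
```

For example, `file_id_from_path("dataset/test/U+1F917/CAACAgIAAxk.png")` returns `"CAACAgIAAxk"`. If the request succeeds, the `getFile` response contains a `file_path` that you can use to download the file.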
## Dataset Split
- Training Set:
  - Number of Images: 605,043
- Validation Set:
  - Number of Images: 33,035
- Test Set:
  - Number of Images: 34,833
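A quick arithmetic check on the split sizes: the three splits sum to the stated total of 672,911 images. That gives roughly a 90/5/5 division: 89.9% train, 4.9% validation, and 5.2% test.

```python
SPLITS = {"train": 605_043, "validation": 33_035, "test": 34_833}

total = sum(SPLITS.values())
assert total == 672_911  # matches the stated total number of images

# Print each split's size and its share of the whole dataset.
for name, count in SPLITS.items():
    print(f"{name:>10}: {count:>7,} images ({count / total:.1%})")
```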
### Additional Information
The training archive `train.zip` is split into multiple parts of roughly 20 GB each. To extract it, you need a tool that supports split (multi-volume) archives, such as 7z.
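Below is a minimal sketch for extracting the split training archive with 7-Zip from Python. The card does not name the part files. This sketch assumes 7-Zip-style numbered volumes (`train.zip.001`, `train.zip.002`, ...): you pass 7z the first volume, and it reads the remaining parts automatically. It also assumes a `7z` executable on `PATH`. The function names and the `train` output directory are hypothetical.

```python
import re
import subprocess
from pathlib import Path


def first_volume(names, stem="train.zip"):
    """Return the lowest-numbered volume, e.g. "train.zip.001".
    ASSUMPTION: parts use 7-Zip-style numeric suffixes."""
    pattern = re.compile(re.escape(stem) + r"\.(\d+)")
    volumes = []
    for name in names:
        match = pattern.fullmatch(name)
        if match:
            volumes.append((int(match.group(1)), name))
    if not volumes:
        raise FileNotFoundError(f"no numbered volumes of {stem} found")
    return min(volumes)[1]


def extract_command(archive, out_dir):
    # "x" extracts with full paths; -o<dir> (no space) sets the output
    # directory; -y answers "yes" to all prompts.
    return ["7z", "x", archive, f"-o{out_dir}", "-y"]


def extract(directory=".", out_dir="train"):
    """Find the first volume in `directory` and run 7z on it."""
    names = [p.name for p in Path(directory).iterdir()]
    archive = str(Path(directory) / first_volume(names))
    subprocess.run(extract_command(archive, out_dir), check=True)
```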
The `dataset_resized` folder contains a resized version of the dataset with 128x128-pixel images. The original images are 512x512 pixels.
Provided by:
nyuuzyou
Summary of the original information
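Telegram Stickers Image Classification Dataset Overview
Basic Dataset Information
- Task category: Image classification
- License: WTFPL
Dataset Details
- Image size: 512x512 pixels
- Number of classes: 1276
- Total number of images: 672,911
Dataset Creation
- The dataset was extracted from 23,681 sticker sets on Telegram.
- Animated and video stickers were removed, and all stickers were converted to .png format.
- Stickers that did not fit the 512x512 size were padded with empty pixels.
- Each sticker's class name is based on the Unicode emoji assigned to it by the author.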
Dataset Split
- Training set:
  - Number of images: 605,043
- Validation set:
  - Number of images: 33,035
- Test set:
  - Number of images: 34,833
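Additional Information
- The training set `train.zip` is divided into multiple parts of about 20 GB each.
- The `dataset_resized` folder contains images resized to 128x128 pixels.
Notes
- The original dataset provides 512x512-pixel images, while the images in the `dataset_resized` folder are 128x128 pixels.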



