character_similarity
Source: ModelScope community (魔搭社区). Updated 2025-11-12; indexed 2024-12-07.
Download link:
https://modelscope.cn/datasets/deepghs/character_similarity
Description:
# character_similarity
This is a dataset used for training models to determine whether two anime images (containing only one person) depict the same character. The dataset includes the following versions:
| Version | Filename | Characters | Images | Information |
|:---------:|:-----------------------:|:----------:|:------:|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| v0 | images_v0.tar.xz | 2059 | 162116 | Crawled from [zerochan.net](https://www.zerochan.net/), includes images of Arknights, Fate/Grand Order, Genshin Impact, Girls' Frontline, and Azur Lane, as well as over 1500 other game or anime characters. The images are all small preview thumbnails. |
| v0_tiny | images_tiny_v0.tar.xz | 514 | 10036 | A reduced version of `v0`, built by randomly selecting 1/4 of the anime characters and 1/4 of the images for each selected character. Recommended for model training and validation. |
| v0_xtiny | images_xtiny_v0.tar.xz | 100 | 1814 | `v0_xtiny` dataset is a further simplified version of `v0_tiny`, with only 100 characters retained. This dataset is only **suitable for model validation and experimentation**, and is not recommended for formal training. |
| v1 | images_v1.tar.xz | 4001 | 292800 | Like `v0`, the `v1` dataset is crawled from zerochan, but it covers a richer selection of characters and more diverse images of each character. Each character has at least 3 and at most 200 images. |
| v1_pruned | images_pruned_v1.tar.xz | 3982 | 241483 | Pruned from the `v1` dataset: monochrome and non-solo images were deleted, as were images in which the character occupies less than 40% of the image area. |
| v2 | images_v2.tar.gz | 37886 | 836217 | Images containing only the faces of all characters, scraped from [zerochan.net](https://zerochan.net). The images are larger than those in the `v0` and `v1` datasets. Note that the `v2` dataset may include multi-level nested image paths, such as `a/b/c/xx.jpg` and `a/b/yy.jpg`. In such cases, `xx.jpg` and `yy.jpg` should not be considered the same character. **Two images are considered to depict the same character only if they are located in exactly the same directory path.** |
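
The path rule for `v2` can be sketched in a few lines of Python. This is only an illustration, not code shipped with the dataset: the helper names `character_key`, `same_character`, and `label_pairs` are hypothetical, and the sketch assumes that image paths are relative to the root of the extracted archive and use `/` as the separator.

```python
from itertools import combinations
from pathlib import PurePosixPath


def character_key(image_path: str) -> str:
    """Return the directory part of a relative image path.

    Assumption: the full directory path (not just its last component)
    identifies the character, as the v2 note describes.
    """
    return str(PurePosixPath(image_path).parent)


def same_character(path_a: str, path_b: str) -> bool:
    """True only if both images sit in exactly the same directory."""
    return character_key(path_a) == character_key(path_b)


def label_pairs(paths):
    """Label every unordered pair of image paths as same/different character."""
    return [(a, b, same_character(a, b)) for a, b in combinations(paths, 2)]


if __name__ == "__main__":
    # Example paths taken from the v2 note, plus one hypothetical sibling file.
    paths = ["a/b/c/xx.jpg", "a/b/yy.jpg", "a/b/c/zz.jpg"]
    for a, b, same in label_pairs(paths):
        print(a, b, same)
```

Comparing the whole parent directory, rather than only the innermost folder name, keeps nested folders such as `a/b` and `a/b/c` distinct, which matches the example given for `v2`.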
Provider:
maas
Created:
2024-12-03



