huggan/anime-faces|动漫识别数据集|计算机视觉数据集

hugging_face2022-03-22 更新2024-03-04 收录

动漫识别

计算机视觉

下载链接：

https://hf-mirror.com/datasets/huggan/anime-faces

下载链接

链接失效反馈

资源简介：

--- license: cc0-1.0 --- # Dataset Card for anime-faces ## Table of Contents - [Dataset Description](#dataset-description) - [Dataset Summary](#dataset-summary) - [Supported Tasks](#supported-tasks-and-leaderboards) - [Languages](#languages) - [Dataset Structure](#dataset-structure) - [Data Instances](#data-instances) - [Data Fields](#data-instances) - [Data Splits](#data-instances) - [Dataset Creation](#dataset-creation) - [Curation Rationale](#curation-rationale) - [Source Data](#source-data) - [Annotations](#annotations) - [Personal and Sensitive Information](#personal-and-sensitive-information) - [Considerations for Using the Data](#considerations-for-using-the-data) - [Social Impact of Dataset](#social-impact-of-dataset) - [Discussion of Biases](#discussion-of-biases) - [Other Known Limitations](#other-known-limitations) - [Additional Information](#additional-information) - [Dataset Curators](#dataset-curators) - [Licensing Information](#licensing-information) - [Citation Information](#citation-information) ## Dataset Description - **Homepage:** https://www.kaggle.com/soumikrakshit/anime-faces - **Repository:** https://www.kaggle.com/soumikrakshit/anime-faces - **Paper:** [Needs More Information] - **Leaderboard:** [Needs More Information] - **Point of Contact:** https://github.com/Mckinsey666 ### Dataset Summary This is a dataset consisting of 21551 anime faces scraped from www.getchu.com, which are then cropped using the anime face detection algorithm in https://github.com/nagadomi/lbpcascade_animeface. All images are resized to 64 * 64 for the sake of convenience. Please also cite the two sources when using this dataset. Some outliers are still present in the dataset: Bad cropping results Some non-human faces. Feel free to contribute to this dataset by adding images of similar quality or adding image labels. ### Supported Tasks and Leaderboards [Needs More Information] ### Languages [Needs More Information] ## Dataset Structure ### Data Instances [Needs More Information] ### Data Fields Has a data folder with png files inside. ### Data Splits Only training set ## Dataset Creation ### Curation Rationale [Needs More Information] ### Source Data #### Initial Data Collection and Normalization [Needs More Information] #### Who are the source language producers? [Needs More Information] ### Annotations #### Annotation process [Needs More Information] #### Who are the annotators? [Needs More Information] ### Personal and Sensitive Information [Needs More Information] ## Considerations for Using the Data ### Social Impact of Dataset [Needs More Information] ### Discussion of Biases [Needs More Information] ### Other Known Limitations [Needs More Information] ## Additional Information ### Dataset Curators [Needs More Information] ### Licensing Information [Needs More Information] ### Citation Information [Needs More Information] --- annotations_creators: - found language_creators: - found languages: - unknown licenses: - unknown multilinguality: - unknown pretty_name: anime-faces size_categories: - unknown source_datasets: - original task_categories: - image-classification task_ids: [] ---

提供机构：

huggan

原始信息汇总

Dataset Card for anime-faces

Dataset Description

Dataset Summary

This dataset contains 21551 anime faces scraped from www.getchu.com and cropped using the anime face detection algorithm from https://github.com/nagadomi/lbpcascade_animeface. All images are resized to 64 * 64 pixels. The dataset includes some outliers such as bad cropping results and non-human faces. Contributions to the dataset are welcome.

Supported Tasks and Leaderboards

[Needs More Information]

Languages

[Needs More Information]

Dataset Structure

Data Instances

[Needs More Information]

Data Fields

The dataset includes a data folder containing PNG files.

Data Splits

The dataset only includes a training set.

Dataset Creation

Curation Rationale

[Needs More Information]

Source Data

Initial Data Collection and Normalization

[Needs More Information]

Who are the source language producers?

[Needs More Information]

Annotations

Annotation process

[Needs More Information]

Who are the annotators?

[Needs More Information]

Personal and Sensitive Information

[Needs More Information]

Considerations for Using the Data

Social Impact of Dataset

[Needs More Information]

Discussion of Biases

[Needs More Information]

Other Known Limitations

[Needs More Information]

Additional Information

Dataset Curators

[Needs More Information]

Licensing Information

[Needs More Information]

Citation Information

[Needs More Information]

AI搜集汇总

数据集介绍

构建方式

该数据集的构建基于对动漫网站www.getchu.com的爬虫抓取，共计21551张动漫面部图像。图像经过 anime face detection 算法进行裁剪，并统一调整至64x64像素大小，以便于后续处理。构建过程中，对图像质量进行了初步筛选，但仍存在裁剪效果不佳和非人类面部图像的异常值。

特点

anime-faces数据集的主要特点在于其专注于动漫面部图像的收集，适用于图像分类等视觉识别任务。数据集采用CC0协议，允许无限制使用，但建议在使用时引用相关来源。此外，数据集提供了丰富的样本数量，有助于模型的训练和评估。尽管存在一定比例的异常值，但整体质量较高，为研究提供了良好的基础。

使用方法

用户在使用该数据集时，应首先了解其结构和数据字段，数据集仅包含训练集。数据以png格式的图像文件存储在数据文件夹内。在使用前，用户可能需要对数据进行进一步的清洗和预处理，以排除异常值和提升模型性能。引用数据集时，应遵循CC0协议的要求，并注明数据来源。

背景与挑战

背景概述

anime-faces数据集是一个搜集自www.getchu.com的21551张动漫人脸的集合，通过特定的动漫人脸检测算法进行裁剪，并统一调整至64*64像素大小以便利后续处理。该数据集的创建旨在为动漫人脸相关的图像识别研究提供标准化资源，其创建时间虽不明确，但由数据集的维护者soumikrakshit在Kaggle平台上进行管理。该数据集的问世，为计算机视觉领域中的动漫角色识别、图像风格分析等研究提供了重要支撑，并对相关领域的学术探索和技术发展产生了积极影响。

当前挑战

尽管anime-faces数据集为相关研究领域提供了宝贵的资源，但在数据构建过程中亦面临诸多挑战。首先，数据集中存在不完美的裁剪结果和非人类面孔的图片，这为后续的图像分类和识别任务带来了困难。其次，数据集缺乏详细的标注信息，且未提供测试集，这限制了其在模型评估和公平性分析方面的应用。此外，数据集的多元性和包容性问题亦有待进一步探讨，以增强其在不同文化和性别群体中的代表性。

常用场景

经典使用场景

在计算机视觉研究领域，huggan/anime-faces数据集因其独特的图像内容而被广泛使用。该数据集包含了从www.getchu.com网站抓取的21551张动漫人脸图像，经过特定算法裁剪和尺寸标准化后，成为了面部识别、表情识别等任务的理想资源。

解决学术问题

该数据集解决了传统面部识别研究中对于非真实人脸图像处理的局限性问题，为研究者提供了一个丰富的、专注于动漫风格人脸的样本集。这对于改善算法对于不同人脸风格的理解和识别能力，具有重要的学术价值和深远的研究意义。

衍生相关工作

基于huggan/anime-faces数据集，衍生出了一系列相关的研究工作，包括但不限于动漫人脸的自动标注、特征提取以及跨领域的风格迁移等。这些工作进一步扩展了数据集的应用范围，促进了计算机视觉与动漫文化的融合。

以上内容由AI搜集并总结生成

用户留言

有没有相关的论文或文献参考？

这个数据集是基于什么背景创建的？

数据集的作者是谁？

能帮我联系到这个数据集的作者吗？

这个数据集如何下载？

点击留言

数据主题

具身智能

数据集 4098个

机构 8个

大模型

数据集 439个

机构 10个

无人机

数据集 37个

机构 6个

指令微调

数据集 36个

机构 6个

蛋白质结构

数据集 50个

机构 8个

空间智能

数据集 21个

机构 5个

5,000+

优质数据集

54 个

任务类型

进入经典数据集

热门数据集

网易云音乐数据集

该数据集包含了网易云音乐平台上的歌手信息、歌曲信息和歌单信息，数据通过爬虫技术获取并整理成CSV格式，用于音乐数据挖掘和推荐系统构建。

github 收录

中国食物成分数据库

食物成分数据比较准确而详细地描述农作物、水产类、畜禽肉类等人类赖以生存的基本食物的品质和营养成分含量。它是一个重要的我国公共卫生数据和营养信息资源，是提供人类基本需求和基本社会保障的先决条件；也是一个国家制定相关法规标准、实施有关营养政策、开展食品贸易和进行营养健康教育的基础，兼具学术、经济、社会等多种价值。本数据集收录了基于2002年食物成分表的1506条食物的31项营养成分（含胆固醇）数据，657条食物的18种氨基酸数据、441条食物的32种脂肪酸数据、130条食物的碘数据、114条食物的大豆异黄酮数据。

国家人口健康科学数据中心收录

Canadian Census

**Overview** The data package provides demographics for Canadian population groups according to multiple location categories: Forward Sortation Areas (FSAs), Census Metropolitan Areas (CMAs) and Census Agglomerations (CAs), Federal Electoral Districts (FEDs), Health Regions (HRs) and provinces. **Description** The data are available through the Canadian Census and the National Household Survey (NHS), separated or combined. The main demographic indicators provided for the population groups, stratified not only by location but also for the majority by demographical and socioeconomic characteristics, are population number, females and males, usual residents and private dwellings. The primary use of the data at the Health Region level is for health surveillance and population health research. Federal and provincial departments of health and human resources, social service agencies, and other types of government agencies use the information to monitor, plan, implement and evaluate programs to improve the health of Canadians and the efficiency of health services. Researchers from various fields use the information to conduct research to improve health. Non-profit health organizations and the media use the health region data to raise awareness about health, an issue of concern to all Canadians. The Census population counts for a particular geographic area representing the number of Canadians whose usual place of residence is in that area, regardless of where they happened to be on Census Day. Also included are any Canadians who were staying in that area on Census Day and who had no usual place of residence elsewhere in Canada, as well as those considered to be 'non-permanent residents'. National Household Survey (NHS) provides demographic data for various levels of geography, including provinces and territories, census metropolitan areas/census agglomerations, census divisions, census subdivisions, census tracts, federal electoral districts and health regions. In order to provide a comprehensive overview of an area, this product presents data from both the NHS and the Census. NHS data topics include immigration and ethnocultural diversity; aboriginal peoples; education and labor; mobility and migration; language of work; income and housing. 2011 Census data topics include population and dwelling counts; age and sex; families, households and marital status; structural type of dwelling and collectives; and language. The data are collected for private dwellings occupied by usual residents. A private dwelling is a dwelling in which a person or a group of persons permanently reside. Information for the National Household Survey does not include information for collective dwellings. Collective dwellings are dwellings used for commercial, institutional or communal purposes, such as a hotel, a hospital or a work camp. **Benefits** - Useful for canada public health stakeholders, for public health specialist or specialized public and other interested parties. for health surveillance and population health research. for monitoring, planning, implementation and evaluation of health-related programs. media agencies may use the health regions data to raise awareness about health, an issue of concern to all canadians. giving the addition of longitude and latitude in some of the datasets the data can be useful to transpose the values into geographical representations. the fields descriptions along with the dataset description are useful for the user to quickly understand the data and the dataset. **License Information** The use of John Snow Labs datasets is free for personal and research purposes. For commercial use please subscribe to the [Data Library](https://www.johnsnowlabs.com/marketplace/) on John Snow Labs website. The subscription will allow you to use all John Snow Labs datasets and data packages for commercial purposes. **Included Datasets** - [Canadian Population and Dwelling by FSA 2011](https://www.johnsnowlabs.com/marketplace/canadian-population-and-dwelling-by-fsa-2011) - This Canadian Census dataset covers data on population, total private dwellings and private dwellings occupied by usual residents by forward sortation area (FSA). It is enriched with the percentage of the population or dwellings versus the total amount as well as the geographical area, province, and latitude and longitude. The whole Canada's population is marked as 100, referring to 100% for the percentages. - [Detailed Canadian Population Statistics by CMAs and CAs 2011](https://www.johnsnowlabs.com/marketplace/detailed-canadian-population-statistics-by-cmas-and-cas-2011) - This dataset covers the population statistics of Canada by Census Metropolitan Areas (CMAs) and Census Agglomerations (CAs). It is categorized also by citizen/immigration status, ethnic origin, religion, mobility, education, language, work, housing, income etc. There is detailed characteristics categorization within these stated categories that are in 5 layers. - [Detailed Canadian Population Statistics by FED 2011](https://www.johnsnowlabs.com/marketplace/detailed-canadian-population-statistics-by-fed-2011) - This dataset covers the population statistics of Canada from 2011 by Federal Electoral District of 2013 Representation Order. It is categorized also by citizen/immigration status, ethnic origin, religion, mobility, education, language, work, housing, income etc. There is detailed characteristics categorization within these stated categories that are in 5 layers. - [Detailed Canadian Population Statistics by Health Region 2011](https://www.johnsnowlabs.com/marketplace/detailed-canadian-population-statistics-by-health-region-2011) - This dataset covers the population statistics of Canada by health region. It is categorized also by citizen/immigration status, ethnic origin, religion, mobility, education, language, work, housing, income etc. There is detailed characteristics categorization within these stated categories that are in 5 layers. - [Detailed Canadian Population Statistics by Province 2011](https://www.johnsnowlabs.com/marketplace/detailed-canadian-population-statistics-by-province-2011) - This dataset covers the population statistics of Canada by provinces and territories. It is categorized also by citizen/immigration status, ethnic origin, religion, mobility, education, language, work, housing, income etc. There is detailed characteristics categorization within these stated categories that are in 5 layers. **Data Engineering Overview** **We deliver high-quality data** - Each dataset goes through 3 levels of quality review - 2 Manual reviews are done by domain experts - Then, an automated set of 60+ validations enforces every datum matches metadata & defined constraints - Data is normalized into one unified type system - All dates, unites, codes, currencies look the same - All null values are normalized to the same value - All dataset and field names are SQL and Hive compliant - Data and Metadata - Data is available in both CSV and Apache Parquet format, optimized for high read performance on distributed Hadoop, Spark & MPP clusters - Metadata is provided in the open Frictionless Data standard, and its every field is normalized & validated - Data Updates - Data updates support replace-on-update: outdated foreign keys are deprecated, not deleted **Our data is curated and enriched by domain experts** Each dataset is manually curated by our team of doctors, pharmacists, public health & medical billing experts: - Field names, descriptions, and normalized values are chosen by people who actually understand their meaning - Healthcare & life science experts add categories, search keywords, descriptions and more to each dataset - Both manual and automated data enrichment supported for clinical codes, providers, drugs, and geo-locations - The data is always kept up to date – even when the source requires manual effort to get updates - Support for data subscribers is provided directly by the domain experts who curated the data sets - Every data source’s license is manually verified to allow for royalty-free commercial use and redistribution. **Need Help?** If you have questions about our products, contact us at [info@johnsnowlabs.com](mailto:info@johnsnowlabs.com).

Databricks 收录

CosyVoice 2

CosyVoice 2是由阿里巴巴集团开发的多语言语音合成数据集，旨在通过大规模多语言数据集训练，实现高质量的流式语音合成。数据集通过有限标量量化技术改进语音令牌的利用率，并结合预训练的大型语言模型作为骨干，支持流式和非流式合成。数据集的创建过程包括文本令牌化、监督语义语音令牌化、统一文本-语音语言模型和块感知流匹配模型等步骤。该数据集主要应用于语音合成领域，旨在解决高延迟和低自然度的问题，提供接近人类水平的语音合成质量。

arXiv 收录

中国近海台风路径集合数据集(1945-2024)

1945-2024年度，中国近海台风路径数据集，包含每个台风的真实路径信息、台风强度、气压、中心风速、移动速度、移动方向。数据源为获取温州台风网(http://www.wztf121.com/)的真实观测路径数据，经过处理整合后形成文件，如使用csv文件需使用文本编辑器打开浏览，否则会出现乱码，如要使用excel查看数据，请使用xlsx的格式。

国家海洋科学数据中心收录