Qqcf16426/mangaupdates

Name: Qqcf16426/mangaupdates
Creator: Qqcf16426
Published: 2023-12-16 19:10:09
License: not specified

Source: Hugging Face. Updated 2023-12-16. Indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/Qqcf16426/mangaupdates

Description:

---
language:
- en
tags:
- manga
- tags
- genres
- scraped
size_categories:
- 100K<n<1M
---

I scraped [mangaupdates](https://www.mangaupdates.com) for a project and I am sharing the data. There is a tar file which contains the JSON response for every info entry. I parsed it and added it to a Postgres database. The pgdump was uploaded too. Some entries do not exist anymore; their IDs can be found in the removed ids JSON.

<details>
<summary>SQL structure</summary>

I didn't try to make it an optimal structure, but I tried to remove the redundancy of strings.

### Info

```sql
create table info (
    id serial primary key,
    private_id int,
    public_id bigint not null,
    forum_id bigint not null,
    url_key text not null,
    url_name text,
    titles text[] not null,
    description text,
    image_name text,
    typ int not null,
    year int,
    latest_chapter integer not null,
    rating integer not null,
    bayesian_rating float,
    genres int[] not null,
    tags int[] not null,
    tags_upvotes int[] not null,
    tags_downvotes int[] not null,
    tags_uploader bigint[] not null,
    status text,
    licensed boolean not null,
    completed boolean not null,
    author int[] not null,
    artist int[] not null,
    publisher_original int[] not null,
    publisher_english int[] not null,
    publication text[] not null,
    publication_publisher int[] not null,
    relations text[] not null,
    anime_start text,
    anime_end text,
    last_updated_mu TIMESTAMP,
    last_updated TIMESTAMP not null,
    created TIMESTAMP not null
);
```

### Types

```sql
create table if not exists mtypes (
    id serial primary key,
    name text not null
);
```

### Genres

```sql
create table if not exists genres (
    id serial primary key,
    name text not null
);
```

### Tags

```sql
create table if not exists tags (
    id serial primary key,
    name text not null
);
```

### People

```sql
create table if not exists ppl (
    id serial primary key,
    mu_id bigint,
    name text not null
);
```

</details>

Provider:

Qqcf16426

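Summary of the original information

Dataset overview

Data source

The data was scraped from the mangaupdates website.

Data contents

A tar file contains the JSON response retrieved for every info entry.
The data was parsed and loaded into a PostgreSQL database, and the pgdump file was uploaded as well.
Some entries have been removed; their IDs are recorded in the removed ids JSON file.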
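
The relationship between the tar archive and the removed ids file can be sketched as follows. This is only an illustration under stated assumptions: the card does not give the archive layout or the file names, so the sketch assumes one JSON document per archive member, an `id` key in each document, and a removed-ids file that is a flat list of integers. The helper names `iter_entries` and `build_demo_tar` are hypothetical, and the demo archive is built in memory rather than read from the dataset.

```python
import io
import json
import tarfile


def iter_entries(tar_bytes, removed_ids):
    """Yield parsed JSON documents from a tar archive, skipping removed ids.

    Assumption: one JSON document per archive member, keyed by "id".
    """
    removed = set(removed_ids)
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue  # ignore directories and other non-file members
            payload = json.load(archive.extractfile(member))
            if payload.get("id") in removed:
                continue  # entry no longer exists upstream
            yield payload


def build_demo_tar(entries):
    """Build a small in-memory tar so the sketch runs without the dataset."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for entry in entries:
            data = json.dumps(entry).encode("utf-8")
            info = tarfile.TarInfo(name=f"{entry['id']}.json")
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


demo_tar = build_demo_tar([{"id": 1, "title": "A"}, {"id": 2, "title": "B"}])
kept = list(iter_entries(demo_tar, removed_ids=[2]))
```

With the real files, the bytes would come from the downloaded archive and the id list from the removed ids JSON; both would need checking against the actual layout first.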

Database structure

The database structure is not optimized, but string redundancy was reduced where possible.

Table structure

Info table
- Fields: id, private_id, public_id, forum_id, url_key, url_name, titles, description, image_name, typ, year, latest_chapter, rating, bayesian_rating, genres, tags, tags_upvotes, tags_downvotes, tags_uploader, status, licensed, completed, author, artist, publisher_original, publisher_english, publication, publication_publisher, relations, anime_start, anime_end, last_updated_mu, last_updated, created.
Types table (mtypes)
- Fields: id, name.
Genres table
- Fields: id, name.
Tags table
- Fields: id, name.
People table (ppl)
- Fields: id, mu_id, name.
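
Because the Info table stores integer arrays instead of repeated strings, reading a row usually means mapping those integers back to names. The sketch below shows that lookup in plain Python. It rests on assumptions the card does not state outright: that `genres` and `tags` hold `id` values from the genres and tags tables, and that `tags`, `tags_upvotes`, and `tags_downvotes` are parallel arrays aligned by position. The sample rows and the helper `resolve_names` are invented for illustration.

```python
def resolve_names(id_array, lookup):
    """Map an array of integer ids to names, skipping ids with no match."""
    return [lookup[i] for i in id_array if i in lookup]


# Hypothetical lookup tables, shaped like (id -> name) rows from genres/tags.
genres = {1: "Action", 2: "Comedy"}
tags = {10: "Time Travel", 11: "School Life"}

# Hypothetical info row containing only the columns used here.
info_row = {
    "titles": ["Example"],
    "genres": [2, 1],
    "tags": [11, 10],
    "tags_upvotes": [5, 3],
    "tags_downvotes": [0, 1],
}

genre_names = resolve_names(info_row["genres"], genres)

# Assumption: the three tag arrays line up index by index.
tag_votes = [
    (tags[tag_id], up, down)
    for tag_id, up, down in zip(
        info_row["tags"], info_row["tags_upvotes"], info_row["tags_downvotes"]
    )
    if tag_id in tags
]
```

The same resolution could be done inside PostgreSQL by joining the array columns against the lookup tables; the Python version is used here only because it runs without a database.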

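Collected summary

Dataset introduction

Background and challenges

Background overview

This dataset is a collection of manga information scraped from the MangaUpdates website. It contains between 100,000 and 1,000,000 records of structured data covering manga titles, authors, types, tags, and ratings. The data has been parsed and stored in a PostgreSQL database, but generation errors are present that may leave some records incomplete or malformed. It is suited to manga content analysis and recommender-system research.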

The content above was collected and summarized by 遇见数据集.
