P1ayer-1/college_texts_metadata
Source: Hugging Face. Updated 2024-01-30; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/P1ayer-1/college_texts_metadata
Description:
---
dataset_info:
  features:
  - name: authors
    dtype: string
  - name: color
    sequence: float64
  - name: depth
    dtype: int64
  - name: field
    dtype: string
  - name: id
    dtype: int64
  - name: match_count
    dtype: int64
  - name: position
    sequence: float64
  - name: title
    dtype: string
  - name: hits
    list:
    - name: _id
      dtype: string
    - name: _index
      dtype: string
    - name: _score
      dtype: float64
    - name: _source
      struct:
      - name: aa_lgli_comics_2022_08_file
        dtype: string
      - name: aac_zlib3_book
        dtype: string
      - name: file_unified_data
        struct:
        - name: author_additional
          sequence: 'null'
        - name: author_best
          dtype: string
        - name: classifications_unified
          struct:
          - name: ddc
            sequence: string
          - name: lcc
            sequence: string
          - name: library_and_archives_canada_cataloguing_in_publication
            sequence: string
          - name: nur
            sequence: string
          - name: udc
            sequence: string
        - name: comments_additional
          sequence: 'null'
        - name: comments_best
          dtype: string
        - name: content_type
          dtype: string
        - name: cover_url_additional
          sequence: 'null'
        - name: cover_url_best
          dtype: string
        - name: edition_varia_additional
          sequence: 'null'
        - name: edition_varia_best
          dtype: string
        - name: extension_additional
          sequence: 'null'
        - name: extension_best
          dtype: string
        - name: filesize_additional
          sequence: 'null'
        - name: filesize_best
          dtype: int64
        - name: has_aa_downloads
          dtype: int64
        - name: has_aa_exclusive_downloads
          dtype: int64
        - name: identifiers_unified
          struct:
          - name: abaa
            sequence: string
          - name: abebooks.de
            sequence: string
          - name: abwa_bibliographic_number
            sequence: string
          - name: alibris
            sequence: string
          - name: alibris_id
            sequence: string
          - name: asin
            sequence: string
          - name: bayerische_staatsbibliothek
            sequence: string
          - name: bcid
            sequence: string
          - name: better_world_books
            sequence: string
          - name: bhl
            sequence: string
          - name: bibliothèque_nationale_de_france
            sequence: string
          - name: bibsys
            sequence: string
          - name: bl
            sequence: string
          - name: bnb
            sequence: string
          - name: bodleian,_oxford_university
            sequence: string
          - name: booklocker.com
            sequence: string
          - name: bookmooch
            sequence: string
          - name: booksforyou
            sequence: string
          - name: bookwire
            sequence: string
          - name: boston_public_library
            sequence: string
          - name: canadian_national_library_archive
            sequence: string
          - name: choosebooks
            sequence: string
          - name: cornell_university_library
            sequence: string
          - name: cornell_university_online_library
            sequence: string
          - name: dc_books
            sequence: string
          - name: depósito_legal
            sequence: string
          - name: digital_library_pomerania
            sequence: string
          - name: discovereads
            sequence: string
          - name: dnb
            sequence: string
          - name: dominican_institute_for_oriental_studies_library
            sequence: string
          - name: etsc
            sequence: string
          - name: fennica
            sequence: string
          - name: finnish_public_libraries_classification_system
            sequence: string
          - name: folio
            sequence: string
          - name: freebase
            sequence: string
          - name: gbook
            sequence: string
          - name: goethe_university_library,_frankfurt
            sequence: string
          - name: goodreads
            sequence: string
          - name: grand_comics_database
            sequence: string
          - name: harvard
            sequence: string
          - name: hathi_trust
            sequence: string
          - name: identificativo_sbn
            sequence: string
          - name: ilmiolibro
            sequence: string
          - name: inducks
            sequence: string
          - name: isbn10
            sequence: string
          - name: isbn13
            sequence: string
          - name: isfdbpubideditions
            sequence: string
          - name: issn
            sequence: string
          - name: istc
            sequence: string
          - name: lccn
            sequence: string
          - name: learnawesome
            sequence: string
          - name: library_and_archives_canada_cataloguing_in_publication
            sequence: string
          - name: librarything
            sequence: string
          - name: libris
            sequence: string
          - name: librivox
            sequence: string
          - name: lulu
            sequence: string
          - name: magcloud
            sequence: string
          - name: nbuv
            sequence: string
          - name: ndl
            sequence: string
          - name: nla
            sequence: string
          - name: nur
            sequence: string
          - name: ocaid
            sequence: string
          - name: oclc
            sequence: string
          - name: ol
            sequence: string
          - name: openstax
            sequence: string
          - name: overdrive
            sequence: string
          - name: paperback_swap
            sequence: string
          - name: project_gutenberg
            sequence: string
          - name: publishamerica
            sequence: string
          - name: rvk
            sequence: string
          - name: scribd
            sequence: string
          - name: shelfari
            sequence: string
          - name: siso
            sequence: string
          - name: smashwords_book_download
            sequence: string
          - name: standard_ebooks
            sequence: string
          - name: storygraph
            sequence: string
          - name: ulrls
            sequence: string
          - name: ulrls_classmark
            sequence: string
          - name: w._w._norton
            sequence: string
          - name: wikidata
            sequence: string
          - name: wikisource
            sequence: string
          - name: yakaboo
            sequence: string
          - name: zdb-id
            sequence: string
        - name: language_codes
          sequence: string
        - name: most_likely_language_code
          dtype: string
        - name: original_filename_additional
          sequence: 'null'
        - name: original_filename_best
          dtype: string
        - name: original_filename_best_name_only
          dtype: string
        - name: problems
          sequence: 'null'
        - name: publisher_additional
          sequence: 'null'
        - name: publisher_best
          dtype: string
        - name: stripped_description_additional
          sequence: 'null'
        - name: stripped_description_best
          dtype: string
        - name: title_additional
          sequence: 'null'
        - name: title_best
          dtype: string
        - name: year_additional
          sequence: 'null'
        - name: year_best
          dtype: string
      - name: ia_record
        dtype: string
      - name: id
        dtype: string
      - name: indexes
        sequence: string
      - name: ipfs_infos
        sequence: 'null'
      - name: isbndb
        sequence: 'null'
      - name: lgli_file
        dtype: string
      - name: lgrsfic_book
        dtype: string
      - name: lgrsnf_book
        dtype: string
      - name: ol
        list:
        - name: ol_edition
          dtype: string
      - name: scihub_doi
        sequence: 'null'
      - name: search_only_fields
        struct:
        - name: search_access_types
          sequence: string
        - name: search_content_type
          dtype: string
        - name: search_doi
          sequence: 'null'
        - name: search_extension
          dtype: string
        - name: search_filesize
          dtype: int64
        - name: search_isbn13
          sequence: string
        - name: search_most_likely_language_code
          dtype: string
        - name: search_record_sources
          sequence: string
        - name: search_score_base
          dtype: float64
        - name: search_score_base_rank
          dtype: float64
        - name: search_text
          dtype: string
        - name: search_year
          dtype: string
      - name: zlib_book
        dtype: string
  splits:
  - name: train
    num_bytes: 2050799295
    num_examples: 565533
  download_size: 354984240
  dataset_size: 2050799295
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
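Each row in the schema above carries a `hits` list, and every hit wraps its bibliographic fields inside `_source.file_unified_data`. The sketch below shows one way to walk that path, assuming rows arrive as plain Python dicts shaped like the schema. The helper name `best_hit_summary` and the sample record are hypothetical; only the field names come from the metadata.

```python
from typing import Any, Optional


def best_hit_summary(row: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return title, author, and year of the highest-scoring hit, or None."""
    hits = row.get("hits") or []
    if not hits:
        return None
    # `_score` is float64 in the schema; treat a missing score as 0.0.
    top = max(hits, key=lambda h: h.get("_score") or 0.0)
    unified = (top.get("_source") or {}).get("file_unified_data") or {}
    return {
        "hit_id": top.get("_id"),
        "title": unified.get("title_best"),
        "author": unified.get("author_best"),
        "year": unified.get("year_best"),  # stored as a string in the schema
    }


# Hypothetical record: field names follow the schema, values are invented.
sample_row = {
    "id": 1,
    "title": "Example Title",
    "match_count": 2,
    "hits": [
        {"_id": "a", "_score": 1.5, "_source": {"file_unified_data": {
            "title_best": "Example Title",
            "author_best": "A. Author",
            "year_best": "2001"}}},
        {"_id": "b", "_score": 3.0, "_source": {"file_unified_data": {
            "title_best": "Example Title, 2nd ed.",
            "author_best": "A. Author",
            "year_best": "2005"}}},
    ],
}

summary = best_hit_summary(sample_row)
print(summary)
```

Using `.get(...) or {}` at each level keeps the helper tolerant of hits whose nested structs are missing or null, which seems likely for at least some rows given how many fields are typed as sequences of null.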
Provider:
P1ayer-1
Summary of original information
Dataset information
Features
- authors: string
- color: sequence of float64
- depth: int64
- field: string
- id: int64
- match_count: int64
- position: sequence of float64
- title: string
- hits: list whose items have the following fields:
  - _id: string
  - _index: string
  - _score: float64
  - _source: struct with the following fields:
    - aa_lgli_comics_2022_08_file: string
    - aac_zlib3_book: string
    - file_unified_data: struct with the following fields:
      - author_additional: sequence of null
      - author_best: string
      - classifications_unified: struct with the following fields:
        - ddc: sequence of string
        - lcc: sequence of string
        - library_and_archives_canada_cataloguing_in_publication: sequence of string
        - nur: sequence of string
        - udc: sequence of string
      - comments_additional: sequence of null
      - comments_best: string
      - content_type: string
      - cover_url_additional: sequence of null
      - cover_url_best: string
      - edition_varia_additional: sequence of null
      - edition_varia_best: string
      - extension_additional: sequence of null
      - extension_best: string
      - filesize_additional: sequence of null
      - filesize_best: int64
      - has_aa_downloads: int64
      - has_aa_exclusive_downloads: int64
      - identifiers_unified: struct with the following fields:
        - abaa: sequence of string
        - abebooks.de: sequence of string
        - abwa_bibliographic_number: sequence of string
        - alibris: sequence of string
        - alibris_id: sequence of string
        - asin: sequence of string
        - bayerische_staatsbibliothek: sequence of string
        - bcid: sequence of string
        - better_world_books: sequence of string
        - bhl: sequence of string
        - bibliothèque_nationale_de_france: sequence of string
        - bibsys: sequence of string
        - bl: sequence of string
        - bnb: sequence of string
        - bodleian,_oxford_university: sequence of string
        - booklocker.com: sequence of string
        - bookmooch: sequence of string
        - booksforyou: sequence of string
        - bookwire: sequence of string
        - boston_public_library: sequence of string
        - canadian_national_library_archive: sequence of string
        - choosebooks: sequence of string
        - cornell_university_library: sequence of string
        - cornell_university_online_library: sequence of string
        - dc_books: sequence of string
        - depósito_legal: sequence of string
        - digital_library_pomerania: sequence of string
        - discovereads: sequence of string
        - dnb: sequence of string
        - dominican_institute_for_oriental_studies_library: sequence of string
        - etsc: sequence of string
        - fennica: sequence of string
        - finnish_public_libraries_classification_system: sequence of string
        - folio: sequence of string
        - freebase: sequence of string
        - gbook: sequence of string
        - goethe_university_library,_frankfurt: sequence of string
        - goodreads: sequence of string
        - grand_comics_database: sequence of string
        - harvard: sequence of string
        - hathi_trust: sequence of string
        - identificativo_sbn: sequence of string
        - ilmiolibro: sequence of string
        - inducks: sequence of string
        - isbn10: sequence of string
        - isbn13: sequence of string
        - isfdbpubideditions: sequence of string
        - issn: sequence of string
        - istc: sequence of string
        - lccn: sequence of string
        - learnawesome: sequence of string
        - library_and_archives_canada_cataloguing_in_publication: sequence of string
        - librarything: sequence of string
        - libris: sequence of string
        - librivox: sequence of string
        - lulu: sequence of string
        - magcloud: sequence of string
        - nbuv: sequence of string
        - ndl: sequence of string
        - nla: sequence of string
        - nur: sequence of string
        - ocaid: sequence of string
        - oclc: sequence of string
        - ol: sequence of string
        - openstax: sequence of string
        - overdrive: sequence of string
        - paperback_swap: sequence of string
        - project_gutenberg: sequence of string
        - publishamerica: sequence of string
        - rvk: sequence of string
        - scribd: sequence of string
        - shelfari: sequence of string
        - siso: sequence of string
        - smashwords_book_download: sequence of string
        - standard_ebooks: sequence of string
        - storygraph: sequence of string
        - ulrls: sequence of string
        - ulrls_classmark: sequence of string
        - w._w._norton: sequence of string
        - wikidata: sequence of string
        - wikisource: sequence of string
        - yakaboo: sequence of string
        - zdb-id: sequence of string
      - language_codes: sequence of string
      - most_likely_language_code: string
      - original_filename_additional: sequence of null
      - original_filename_best: string
      - original_filename_best_name_only: string
      - problems: sequence of null
      - publisher_additional: sequence of null
      - publisher_best: string
      - stripped_description_additional: sequence of null
      - stripped_description_best: string
      - title_additional: sequence of null
      - title_best: string
      - year_additional: sequence of null
      - year_best: string
    - ia_record: string
    - id: string
    - indexes: sequence of string
    - ipfs_infos: sequence of null
    - isbndb: sequence of null
    - lgli_file: string
    - lgrsfic_book: string
    - lgrsnf_book: string
    - ol: list whose items have the following fields:
      - ol_edition: string
    - scihub_doi: sequence of null
    - search_only_fields: struct with the following fields:
      - search_access_types: sequence of string
      - search_content_type: string
      - search_doi: sequence of null
      - search_extension: string
      - search_filesize: int64
      - search_isbn13: sequence of string
      - search_most_likely_language_code: string
      - search_record_sources: sequence of string
      - search_score_base: float64
      - search_score_base_rank: float64
      - search_text: string
      - search_year: string
    - zlib_book: string
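Every entry under `identifiers_unified` is a sequence of strings, and any given record may leave many of them empty or missing. A minimal sketch for flattening that struct into (scheme, value) pairs follows, assuming the struct arrives as a dict mapping scheme names to lists or `None`. The helper `collect_identifiers` and the sample values are hypothetical; only the key names come from the feature list.

```python
from typing import Iterable, Optional


def collect_identifiers(
    identifiers_unified: Optional[dict],
    schemes: Optional[Iterable[str]] = None,
) -> list[tuple[str, str]]:
    """Flatten {scheme: [values]} into sorted (scheme, value) pairs."""
    if not identifiers_unified:
        return []
    wanted = set(schemes) if schemes is not None else None
    pairs = []
    for scheme, values in identifiers_unified.items():
        if wanted is not None and scheme not in wanted:
            continue
        for value in values or []:  # None and [] both mean "no identifiers"
            pairs.append((scheme, value))
    return sorted(pairs)


# Invented values; only the key names come from the schema.
sample = {
    "isbn13": ["9780000000002"],
    "isbn10": ["0000000000"],
    "oclc": [],
    "zdb-id": None,
}

print(collect_identifiers(sample))
print(collect_identifiers(sample, schemes=["isbn13"]))
```

Passing `schemes` restricts the output to the identifier types of interest, which avoids scanning all of the identifier keys when only ISBNs are needed.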
Data splits
- train: 565533 examples, 2050799295 bytes in total
Dataset size
- Download size: 354984240 bytes
- Dataset size: 2050799295 bytes
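The figures above allow a quick back-of-the-envelope check of how large a typical record is and how much smaller the download is than the prepared dataset. The sketch assumes the usual Hugging Face convention that `download_size` counts the files fetched and `dataset_size` counts the prepared data for all splits; the variable names are illustrative.

```python
# Figures copied from the dataset metadata.
NUM_EXAMPLES = 565_533
DATASET_BYTES = 2_050_799_295
DOWNLOAD_BYTES = 354_984_240

avg_bytes_per_example = DATASET_BYTES / NUM_EXAMPLES
download_fraction = DOWNLOAD_BYTES / DATASET_BYTES

print(f"average example size: {avg_bytes_per_example:.0f} bytes")
print(f"download is {download_fraction:.1%} of the dataset size")
print(f"dataset: {DATASET_BYTES / 1e9:.2f} GB, download: {DOWNLOAD_BYTES / 1e6:.0f} MB")
```

This works out to roughly 3.6 kB per example, with the download at about 17% of the prepared size.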
Configuration
- config_name: default
- data_files:
  - split: train
  - path: data/train-*
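The configuration above maps the `train` split to every file matching `data/train-*`. The sketch below shows how that glob selects files and how the split might be streamed with the third-party `datasets` package, assuming the package is installed and the repository is reachable. The shard file names are hypothetical, and `stream_rows` is an illustrative helper, not part of the dataset.

```python
from fnmatch import fnmatch
from typing import Any, Iterator

REPO_ID = "P1ayer-1/college_texts_metadata"
CONFIG_NAME = "default"
SPLIT = "train"
DATA_GLOB = "data/train-*"


def matches_train_split(path: str) -> bool:
    """True if a repository file path falls under the train split's glob."""
    return fnmatch(path, DATA_GLOB)


def stream_rows(limit: int = 3) -> Iterator[dict[str, Any]]:
    """Yield up to `limit` rows without downloading the full dataset first."""
    # Imported lazily so the helper above works without the third-party package.
    from datasets import load_dataset

    ds = load_dataset(REPO_ID, CONFIG_NAME, split=SPLIT, streaming=True)
    for index, row in enumerate(ds):
        if index >= limit:
            break
        yield row


# Hypothetical shard names, used only to illustrate the glob.
print(matches_train_split("data/train-00000-of-00005.parquet"))
print(matches_train_split("data/test-00000-of-00001.parquet"))
```

Iterating over `stream_rows()` requires network access. Streaming is used here because the full download is several hundred megabytes, so inspecting a few rows first may be preferable.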



