
Subject indexing data of K10plus library union catalog

Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://zenodo.org/record/6810555
Description:
This dataset contains an extract of the K10plus library union catalog with its subject indexing data:

- kxp-subjects-sample_2022-06-30.dat: a random sample of 10,000 records
- kxp-subjects_2022-06-30_??of10.dat: the full data (47,686,063 records), split into files of up to 5,000,000 records each

K10plus is a union catalog of German libraries, run by the library service centers BSZ and VZG since 2019. The catalog contains bibliographic data from the majority of academic libraries in Germany. The core data of K10plus is made available as Open Data via APIs and in the form of database dumps. More information can be found here:

- K10plus homepage (in German)
- K10plus Open Data page (in German)
- Traditional search interface (OPAC)

Data format

The data is provided in its raw internal format, PICA+, so that no information is lost during conversion. Specifically, the data is given in PICA Normalized Format with one record per line. Each record consists of a list of fields, and each field consists of a list of subfields.

The data is best processed with the command line tools pica-rs or picadata. A detailed description of the PICA format and its processing is given in the German textbook Einführung in die Verarbeitung von PICA-Daten.

For visual inspection, PICA Normalized Format is best converted into PICA Plain Format (pica-rs command pica print). The following example record contains nine fields:

    003@ $0010003231
    013D $9104450460$VTsvz$3209786884$7gnd/4151278-9$aEinführung
    044K $9106080474$VTsv1$7gnd/4077343-7$3209204761$aSekte
    044N $aReligionsgemeinschaft
    045E $a12
    045F $a291
    045Q/01 $9181570408$VTkv$a11.97$jNeue religiöse Bewegungen$jSekten
    045R $91270641751$VTkv$7rvk/11410:$3200641751$aBG 9600$jAllgemeines$NB$JTheologie und Religionswissenschaften$NBG$JFundamentaltheologie$NBG 9020-BG 9790$JKirche und Kirchen$NBG 9600-BG 9720$JFreikirchen und Sekten
    045V $a1

Each K10plus record is uniquely identified by its record identifier (PPN), given in field 003@ subfield $0.
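As a rough illustration of the record structure described above, the following shell sketch turns PICA Normalized Format into a Plain-like view using only awk. It rests on assumptions that are not stated in this description: that a field ends with the control character U+001E, that a subfield starts with U+001F, and that PICA Plain doubles a literal dollar sign inside values. The function name pica_plain is made up for this example; pica print from pica-rs remains the proper tool.

```shell
#!/bin/sh
# Sketch: print each field of a PICA Normalized record on its own line.
# Assumptions (not stated in the dataset description): a field ends with
# U+001E, a subfield starts with U+001F, and PICA Plain writes "$" before
# each subfield code, doubling any literal "$" inside values.
pica_plain() {
  awk 'BEGIN { FS = "\036"; US = "\037" }
  {
    for (i = 1; i <= NF; i++) {
      field = $i
      if (field == "") continue      # skip the empty piece after the last U+001E
      gsub(/\$/, "$$", field)        # escape literal dollar signs in values
      gsub(US, "$", field)           # show subfield starts as "$"
      print field
    }
    print ""                         # blank line between records
  }'
}

# Usage with the sample file from this dataset:
#   head -n 1 kxp-subjects-sample_2022-06-30.dat | pica_plain
```

Setting the awk field separator to U+001E makes every PICA field an awk field, so no external dependency is needed for a quick look at a handful of records.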
The PPN can be used:

- to link into the K10plus catalog, e.g. https://opac.k10plus.de/DB=2.299/PPNSET?PPN=010003231
- to retrieve the record in other formats via API, e.g. https://unapi.k10plus.de/?id=opac-de-627:ppn:010003231&format=marcxml (MARC/XML format) and https://ws.gbv.de/suggest/csl/?query=pica.ppn=010003231&citationstyle=ieee&language=de (citation format)

Scope of the data

The data is limited to records with at least one holding by a library participating in K10plus. Records are provided with “offline expansion” (some subfields have been added automatically to facilitate reuse of the data) and limited to the following fields:

- 003@: internal record identifier “PPN” in subfield $0
- 013D: type of content
- 013F: target audience
- 041A: keywords
- 044.: all subject indexing fields starting with 044
- 045.: all subject indexing fields starting with 045
- 144Z: local library keywords
- 145S: local library classification
- 145Z: local library classification

Documentation of the fields can be found at https://format.k10plus.de/k10plushelp.pl?cmd=pplist&katalog=Standard#titel

The current dump contains 47,686,063 records with subject indexing out of 74,127,563 K10plus records in total. For reference, the dump was created and split from a full dump of K10plus with the script extract.sh.

Processing examples

Extract a CSV file of PPN and RVK notation:

    pica filter '045R?' kxp-subjects_2022-06-30.dat | pica select '003@$0,045Ra'

Get a list of PPNs of records having RVK but not BK:

    pica filter '045R? & !045Q/01' kxp-subjects_2022-06-30.dat | pica select '003@$0'

See https://github.com/gbv/k10plus-subjects#readme for additional examples of data analysis.

Automatic download

Given the Zenodo record ID (e.g. 6810556), a list of all files can be generated with curl and jq:

    curl -sL https://zenodo.org/api/records/$ID | jq -r '.files|map([.key,.links.self]|@tsv)[]'

Changes

- 2022-06-30: update with additional fields 013D and 013F (47,686,063 records)
- 2021-06-30: first published dump (41,786,820 records)

License

https://creativecommons.org/publicdomain/zero/1.0/
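Where pica-rs is not installed, the second processing example (PPNs of records having RVK in field 045R but no BK in field 045Q/01) can be approximated with awk alone. This is only a sketch: it assumes the PICA Normalized separators U+001E (end of field) and U+001F (start of subfield), which this description does not spell out, and the function name rvk_without_bk is hypothetical. It checks field tags only and does not reproduce the full pica filter syntax.

```shell
#!/bin/sh
# Sketch: list PPNs of records that have field 045R but not field 045Q/01.
# Assumptions: one record per line, fields end with U+001E, subfields start
# with U+001F, and the PPN sits in subfield 0 of field 003@.
rvk_without_bk() {
  awk 'BEGIN { FS = "\036"; US = "\037" }
  {
    ppn = ""; rvk = 0; bk = 0
    for (i = 1; i <= NF; i++) {
      tag = $i
      sub(/ .*/, "", tag)                      # tag is the text before the first space
      if (tag == "045R") rvk = 1
      else if (tag == "045Q/01") bk = 1
      else if (tag == "003@") {
        n = split($i, sf, US)                  # sf[1] is the tag, sf[2..n] are subfields
        for (j = 2; j <= n; j++)
          if (substr(sf[j], 1, 1) == "0") ppn = substr(sf[j], 2)
      }
    }
    if (rvk && !bk && ppn != "") print ppn
  }'
}

# Usage over the split dump files named in this description:
#   cat kxp-subjects_2022-06-30_??of10.dat | rvk_without_bk > rvk-without-bk.txt
```

Because each record is a single line, the dump can be streamed through the function without loading it into memory, which matters at roughly 47 million records.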
Created:
2022-11-15