
GenBank PLN (Plantae, Fungi, Algae) Sequence Index in TSV, CSV, JSONL formats hash://sha256/bc7368469e50020ce8ae27b9d6a9a869e0b9a2a0a9b5480c69ce6751fa4b870e hash://md5/f6f78f64e3b3ff06adc3229badbd578b

Indexed in the NIAID Data Ecosystem on 2026-05-01
Download link:
https://zenodo.org/record/8117719
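Files downloaded from this record can be checked against content hashes. The title lists two such identifiers, a hash://sha256/ URI and a hash://md5/ URI. The following is a minimal Python sketch that assumes both are plain SHA-256 and MD5 digests of the Preston provenance record (the file with the 64-character name). That assumption is not stated in the record. The helper names hash_uri and matches are hypothetical and are not part of Preston.

```python
import hashlib
from pathlib import Path

# Identifiers copied from the title of this publication.
# ASSUMPTION: both are digests of the same file (the Preston provenance record).
EXPECTED_SHA256 = "hash://sha256/bc7368469e50020ce8ae27b9d6a9a869e0b9a2a0a9b5480c69ce6751fa4b870e"
EXPECTED_MD5 = "hash://md5/f6f78f64e3b3ff06adc3229badbd578b"


def hash_uri(path, algorithm="sha256", chunk_size=1 << 20):
    """Return a hash://<algorithm>/<hex digest> URI for a local file (hypothetical helper)."""
    digest = hashlib.new(algorithm)
    with Path(path).open("rb") as fh:
        # Read in 1 MiB chunks so large archives need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return f"hash://{algorithm}/{digest.hexdigest()}"


def matches(path, expected_uri):
    """Return True if the file's digest equals a hash://<algorithm>/<hex> URI (hypothetical helper)."""
    # "hash://sha256/abc..." -> "hash://sha256" -> "sha256"
    algorithm = expected_uri.rpartition("/")[0][len("hash://"):]
    return hash_uri(path, algorithm) == expected_uri
```

For example, after downloading the provenance file with the 64-character name, `matches("bc7368469e50020ce8ae27b9d6a9a869e0b9a2a0a9b5480c69ce6751fa4b870e", EXPECTED_SHA256)` should return True if the local copy is intact and the assumption above holds. The sketch streams the file in chunks, so the same helpers also work on the multi-gigabyte .gz archives.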
Resource description:
GenBank [1] makes sequence records openly available. This publication contains an index of all accession records from the GenBank PLN division, release v256, as seen around 27 June 2023 (2023-06-27) by Preston [2]. The PLN division is said to include sequences associated with plants, fungi, and algae.

Included files:

- 00_gbpln.json.gz: gzipped archive of a simple line-JSON representation of the records in the gbpln sequence archives.
- 00_gbpln.sample.json: first 10 lines of the line-JSON representation.
- 00_gbpln.tsv.gz: gzipped archive of a tab-separated values (TSV) representation of the records.
- 00_gbpln.sample.tsv: first 10 lines of the TSV representation.
- 00_gbpln.csv.gz: gzipped archive of a comma-separated values (CSV) representation of the records.
- 00_gbpln.sample.csv: first 10 lines of the CSV representation.

Preston provenance records are also included, in files with 64-character filenames (e.g., hash://sha256/bc7368469e50020ce8ae27b9d6a9a869e0b9a2a0a9b5480c69ce6751fa4b870e).

Note that, at the time of writing (2023-07-05), the actual GenBank sequence archives are hosted at the Arizona State University Biodiversity Knowledge Integration Center via the Preston remote https://biokic6.rc.asu.edu/preston/gbpln. This ASU remote is currently proxied via https://linker.bio.

Examples:

To stream JSON records directly from Zenodo and print the first record that contains "OBI" as a standalone token:

    # Stream records from this publication
    # and print the first record containing "OBI" in 00_gbpln.json.gz
    # using bash, curl, gunzip, grep, head, and jq
    curl https://zenodo.org/record/8117720/files/00_gbpln.json.gz \
      | gunzip \
      | grep -E "[^a-zA-Z]OBI[^a-zA-Z]" \
      | head -n1 \
      | jq .
with the expected result:

    {
      "accession": "JF951063",
      "http://www.w3.org/2000/01/rdf-schema#seeAlso": "https://ncbi.nlm.nih.gov/nuccore/JF951063",
      "definition": "Phalaris californica isolate CAL1ITS 5.8S ribosomal RNA gene and internal transcribed spacer 2, partial sequence.",
      "organism": "Phalaris californica",
      "specimen_voucher": "D. Keil s.n. (OBI)",
      "db_xref": "taxon:1108036",
      "country": "USA",
      "http://www.w3.org/ns/prov#wasDerivedFrom": "line:gz:hash://sha256/80f3e67d9a954cc8ca7223a10d1951c1ff84ca2844e7840bcb32eeac61181964!/L1400532-L1400572",
      "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": "genbank-flatfile"
    }

A similar example, using the CSV file:

    # Stream records from this publication
    # and print the first record containing "OBI" in 00_gbpln.csv.gz
    # using bash, curl, gunzip, grep, and head
    curl https://zenodo.org/record/8117720/files/00_gbpln.csv.gz \
      | gunzip \
      | grep -E "[^a-zA-Z]OBI[^a-zA-Z]" \
      | head -n1

with the expected result:

    JF951063,https://ncbi.nlm.nih.gov/nuccore/JF951063,"Phalaris californica isolate CAL1ITS 5.8S ribosomal RNA gene and internal transcribed spacer 2, partial sequence.",taxon:1108036,Phalaris californica,USA,null,D. Keil s.n. (OBI),null,line:gz:hash://sha256/80f3e67d9a954cc8ca7223a10d1951c1ff84ca2844e7840bcb32eeac61181964!/L1400532-L1400572

The CSV header, extracted using:

    curl https://zenodo.org/record/8117720/files/00_gbpln.csv.gz \
      | gunzip \
      | head -n1

is:

    accession,rdfs:seeAlso,definition,db_xref,organism,country,host,specimen_voucher,isolation_source,prov:wasDerivedFrom

The same results can be obtained using Preston, a biodiversity dataset tracker:

    preston ls \
      --anchor hash://sha256/bc7368469e50020ce8ae27b9d6a9a869e0b9a2a0a9b5480c69ce6751fa4b870e \
      --remote https://linker.bio,https://zenodo.org/record/8117720/files/,https://biokic6.rc.asu.edu/preston/gbpln \
      | grep urn:x-ncbi:gbpln.csv.gz \
      | head -n1 \
      | preston cat \
      --remote https://linker.bio,https://zenodo.org/record/8117720/files/,https://biokic6.rc.asu.edu/preston/gbpln \
      | gunzip \
      | grep -E "[^a-zA-Z]OBI[^a-zA-Z]" \
      | head -n1

References

[1] Sayers E, Cavanaugh M, Clark K, Ostell J, Pruitt K, Karsch-Mizrachi I. GenBank. Nucleic Acids Research, Volume 47, Issue D1, January 2019, pp. D94-D99. PMID:30365038 PMCID:PMC6323954 DOI:10.1093/nar/gky989
[2] Elliott MJ, Poelen JH, Fortes JAB. Signing data citations enables data verification and citation persistence. Scientific Data 10, 419 (2023). DOI:10.1038/s41597-023-02230-y hash://sha256/f849c870565f608899f183ca261365dce9c9f1c5441b1c779e0db49df9c2a19d

PS: To clone all data (including more than 200 GB of source data):

    preston clone \
      --remote https://linker.bio,https://zenodo.org/record/8117720/files/,https://biokic6.rc.asu.edu/preston/gbpln \
      --anchor hash://sha256/bc7368469e50020ce8ae27b9d6a9a869e0b9a2a0a9b5480c69ce6751fa4b870e
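For readers who prefer Python to a shell pipeline, the curl | gunzip | grep | head | jq example for 00_gbpln.json.gz can be sketched with the standard library alone. This is an illustrative sketch, not part of the publication. It uses urllib.request in place of curl, applies the same regular expression as the grep -E call to each raw line (with the trailing newline stripped, as grep does), and stops at the first match like head -n1. The function name first_match is hypothetical.

```python
import gzip
import json
import re
import urllib.request

# URL taken from the examples in this record.
JSONL_URL = "https://zenodo.org/record/8117720/files/00_gbpln.json.gz"

# Same pattern as the grep -E call: "OBI" with a non-letter on each side.
OBI = re.compile(r"[^a-zA-Z]OBI[^a-zA-Z]")


def first_match(binary_stream, pattern=OBI):
    """Return the first record in a gzipped line-JSON stream whose raw line matches pattern."""
    # gzip.open accepts a binary file object and decompresses incrementally.
    with gzip.open(binary_stream, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            text = line.rstrip("\n")  # grep matches lines without their newline
            if pattern.search(text):
                return json.loads(text)
    return None  # no line matched


def main():
    # Stream the archive over HTTP; only the bytes up to the first match are read.
    with urllib.request.urlopen(JSONL_URL) as response:
        record = first_match(response)
    # Pretty-print, roughly like `jq .`
    print(json.dumps(record, indent=2))


if __name__ == "__main__":
    main()
```

Because the function takes any binary stream, the same code can be pointed at a local copy, e.g. `first_match(open("00_gbpln.json.gz", "rb"))`, which avoids re-downloading the archive for repeated queries.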
Created: 2023-07-06