GenBank PLN (Plantae, Fungi, Algae) Sequence Index in TSV, CSV, JSONL formats hash://sha256/bc7368469e50020ce8ae27b9d6a9a869e0b9a2a0a9b5480c69ce6751fa4b870e hash://md5/f6f78f64e3b3ff06adc3229badbd578b
Source: NIAID Data Ecosystem; indexed 2026-05-01
Download link:
https://zenodo.org/record/8117719
Description:
GenBank [1] makes sequence records openly available.
This publication contains an index of all accession records from the GenBank PLN division release v256 as seen around 27 June 2023 (2023-06-27) by Preston [2]. The PLN division is said to include sequences associated with plants, fungi, and algae.
Included files:
00_gbpln.json.gz - gzipped archive of simple line-json representation of records in gbpln sequence archives.
00_gbpln.sample.json - first 10 lines of line-json representation of records in gbpln sequence archives.
00_gbpln.tsv.gz - gzipped archive of tab-separated values (tsv) representation of records in gbpln sequence archives.
00_gbpln.sample.tsv - first 10 lines of tab-separated values (tsv) representation of records in gbpln sequence archives.
00_gbpln.csv.gz - gzipped archive of comma-separated values (csv) representation of records in gbpln sequence archives.
00_gbpln.sample.csv - first 10 lines of comma-separated values (csv) representation of records in gbpln sequence archives.
Also included are Preston provenance records, stored in files with 64-character filenames (e.g., hash://sha256/bc7368469e50020ce8ae27b9d6a9a869e0b9a2a0a9b5480c69ce6751fa4b870e).
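The following is a minimal sketch of how a downloaded provenance file could be checked against its name. It assumes that the 64-character filenames are the hex SHA-256 digests of the file contents, as the hash://sha256/ example suggests. The function names content_id and named_by_hash are hypothetical helpers, not part of Preston.

```shell
#!/usr/bin/env bash
# content_id FILE: print the hash://sha256/ identifier of FILE's contents.
content_id() {
  if command -v sha256sum >/dev/null 2>&1; then
    digest="$(sha256sum "$1" | cut -d' ' -f1)"
  else
    digest="$(shasum -a 256 "$1" | cut -d' ' -f1)"   # fallback, e.g. on macOS
  fi
  printf 'hash://sha256/%s\n' "$digest"
}

# named_by_hash FILE: succeed if FILE's name equals the hex SHA-256 of its contents.
# Assumption: provenance files are named after their own content hash.
named_by_hash() {
  [ "hash://sha256/$(basename "$1")" = "$(content_id "$1")" ]
}

# Demo on a throwaway file (made-up content, not taken from the dataset)
tmp="$(mktemp -d)"
printf 'example\n' > "$tmp/data"
id="$(content_id "$tmp/data")"
cp "$tmp/data" "$tmp/${id#hash://sha256/}"
named_by_hash "$tmp/${id#hash://sha256/}" && echo "name matches content"
named_by_hash "$tmp/data" || echo "name does not match content"
rm -r "$tmp"
```

Applied to a file downloaded from this publication, named_by_hash would succeed only if the file was transferred intact and the naming assumption holds.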
Please note that, at the time of writing (2023-07-05), the actual GenBank sequence archives are hosted at the Arizona State University Biodiversity Knowledge Integration Center via the Preston remote https://biokic6.rc.asu.edu/preston/gbpln. This ASU remote is currently proxied via https://linker.bio.
Examples:
To stream the JSON data directly from Zenodo and print the first record that contains "OBI":
# Stream records from this publication
# and print the first record containing "OBI" in 00_gbpln.json.gz
# using bash, curl, gunzip, grep, head, and jq
# (-L follows redirects in case the record URL has moved)
curl -L https://zenodo.org/record/8117720/files/00_gbpln.json.gz\
| gunzip\
| grep -E "[^a-zA-Z]OBI[^a-zA-Z]"\
| head -n1\
| jq .
with expected result:
{
  "accession": "JF951063",
  "http://www.w3.org/2000/01/rdf-schema#seeAlso": "https://ncbi.nlm.nih.gov/nuccore/JF951063",
  "definition": "Phalaris californica isolate CAL1ITS 5.8S ribosomal RNA gene and internal transcribed spacer 2, partial sequence.",
  "organism": "Phalaris californica",
  "specimen_voucher": "D. Keil s.n. (OBI)",
  "db_xref": "taxon:1108036",
  "country": "USA",
  "http://www.w3.org/ns/prov#wasDerivedFrom": "line:gz:hash://sha256/80f3e67d9a954cc8ca7223a10d1951c1ff84ca2844e7840bcb32eeac61181964!/L1400532-L1400572",
  "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": "genbank-flatfile"
}
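The "http://www.w3.org/ns/prov#wasDerivedFrom" value in the record above appears to combine a content hash with a line range. As a minimal sketch, assuming the layout line:gz:hash://sha256/<64 hex digits>!/L<first>-L<last> inferred from that single example, the parts can be separated with sed. The function name parse_line_ref is hypothetical and not part of Preston.

```shell
#!/usr/bin/env bash
# parse_line_ref REF
# Print "<content hash> <first line> <last line>" for a reference such as
#   line:gz:hash://sha256/<64 hex digits>!/L<first>-L<last>
# The layout is an assumption inferred from the example record.
# Input that does not match this layout is printed unchanged.
parse_line_ref() {
  printf '%s\n' "$1" \
    | sed -E 's#^line:gz:(hash://sha256/[0-9a-f]{64})!/L([0-9]+)-L([0-9]+)$#\1 \2 \3#'
}

ref='line:gz:hash://sha256/80f3e67d9a954cc8ca7223a10d1951c1ff84ca2844e7840bcb32eeac61181964!/L1400532-L1400572'

# Split the three space-separated parts into positional parameters
set -- $(parse_line_ref "$ref")
hash="$1"; first="$2"; last="$3"

echo "content: $hash"
echo "lines:   $first to $last ($((last - first + 1)) lines)"
```

For the example record, the range L1400532-L1400572 spans 41 lines of the referenced (gzip-compressed) source content.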
A similar example, but using CSV:
# Stream records from this publication
# and print the first record containing "OBI" in 00_gbpln.csv.gz
# using bash, curl, gunzip, grep, and head
# (-L follows redirects in case the record URL has moved)
curl -L https://zenodo.org/record/8117720/files/00_gbpln.csv.gz\
| gunzip\
| grep -E "[^a-zA-Z]OBI[^a-zA-Z]"\
| head -n1
with expected results:
JF951063,https://ncbi.nlm.nih.gov/nuccore/JF951063,"Phalaris californica isolate CAL1ITS 5.8S ribosomal RNA gene and internal transcribed spacer 2, partial sequence.",taxon:1108036,Phalaris californica,USA,null,D. Keil s.n. (OBI),null,line:gz:hash://sha256/80f3e67d9a954cc8ca7223a10d1951c1ff84ca2844e7840bcb32eeac61181964!/L1400532-L1400572
with the header, extracted using:
curl -L https://zenodo.org/record/8117720/files/00_gbpln.csv.gz\
| gunzip\
| head -n1
which prints:
accession,rdfs:seeAlso,definition,db_xref,organism,country,host,specimen_voucher,isolation_source,prov:wasDerivedFrom
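The grep patterns above match "OBI" anywhere in a line. To restrict the match to one column, such as specimen_voucher, one option is to work with the tab-separated file, where fields can be split without handling quoted commas. The sketch below assumes that 00_gbpln.tsv.gz starts with a header row that uses the same column names as the CSV header. The function name filter_tsv_column is hypothetical, and the second demo row is made up for illustration.

```shell
#!/usr/bin/env bash
# filter_tsv_column NAME REGEX
# Read tab-separated text on stdin; print the header row plus every row
# whose column called NAME matches REGEX.
# Assumption: the first input line is a header row with column names.
filter_tsv_column() {
  awk -F'\t' -v name="$1" -v re="$2" '
    NR == 1 {
      # Locate the requested column by its header name
      for (i = 1; i <= NF; i++) if ($i == name) col = i
      if (!col) { print "column not found: " name > "/dev/stderr"; exit 1 }
      print
      next
    }
    $col ~ re
  '
}

# Demo on inline rows; the X00001 row is invented for illustration
printf 'accession\torganism\tspecimen_voucher\n' >  /tmp/gbpln_demo.tsv
printf 'JF951063\tPhalaris californica\tD. Keil s.n. (OBI)\n' >> /tmp/gbpln_demo.tsv
printf 'X00001\tExample organism\tnull\n' >> /tmp/gbpln_demo.tsv
filter_tsv_column specimen_voucher '[^a-zA-Z]OBI[^a-zA-Z]' < /tmp/gbpln_demo.tsv
rm -f /tmp/gbpln_demo.tsv
```

Under the same assumption, the function could be placed after gunzip in a pipeline that streams 00_gbpln.tsv.gz, in place of the grep step used in the examples above.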
The same results can be obtained using Preston, a biodiversity dataset tracker:
preston ls\
--anchor hash://sha256/bc7368469e50020ce8ae27b9d6a9a869e0b9a2a0a9b5480c69ce6751fa4b870e\
--remote https://linker.bio,https://zenodo.org/record/8117720/files/,https://biokic6.rc.asu.edu/preston/gbpln\
| grep urn:x-ncbi:gbpln.csv.gz\
| head -n1\
| preston cat\
--remote https://linker.bio,https://zenodo.org/record/8117720/files/,https://biokic6.rc.asu.edu/preston/gbpln\
| gunzip\
| grep -E "[^a-zA-Z]OBI[^a-zA-Z]"\
| head -n1
References
[1] Sayers E, Cavanaugh M, Clark K, Ostell J, Pruitt K, Karsch-Mizrachi I, "GenBank", Nucleic Acids Research, Volume 47, Issue D1, January 2019, pp. D94-D99 PMID:30365038 PMCID:PMC6323954 DOI:10.1093/nar/gky989
[2] Elliott, M.J., Poelen, J.H. & Fortes, J.A.B. Signing data citations enables data verification and citation persistence. Sci Data 10, 419 (2023). doi:10.1038/s41597-023-02230-y hash://sha256/f849c870565f608899f183ca261365dce9c9f1c5441b1c779e0db49df9c2a19d
P.S. To clone all data (including more than 200 GB of source data):
preston clone\
--remote https://linker.bio,https://zenodo.org/record/8117720/files/,https://biokic6.rc.asu.edu/preston/gbpln\
--anchor hash://sha256/bc7368469e50020ce8ae27b9d6a9a869e0b9a2a0a9b5480c69ce6751fa4b870e
Created:
2023-07-06



