Resource overview:
About: The PubMed dataset contains more than 33 million citations and abstracts of biomedical literature. MEDLINE is the largest subset of PubMed. MEDLINE is the National Library of Medicine's (NLM) premier bibliographic database that contains more than 28 million references to journal articles in life sciences with a concentration on biomedicine.
Once a year, NLM releases a complete (baseline) set of PubMed citation records in XML format for download. These files have been ingested into this listing.
- We pull these XML files from the source through our automated pipeline.
- A custom parser then converts them into a table format.
- Finally, the table is loaded into Element Data's custom warehouse.
* Incremental update files are also available at the source. They are released daily and include new, revised, and deleted citations. The PubMed DTD documents any changes to the structure and allowed elements from year to year.
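
The XML-to-table step described above can be illustrated with a minimal sketch. This is not the parser used for this listing. It is an illustrative example that assumes the standard PubMed XML element names (`PubmedArticle`, `MedlineCitation`, `DeleteCitation`, and so on) and uses only Python's standard library. The sample record, the helper names `text_of` and `parse_records`, and the choice of output columns are all hypothetical.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up record in the shape of a PubMed XML file (not real data).
SAMPLE_XML = """\
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID Version="1">12345</PMID>
      <Article>
        <Journal><Title>Example Journal</Title></Journal>
        <ArticleTitle>An example title</ArticleTitle>
        <Abstract><AbstractText>Example abstract.</AbstractText></Abstract>
        <AuthorList>
          <Author><LastName>Doe</LastName><ForeName>Jane</ForeName></Author>
          <Author><LastName>Roe</LastName><ForeName>Richard</ForeName></Author>
        </AuthorList>
      </Article>
      <MedlineJournalInfo>
        <Country>England</Country>
        <MedlineTA>Ex J</MedlineTA>
        <NlmUniqueID>0000001</NlmUniqueID>
      </MedlineJournalInfo>
    </MedlineCitation>
  </PubmedArticle>
  <DeleteCitation>
    <PMID Version="1">99999</PMID>
  </DeleteCitation>
</PubmedArticleSet>
"""


def text_of(node, path):
    """Return the stripped text under `path`, or "" if the element is missing."""
    found = node.find(path)
    return "".join(found.itertext()).strip() if found is not None else ""


def parse_records(xml_text):
    """Convert PubMed-style XML into a list of flat dict rows."""
    root = ET.fromstring(xml_text)
    rows = []
    for article in root.iter("PubmedArticle"):
        cit = article.find("MedlineCitation")
        authors = [
            f"{text_of(a, 'ForeName')} {text_of(a, 'LastName')}".strip()
            for a in cit.iter("Author")
        ]
        rows.append({
            "pmid": text_of(cit, "PMID"),
            "title": text_of(cit, "Article/ArticleTitle"),
            "journal": text_of(cit, "Article/Journal/Title"),
            "abstract": text_of(cit, "Article/Abstract/AbstractText"),
            "authors": ";".join(authors),  # multi-valued fields joined with ;
            "country": text_of(cit, "MedlineJournalInfo/Country"),
            "medline_ta": text_of(cit, "MedlineJournalInfo/MedlineTA"),
            "nlm_unique_id": text_of(cit, "MedlineJournalInfo/NlmUniqueID"),
            "delete": False,
        })
    # Update files may list deleted citations; flag them with delete=True.
    for pmid in root.findall("DeleteCitation/PMID"):
        rows.append({"pmid": (pmid.text or "").strip(), "delete": True})
    return rows


rows = parse_records(SAMPLE_XML)
```

Each `PubmedArticle` becomes one flat row, and multi-valued elements such as authors are joined with `;` to match the column conventions listed under "Key Column Names". A production parser would also need to handle structured abstracts, MeSH headings, and changes in the DTD from year to year.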
Source:
The data is sourced from the NCBI FTP server.
Key Table and its fields:
1 Table
23 Columns, as follows
Key Column Names:
- pmid : PubMed ID
- pmc : PubMed Central ID
- other_id : Other IDs found, each separated by ;
- title : title of the article
- country : Country extracted from journal
- journal : journal of the given paper
- pubdate : Publication date. Defaults to year information only.
- medline_ta : abbreviation of the journal name
- nlm_unique_id : NLM unique identification
- reference : string of PMIDs, each separated by ; (or a list of references made to the article)
- delete : boolean. False means the paper was updated, so you might have two records for the same paper.
- languages : list of languages, each separated by ;
- vernacular_title : vernacular title. Defaults to an empty string when not available.
- abstract : abstract of the article
- authors : authors, each separated by ;
- mesh_terms : list of MeSH terms with corresponding MeSH ID
- publication_types : list of publication types, each separated by ;
- keywords : list of keywords, each separated by ;
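
As a rough illustration of how these columns might be used, the sketch below splits a `;`-separated column into a list and keeps one current row per `pmid`. It assumes rows arrive in ingestion order (oldest first) and that a row with `delete` set to True marks a removed citation. Both are assumptions, not guarantees of this listing, and the helper names `split_field` and `latest_per_pmid` and the sample rows are hypothetical.

```python
def split_field(value, sep=";"):
    """Split a delimited column into a clean list; empty or None gives []."""
    return [part.strip() for part in (value or "").split(sep) if part.strip()]


def latest_per_pmid(rows):
    """Keep the last-seen row for each pmid, then drop deleted citations.

    Assumes `rows` is ordered oldest to newest (an assumption, see above).
    """
    latest = {}
    for row in rows:
        latest[row["pmid"]] = row  # later rows overwrite earlier ones
    return [row for row in latest.values() if not row["delete"]]


# Made-up rows: pmid 1 was revised once, pmid 2 was later deleted.
sample_rows = [
    {"pmid": "1", "authors": "A One;B Two", "delete": False},
    {"pmid": "1", "authors": "A One;B Two;C Three", "delete": False},
    {"pmid": "2", "authors": "D Four", "delete": False},
    {"pmid": "2", "authors": "", "delete": True},
]

current = latest_per_pmid(sample_rows)
author_lists = {row["pmid"]: split_field(row["authors"]) for row in current}
```

In a warehouse setting the same idea would typically be expressed as a window function over `pmid` ordered by a load timestamp, if such a column is available.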