CountryInfo.txt: Country names, codes, places and leaders

NIAID Data Ecosystem, indexed 2026-03-08
Download link:
https://doi.org/10.7910/DVN/NBPRDW
Description:
CountryInfo.txt is a general-purpose file intended to facilitate natural language processing of news reports and political texts. It was originally developed to identify states for the text filtering system used in the development of MID4, then extended to incorporate CIA World Factbook and WordNet information for the development of TABARI dictionaries.

The file contains about 32,000 lines, covering about 240 countries and administrative units (e.g. American Samoa, Christmas Island, Hong Kong, Greenland). It is internally documented and almost but not quite XML: the major fields are delimited with tags of the form ... but elements inside are delimited with line feeds. Converting this to strict XML would be a relatively simple programming exercise for anyone who should be working with the file in the first place. The file is UTF-8 with Unix line feeds and will need to be converted if used on a Windows system.

Fields include:
- Country name in English
- Adjectival forms and synonyms of the country name, including some non-English versions of the name
- ISO-3166 numeric, alpha2 and alpha3 codes, FIPS-10 code, IMF code, COW alpha and numeric codes
- Capital city
- Cities with populations over 1 million
- Regions and geographical features (WordNet meronyms)
- Leaders, 1960-2008 (rulers.org)
- Members of government, 2003-2010 (CIA World Leaders)

The beginning of the file has fairly extensive documentation on the formats used.
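
The "almost XML" layout described above (tagged major fields, with the elements inside separated by line feeds) could be read along the following lines. This is only a minimal sketch under stated assumptions: the tag names `Country`, `CountryName`, `Capital` and `MajorCities`, and the sample record, are hypothetical placeholders and are not taken from CountryInfo.txt. The real tag names and any comment syntax should be taken from the documentation at the beginning of the file.

```python
import re

# Hypothetical sample record. The tag names and values are assumptions made
# for illustration only; they are NOT copied from CountryInfo.txt.
SAMPLE = """\
<Country>
<CountryName>
EXAMPLELAND
</CountryName>
<Capital>
EXAMPLE_CITY
</Capital>
<MajorCities>
CITY_ONE
CITY_TWO
</MajorCities>
</Country>
"""

# A tagged field: opening tag, line-delimited body, matching closing tag.
FIELD_RE = re.compile(r"<(\w+)>\n(.*?)\n</\1>", re.DOTALL)
# One record (the record tag name is an assumption as well).
RECORD_RE = re.compile(r"<Country>\n(.*?)\n</Country>", re.DOTALL)


def parse_country(block: str) -> dict[str, list[str]]:
    """Map each tagged field to the list of non-empty lines inside it."""
    fields = {}
    for tag, body in FIELD_RE.findall(block):
        fields[tag] = [ln.strip() for ln in body.splitlines() if ln.strip()]
    return fields


def parse_text(text: str) -> list[dict[str, list[str]]]:
    """Parse every record in the text, tolerating Windows line endings."""
    text = text.replace("\r\n", "\n")
    return [parse_country(block) for block in RECORD_RE.findall(text)]


records = parse_text(SAMPLE)
print(records[0]["MajorCities"])
```

For the real file, the text would be read with an explicit encoding, for example `open(path, encoding="utf-8").read()`, since the description states the file is UTF-8. The sketch ignores the documentation header and any nesting deeper than one level, both of which a real converter would need to handle.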
Created:
2015-05-07