awesome-public-datasets
Source: GitHub · Updated: 2017-05-10 · Indexed: 2024-05-31
Download link:
https://github.com/HashirZahir/awesome-public-datasets
Description:
A list of high-quality public datasets sourced from the public domain, continuously updated.
Created:
2016-02-09
Original Information Summary
Dataset Overview
Agriculture
- `U.S. Department of Agriculture's PLANTS Database <http://www.plants.usda.gov/dl_all.html>`_
Biology
- `1000 Genomes <http://www.1000genomes.org/data>`_
- `American Gut (Microbiome Project) <https://github.com/biocore/American-Gut>`_
- `Broad Cancer Cell Line Encyclopedia (CCLE) <http://www.broadinstitute.org/ccle/home>`_
- `Cell Image Library <http://www.cellimagelibrary.org>`_
- `Collaborative Research in Computational Neuroscience (CRCNS) <http://crcns.org/data-sets>`_
- `Complete Genomics Public Data <http://www.completegenomics.com/public-data/69-genomes/>`_
- `EBI ArrayExpress <http://www.ebi.ac.uk/arrayexpress/>`_
- `EBI Protein Data Bank in Europe <http://www.ebi.ac.uk/pdbe/emdb/index.html/>`_
- `ENCODE project <https://www.encodeproject.org>`_
- `Ensembl Genomes <http://ensemblgenomes.org/info/genomes>`_
- `Gene Expression Omnibus (GEO) <http://www.ncbi.nlm.nih.gov/geo/>`_
- `Gene Ontology (GO) <http://geneontology.org/page/download-annotations>`_
- `Global Biotic Interactions (GloBI) <https://github.com/jhpoelen/eol-globi-data/wiki#accessing-species-interaction-data>`_
- `Harvard Medical School (HMS) LINCS Project <http://lincs.hms.harvard.edu>`_
- `Human Genome Diversity Project <http://www.hagsc.org/hgdp/files.html>`_
- `Human Microbiome Project (HMP) <http://www.hmpdacc.org/reference_genomes/reference_genomes.php>`_
- `ICOS PSP Benchmark <http://ico2s.org/datasets/psp_benchmark.html>`_
- `International HapMap Project <http://hapmap.ncbi.nlm.nih.gov/downloads/index.html.en>`_
- `Journal of Cell Biology DataViewer <http://jcb-dataviewer.rupress.org>`_
- `MIT Cancer Genomics Data <http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi>`_
- `NCBI Proteins <http://www.ncbi.nlm.nih.gov/guide/proteins/#databases>`_
- `NCBI Taxonomy <http://www.ncbi.nlm.nih.gov/taxonomy>`_
- `NeuroData <http://neurodata.io>`_
- `NIH Microarray data <http://bit.do/VVW6>`_ or `FTP <ftp://ftp.ncbi.nih.gov/pub/geo/DATA/supplementary/series/GSE6532/>`_
- `OpenSNP genotypes data <https://opensnp.org/>`_
- `Pathguide - Protein-Protein Interactions Catalog <http://www.pathguide.org/>`_
- `Protein Data Bank <http://www.rcsb.org/>`_
- `PubChem Project <https://pubchem.ncbi.nlm.nih.gov/>`_
- `PubGene (now Coremine Medical) <http://www.pubgene.org/>`_
- `Sanger Catalogue of Somatic Mutations in Cancer (COSMIC) <http://cancer.sanger.ac.uk/cosmic>`_
- `Sanger Genomics of Drug Sensitivity in Cancer Project (GDSC) <http://www.cancerrxgene.org/>`_
- `Sequence Read Archive (SRA) <http://www.ncbi.nlm.nih.gov/Traces/sra/>`_
- `Stanford Microarray Data <http://smd.stanford.edu/>`_
- `Stowers Institute Original Data Repository <http://www.stowers.org/research/publications/odr>`_
- `Systems Science of Biological Dynamics (SSBD) Database <http://ssbd.qbic.riken.jp>`_
- `Temple University Hospital EEG Database <https://www.nedcdata.org/drupal/node/12>`_
- `The Cancer Genome Atlas (TCGA), available via Broad GDAC <https://gdac.broadinstitute.org/>`_
- `The Catalogue of Life <http://www.catalogueoflife.org/content/annual-checklist-archive>`_
- `The Personal Genome Project <http://www.personalgenomes.org/>`_ or `PGP <https://my.pgp-hms.org/public_genetic_data>`_
- `UCSC Public Data <http://hgdownload.soe.ucsc.edu/downloads.html>`_
- `Universal Protein Resource (UniProt) <http://www.uniprot.org/downloads>`_
- `UniGene <http://www.ncbi.nlm.nih.gov/unigene>`_
Climate/Weather
- `Australian Weather <http://www.bom.gov.au/climate/dwo/>`_
- `Brazilian Weather - Historical data (In Portuguese) <http://sinda.crn2.inpe.br/PCD/SITE/novo/site/>`_
- `Canadian Meteorological Centre <http://weather.gc.ca/grib/index_e.html>`_
- `Climate Data from UEA (updated monthly) <https://crudata.uea.ac.uk/cru/data/temperature/#datter and ftp://ftp.cmdl.noaa.gov/>`_
- `European Climate Assessment & Dataset <http://eca.knmi.nl/>`_
- `Global Climate Data Since 1929 <http://en.tutiempo.net/climate>`_
- `NASA Global Imagery Browse Services <https://wiki.earthdata.nasa.gov/display/GIBS>`_
- `NOAA Bering Sea Climate <http://www.beringclimate.noaa.gov/>`_
- `NOAA Climate Datasets <http://www.ncdc.noaa.gov/data-access/quick-links>`_
- `NOAA Realtime Weather Models <http://www.ncdc.noaa.gov/data-access/model-data/model-datasets/numerical-weather-prediction>`_
- `The World Bank Open Data Resources for Climate Change <http://data.worldbank.org/developers/climate-data-api>`_
- `UEA Climatic Research Unit <http://www.cru.uea.ac.uk/data>`_
- `WorldClim - Global Climate Data <http://www.worldclim.org>`_
- `WU Historical Weather Worldwide <https://www.wunderground.com/history/index.html>`_
Complex Networks
- `CrossRef DOI URLs <https://archive.org/details/doi-urls>`_
- `DBLP Citation dataset <https://kdl.cs.umass.edu/display/public/DBLP>`_
- `NBER Patent Citations <http://nber.org/patents/>`_
- `NIST complex networks data collection <http://math.nist.gov/~RPozo/complex_datasets.html>`_
- `Protein-protein interaction network <http://vlado.fmf.uni-lj.si/pub/networks/data/bio/Yeast/Yeast.htm>`_
- `PyPI and Maven Dependency Network <https://ogirardot.wordpress.com/2013/01/31/sharing-pypimaven-dependency-data/>`_
- `Scopus Citation Database <https://www.elsevier.com/solutions/scopus>`_
- `Small Network Data <http://www-personal.umich.edu/~mejn/netdata/>`_
- `Stanford GraphBase (Steven Skiena) <http://www3.cs.stonybrook.edu/~algorith/implement/graphbase/implement.shtml>`_
- `Stanford Large Network Dataset Collection <http://snap.stanford.edu/data/>`_
- `Stanford Longitudinal Network Data Sources <http://stanford.edu/group/sonia/dataSources/index.html>`_
- `The Koblenz Network Collection <http://konect.uni-koblenz.de/>`_
- `The Laboratory for Web Algorithmics (UNIMI) <http://law.di.unimi.it/datasets.php>`_
- `The Nexus Network Repository <http://nexus.igraph.org/>`_
- `UCI Network Data Repository <https://networkdata.ics.uci.edu/resources.php>`_
- `UFL sparse matrix collection <http://www.cise.ufl.edu/research/sparse/matrices/>`_
- `WSU Graph Database <http://www.eecs.wsu.edu/mgd/gdb.html>`_
Computer Networks
- `3.5B Web Pages from CommonCrawl 2012 <http://www.bigdatanews.com/profiles/blogs/big-data-set-3-5-billion-web-pages-made-available-for-all-of-us>`_
- `53.5B Web clicks of 100K users in Indiana Univ. <http://cnets.indiana.edu/groups/nan/webtraffic/click-dataset/>`_
- `CAIDA Internet Datasets <http://www.caida.org/data/overview/>`_
- `ClueWeb09 - 1B web pages <http://lemurproject.org/clueweb09/>`_
- `ClueWeb12 - 733M web pages <http://lemurproject.org/clueweb12/>`_
- `CommonCrawl Web Data over 7 years <http://commoncrawl.org/the-data/get-started/>`_
- `CRAWDAD Wireless datasets from Dartmouth Univ. <https://crawdad.cs.dartmouth.edu/>`_
- `Criteo click-through data <http://labs.criteo.com/2015/03/criteo-releases-its-new-dataset/>`_
- `Open Mobile Data by MobiPerf <https://console.developers.google.com/storage/openmobiledata_public/>`_
- `UCSD Network Telescope, IPv4 /8 net <http://www.caida.org/projects/network_telescope/>`_
Contextual Data
- `Context-aware data sets from five domains <http://students.depaul.edu/~yzheng8/DataSets.html#Data>`_ or `GitHub <https://github.com/irecsys/CARSKit/tree/master/context-aware_data_sets>`_
Data Challenges
- `Challenges in Machine Learning <http://www.chalearn.org/>`_
- `CrowdANALYTIX dataX <http://data.crowdanalytix.com>`_
- `D4D Challenge of Orange <http://www.d4d.orange.com/en/home>`_
- `DrivenData Competitions for Social Good <http://www.drivendata.org/>`_
- `ICWSM Data Challenge (since 2009) <http://icwsm.cs.umbc.edu/>`_
- `Kaggle Competition Data <https://www.kaggle.com/>`_
- `KDD Cup by Tencent 2012 <http://www.kddcup2012.org/>`_
- `Localytics Data Visualization Challenge <https://github.com/localytics/data-viz-challenge>`_
- `Netflix Prize <http://www.netflixprize.com/leaderboard>`_
- `Space Apps Challenge <https://2015.spaceappschallenge.org>`_
- `Telecom Italia Big Data Challenge <https://dandelion.eu/datamine/open-big-data/>`_
- `Yelp Dataset Challenge <http://www.yelp.com/dataset_challenge>`_
Economics
- `American Economic Association (AEA) <https://www.aeaweb.org/RFE/toc.php?show=complete>`_
- `EconData from UMD <http://inforumweb.umd.edu/econdata/econdata.html>`_
- `Economic Freedom of the World Data <http://www.freetheworld.com/datasets_efw.html>`_
- `Historical MacroEconomic Statistics <http://www.historicalstatistics.org/>`_
- `International Trade Statistics <http://www.econistatistics.co.za/>`_
- `Internet Product Code Database <http://www.upcdatabase.com/>`_
- `Joint External Debt Data Hub <http://www.jedh.org/>`_
- `Jon Haveman International Trade Data Links <http://www.macalester.edu/research/economics/PAGE/HAVEMAN/Trade.Resources/TradeData.html>`_
- `OpenCorporates Database of Companies in the World <https://opencorporates.com/>`_
- `Our World in Data <http://ourworldindata.org/>`_
- `SciencesPo World Trade Gravity Datasets <http://econ.sciences-po.fr/thierry-mayer/data>`_
- `The Atlas of Economic Complexity <http://atlas.cid.harvard.edu>`_
- `The Center for International Data <http://cid.econ.ucdavis.edu>`_
- `The Observatory of Economic Complexity <http://atlas.media.mit.edu/en/>`_
- `UN Commodity Trade Statistics <http://comtrade.un.org/db/>`_
- `UN Human Development Reports <http://hdr.undp.org/en>`_
Education
- `Student Data from Free Code Camp <http://academictorrents.com/details/030b10dad0846b5aecc3905692890fb02404adbf>`_
Energy
- `AMPds <http://ampds.org/>`_
- `BLUEd <http://nilm.cmubi.org/>`_
- `COMBED <http://combed.github.io/>`_
- `Dataport <https://dataport.pecanstreet.org/>`_
- `ECO <http://www.vs.inf.ethz.ch/res/show.html?what=eco-data>`_
- `EIA <http://www.eia.gov/electricity/data/eia923/>`_
- `HFED <http://hfed.github.io/>`_
- `iAWE <http://iawe.github.io/>`_
- `Plaid <http://plaidplug.com/>`_
- `REDD <http://redd.csail.mit.edu/>`_
- `UK-Dale <http://www.doc.ic.ac.uk/~dk3810/data/>`_
Finance
- `CBOE Futures Exchange <http://cfe.cboe.com/Data/>`_
- `Google Finance <https://www.google.com/finance>`_
- `Google Trends <http://www.google.com/trends?q=google&ctab=0&geo=all&date=all&sort=0>`_
- `NASDAQ <https://data.nasdaq.com/>`_
- `OANDA <http://www.oanda.com/>`_
- `OSU Financial data <http://fisher.osu.edu/fin/fdf/osudata.htm>`_
- `Quandl <https://www.quandl.com/>`_
- `St Louis Federal <https://research.stlouisfed.org/fred2/>`_
- `Yahoo Finance <http://finance.yahoo.com/>`_
Geology
- `Earth Models <http://www.earthmodels.org/>`_
- `Smithsonian Institution Global Volcano and Eruption Database <http://volcano.si.edu/>`_
- `USGS Earthquake Archives <http://earthquake.usgs.gov/earthquakes/search/>`_
GeoSpatial/GIS
- `BODC - marine data of ~22K vars <http://www.bodc.ac.uk/data/where_to_find_data/>`_
- `Cambridge, MA, US, GIS data on GitHub <http://cambridgegis.github.io/gisdata.html>`_
- `EOSDIS - NASA's earth observing system data <http://sedac.ciesin.columbia.edu/data/sets/browse>`_
- `Factual Global Location Data <https://www.factual.com/>`_
- `Geo Spatial Data from ASU <http://geodacenter.asu.edu/datalist/>`_
- `Geo Wiki Project - Citizen-driven Environmental Monitoring <http://geo-wiki.org/>`_
- `GeoFabrik - OSM data extracted to a variety of formats and areas <http://download.geofabrik.de/>`_
- `GeoNames Worldwide <http://www.geonames.org/>`_
- `Global Administrative Areas Database (GADM) <http://www.gadm.org/>`_
- `International Institute for Systems Analysis - GIS Datasets <http://www.
Collected Summary
Dataset Introduction

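Construction
The list was built by collecting and organizing public data sources mentioned in blogs, answers, and user responses. Most of the datasets it lists are free, but some are not.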
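Features
The list is notable for its breadth: it covers public data from agriculture and biology to climate and economics. Besides structured data, it also includes complex-network and geographic information system (GIS) data.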
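Usage
Users can browse the list on its GitHub page. Each entry links to its dataset, which users can download or access as needed. Some datasets may be subject to specific terms or conditions of use.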
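Because every entry pairs a dataset name with a URL in angle brackets, the list can be turned into structured records with a few lines of code. The following is a minimal sketch and not part of the original repository: the function name `parse_entries` and the sample text are illustrative assumptions, and the pattern only targets the `Name <url>` link form seen in this list.

```python
import re

# Matches reStructuredText-style links of the form `Name <url>`_ .
# The backticks are optional because some scraped copies drop them.
ENTRY_RE = re.compile(r"`?([^`<>]+?)\s*<([^<>\s]+)>`?_")


def parse_entries(text):
    """Return a list of (name, url) pairs found in text."""
    return [(name.strip(), url) for name, url in ENTRY_RE.findall(text)]


# Sample input (two entries taken from the Biology section).
sample = (
    "- `1000 Genomes <http://www.1000genomes.org/data>`_\n"
    "- `Protein Data Bank <http://www.rcsb.org/>`_\n"
)

for name, url in parse_entries(sample):
    print(f"{name}: {url}")
```

Entries whose URL field contains whitespace are skipped by this pattern, so a real run would need to report unmatched lines rather than silently drop them.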
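Background and Challenges
Background
Awesome Public Datasets is a list of public datasets collected and organized from blogs, answers, and user responses. It was created in 2016 and is maintained by Caesar0301, with the aim of providing a comprehensive resource of public datasets across fields. The list covers agriculture, biology, climate/weather, complex networks, computer networks, contextual data, data challenges, economics, education, energy, finance, geology, geospatial/GIS, government, healthcare, and other fields. Its value lies in giving researchers and developers a convenient starting point for their research and projects.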
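Current Challenges
Although Awesome Public Datasets is a rich resource, building and using it still poses challenges. The first is quality and accuracy: because the entries come from many different sources, it is hard to guarantee that every listed dataset is reliable. The second is updating and maintenance: as new data keeps appearing, keeping the list current takes sustained effort. Finally, datasets from different fields are difficult to integrate and make interoperable, which calls for further standardization and technical support.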
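Part of the maintenance effort described above is spotting links that are likely to go stale. The sketch below shows one offline way to triage entries for manual review without making any network requests. The heuristics, the function name `review_reasons`, and the `SHORTENER_HOSTS` set are assumptions chosen for illustration, not a process used by the list's maintainers.

```python
from urllib.parse import urlsplit

# Hypothetical set of URL-shortener hosts; extend as needed.
SHORTENER_HOSTS = {"bit.do", "bit.ly", "goo.gl", "t.co"}


def review_reasons(url):
    """Return a list of reasons why a link may deserve a manual check."""
    parts = urlsplit(url)
    reasons = []
    if parts.scheme == "ftp":
        reasons.append("ftp link")
    elif parts.scheme == "http":
        reasons.append("no TLS")
    # hostname is None for malformed URLs, so fall back to an empty string.
    if (parts.hostname or "") in SHORTENER_HOSTS:
        reasons.append("URL shortener")
    return reasons


for url in ("http://bit.do/VVW6", "https://opensnp.org/"):
    print(url, review_reasons(url))
```

A link with no reasons attached is not guaranteed to work; the function only orders the review queue.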
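Common Scenarios
Typical Use Cases
awesome-public-datasets gathers public data sources from many fields. Its typical use is as a starting point for researchers who need data for analysis, data mining, and scientific research. It is used in academic research, business intelligence analysis, and work with open government data.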
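Practical Applications
In practice, awesome-public-datasets is used in publishing open government data, business intelligence analysis, and sharing educational resources, providing data support for policy making, market analysis, and education outreach.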
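Derived and Related Work
Work that builds on awesome-public-datasets includes, among other things, published academic papers, business intelligence tools, and open government data platforms, all of which have further advanced the development and application of data science.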
The content above was collected and summarized by 遇见数据集.



