awesome-public-datasets
Source: GitHub. Updated: 2016-12-18. Indexed: 2024-05-31.
Download link:
https://github.com/wvanamstel/awesome-public-datasets
Overview:
This is a repository dedicated to collecting and organizing large-scale public datasets from the internet. The datasets cover a wide range of fields including climate, economics, energy, finance, biology, agriculture, physics, healthcare, and geospatial data.
Created:
2014-12-17
Original information summary
Dataset overview
Climate/Weather
- Australian Weather: http://www.bom.gov.au/climate/dwo/
- Canadian Meteorological Centre: https://weather.gc.ca/grib/index_e.html
- Climate Data: http://www.cru.uea.ac.uk/cru/data/temperature/#datter and ftp://ftp.cmdl.noaa.gov/
- Global Climate Data Since 1929: http://www.tutiempo.net/en/Climate
- NOAA Bering Sea Climate: http://www.beringclimate.noaa.gov/
- NOAA Climate Datasets: http://ncdc.noaa.gov/data-access/quick-links
- NOAA Realtime Weather Models: http://www.ncdc.noaa.gov/data-access/model-data/model-datasets/numerical-weather-prediction
- WU Historical Weather Worldwide: http://www.wunderground.com/history/index.html
Economics
- American Economic Association (AEA): http://www.aeaweb.org/RFE/toc.php?show=complete
- EconData (UMD): http://inforumweb.umd.edu/econdata/econdata.html
- Internet Product Code Database: http://www.upcdatabase.com/
- World Bank: http://data.worldbank.org/indicator
Energy
- AMPds: http://ampds.org/
- BLUEd: http://nilm.cmubi.org/
- COMBED: http://combed.github.io/
- Dataport: https://dataport.pecanstreet.org/
- ECO: http://www.vs.inf.ethz.ch/res/show.html?what=eco-data
- EIA: http://www.eia.gov/electricity/data/eia923/
- iAWE: http://iawe.github.io/
- HFED: http://hfed.github.io/
- Plaid: http://plaidplug.com/
- REDD: http://redd.csail.mit.edu/
- UK-Dale: http://www.doc.ic.ac.uk/~dk3810/data/
Finance
- CBOE Futures Exchange: http://cfe.cboe.com/Data/
- Google Finance: https://www.google.com/finance
- Google Trends: http://www.google.com/trends?q=google&ctab=0&geo=all&date=all&sort=0
- NASDAQ: https://data.nasdaq.com/
- OANDA: http://www.oanda.com/
- OSU Financial data: http://fisher.osu.edu/fin/osudata.htm or http://fisher.osu.edu/fin/fdf/osudata.htm
- Quandl: http://www.quandl.com/
- St Louis Federal: http://research.stlouisfed.org/fred2/
- Yahoo Finance: http://finance.yahoo.com/
Biology
- CRCNS: http://crcns.org/data-sets
- Gene Expression Omnibus: http://www.ncbi.nlm.nih.gov/geo/
- Human Microbiome Project: http://www.hmpdacc.org/reference_genomes/reference_genomes.php
- MIT Cancer Genomics Data: http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi
- NIH Microarray data: ftp://ftp.ncbi.nih.gov/pub/geo/DATA/supplementary/series/GSE6532/
- Protein structure: http://www.infobiotic.net/PSPbenchmarks/
- Protein Data Bank: http://pdb.org/
- Public Gene Data: http://www.pubgene.org/
- Stanford Microarray Data: http://smd.stanford.edu/
- UniGene: http://www.ncbi.nlm.nih.gov/unigene
- The Personal Genome Project: http://www.personalgenomes.org/ or https://my.pgp-hms.org/public_genetic_data
- 1000 Genomes: http://www.1000genomes.org/data
- UCSC Public Data: http://hgdownload.soe.ucsc.edu/downloads.html
Agriculture
- U.S. Department of Agriculture's PLANTS Database: http://www.plants.usda.gov/dl_all.html
Physics
- NASA: http://nssdc.gsfc.nasa.gov/nssdc/obtaining_data.html
- CERN Open Data Portal: http://opendata.cern.ch/
Healthcare
- EHDP Large Health Data Sets: http://www.ehdp.com/vitalnet/datasets.htm
- Gapminder: http://www.gapminder.org/data/
- Medicare Data File: http://go.cms.gov/19xxPN4
GeoSpace/GIS
- EOSDIS: http://sedac.ciesin.columbia.edu/data/sets/browse
- Factual Global Location Data: http://www.factual.com/
- Geo Spatial Data: http://geodacenter.asu.edu/datalist/
- OpenStreetMap (a free map worldwide): http://wiki.openstreetmap.org/wiki/Downloading_data
- GeoNames (over eight million placenames): http://www.geonames.org/
- BODC (marine data of nearly 22,000 oceanographic vars): http://www.bodc.ac.uk/data/where_to_find_data/
- GADM (Global Administrative Areas database): http://www.gadm.org/
- twofishes (Foursquare's coarse geocoder): https://github.com/foursquare/twofishes
- Natural Earth (vectors and rasters of the world): http://www.naturalearthdata.com/
- tz_world (timezone polygons): http://efele.net/maps/tz/world/
- TIGER/Line (official United States boundaries and roads): http://www.census.gov/geo/maps-data/data/tiger-line.html
Transportation
- Airlines Data (2009 ASA Challenge): http://stat-computing.org/dataexpo/2009/the-data.html
- Bike Share Data Systems: https://github.com/BetaNYC/Bike-Share-Data-Best-Practices/wiki/Bike-Share-Data-Systems
- Edge data for US domestic flights 1990 to 2009: http://data.memect.com/?p=229
- Half a million Hubway rides: http://hubwaydatachallenge.org/trip-history-data/
- NYC Taxi Trip Data 2013 (FOIA/FOIL): https://archive.org/details/nycTaxiTripData2013
- OpenFlights (airport, airline and route data): http://openflights.org/data.html
- RITA Airline On-Time Performance Data: http://www.transtats.bts.gov/Tables.asp?DB_ID=120
- RITA transport data collection: http://www.transtats.bts.gov/DataIndex.asp
- Transport for London: http://www.tfl.gov.uk/info-for/open-data-users/our-feeds
- U.S. Freight Analysis Framework: http://ops.fhwa.dot.gov/freight/freight_analysis/faf/index.htm
- Marine Traffic - ship tracks, port calls and more: https://www.marinetraffic.com/de/p/api-services
Government
- Archive-it: https://www.archive-it.org/explore?show=Collections
- Australia: https://data.gov.au/
- Australia: http://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/3301.02009?OpenDocument
- Canada: http://www.data.gc.ca/default.asp?lang=En&n=5BCD274E-1
- Chicago: https://data.cityofchicago.org/
- FDA: https://open.fda.gov/index.html
- Fed Stats: http://www.fedstats.gov/cgi-bin/A2Z.cgi
- Guardian world governments: http://www.guardian.co.uk/world-government-data
- HUD: http://www.huduser.org/portal/datasets/pdrdatas.html
- London Datastore, U.K: http://data.london.gov.uk/dataset
- Glasgow, Scotland, UK: http://data.glasgow.gov.uk/
- Netherlands: https://data.overheid.nl/
- New Zealand: http://www.stats.govt.nz/browse_for_stats.aspx
- NYC betanyc: http://betanyc.us/
- NYC Open Data: http://nycplatform.socrata.com/
- OECD: http://www.oecd.org/document/0,3746,en_2649_201185_46462759_1_1_1_1,00.html
- RITA: http://www.transtats.bts.gov/OT_Delay/OT_DelayCause1.asp
- San Francisco Data sets: http://datasf.org/
- The World Bank: http://wdronline.worldbank.org/
- U.K. Government Data: http://data.gov.uk/data
- U.S. Census Bureau: http://www.census.gov/data.html
- U.S. American Community Survey: http://www.census.gov/acs/www/data_documentation/data_release_info/
- U.S. Federal Government Agencies: http://www.data.gov/metric
- U.S. Federal Government Data Catalog: http://catalog.data.gov/dataset
- U.S. Open Government: http://www.data.gov/open-gov/
- UK 2011 Census Open Atlas Project: http://www.alex-singleton.com/2011-census-open-atlas-project/
- United Nations: http://data.un.org/
- US CDC Public Health datasets: http://www.cdc.gov/nchs/data_access/ftp_data.htm
- Open Government Data (OGD) Platform India: http://www.data.gov.in/
Sports
- Cricsheet (cricket): http://cricsheet.org/
- Betfair (betting exchange) Event Results: http://data.betfair.com/
- Lahman's Baseball Database: http://www.seanlahman.com/baseball-archive/statistics/
- Retrosheet (baseball): http://www.retrosheet.org/game.htm
- Ergast Formula 1 (API available): http://ergast.com/mrd/db
Data Challenges
- Challenges in Machine Learning: http://www.chalearn.org/
- DrivenData Competitions for Social Good: http://www.drivendata.org/
- ICWSM Data Challenge (since 2009): http://icwsm.cs.umbc.edu/
- Kaggle Competition Data: http://www.kaggle.com/
- KDD Cup by Tencent 2012: https://www.kddcup2012.org/
- Netflix Prize: http://www.netflixprize.com/leaderboard
- Yelp Dataset Challenge: http://www.yelp.com/dataset_challenge
- Localytics Data Visualization Challenge: https://github.com/localytics/data-viz-challenge
Machine Learning
- eBay Online Auctions: http://www.modelingonlineauctions.com/datasets
- IMDb database: http://www.imdb.com/interfaces
- Keel Repository: http://sci2s.ugr.es/keel/datasets.php
- Lending Club Loan Data: https://www.lendingclub.com/info/download-data.action
- Machine Learning Data Set Repository: http://mldata.org/
- Million Song Dataset: http://blog.echonest.com/post/3639160982/million-song-dataset
- More Song Datasets: http://labrosa.ee.columbia.edu/millionsong/pages/additional-datasets
- MovieLens Data Sets: http://datahub.io/dataset/movielens
- RDataMining R and Data Mining ebook data: http://www.rdatamining.com/data
- Registered meteorites on Earth: http://www.analyticbridge.com/profiles/blogs/registered-meteorites-that-has-impacted-on-earth-visualized
- SF restaurants dataset: http://missionlocal.org/san-francisco-restaurant-health-inspections/
- UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/
- University of Toronto Delve Datasets: http://www.cs.toronto.edu/~delve/data/datasets.html
- Yahoo Ratings and Classification Data: http://webscope.sandbox.yahoo.com/catalog.php?datatype=r
Natural Language
- 40 Million Entities in Context: https://code.google.com/p/wiki-links/downloads/list
- ClueWeb09 FACC: http://lemurproject.org/clueweb09/FACC1/
- ClueWeb12 FACC: http://lemurproject.org/clueweb12/FACC1/
- DBpedia: http://wiki.dbpedia.org/Datasets
- Flickr personal taxonomies: http://www.isi.edu/~lerman/downloads/flickr/flickr_taxonomies.html
- Google Books Ngrams: http://aws.amazon.com/datasets/8172056142375670
- Google Web 5gram, 2006 (1T): https://catalog.ldc.upenn.edu/LDC2006T13
- Gutenberg eBooks List: http://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs
- Hansards: http://www.isi.edu/natural-language/download/hansard/
- Machine Translation: http://statmt.org/wmt11/translation-task.html#download
- SMS Spam Collection: http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/
- USENET corpus: http://www.psych.ualberta.ca/~westburylab/downloads/usenetcorpus.download.html
- Wikidata: https://www.wikidata.org/wiki/Wikidata:Database_download
- WordNet: http://wordnet.princeton.edu/wordnet/download/
Image Processing
- 2GB of photos of cats: http://137.189.35.203/WebUI/CatDatabase/catData.html
- Face Recognition Benchmark: http://www.face-rec.org/databases/
- ImageNet: http://www.image-net.org/
Time Series
- Time Series data Library: https://datamarket.com/data/list/?q=provider:tsdl
- UC Riverside Time Series: http://www.cs.ucr.edu/~eamonn/time_series_data/
Social Sciences
- China Hotel Checkin/out data: http://www.360doc.com/content/13/1105/13/7863900_326788919.shtml
- CMU Enron Email: http://www.cs.cmu.edu/~enron/
- Facebook Social Networks (since 2007): http://law.di.unimi.it/datasets.php
- Facebook100 (2005): https://archive.org/details/oxford-2005-facebook-matrix
- Foursquare (2010,2011): http://www.public.asu.edu/~hgao16/dataset.html
- Foursquare (UMN/Sarwat, 2013): https://archive.org/details/201309_foursquare_dataset_umn
- General Social Survey (GSS): http
Collected summary
Dataset introduction

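Construction
This list was built by collecting and organizing public data sources drawn from blogs, answers, and user responses. Most of the datasets it includes are free, but some are paid. The list is derived from the awesome-public-datasets project on GitHub.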
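Features
The datasets span many fields, including meteorology, economics, energy, finance, biology, agriculture, physics, healthcare, geographic information, transportation, government, sports, and machine learning, providing a rich set of public data resources. Each field lists dataset links along with a description of the data type, making it easy for users to find and use them.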
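Usage
Users can obtain a dataset directly by following the link provided. Each dataset has a detailed description, including its source, format, and size, to help users understand and use it. In addition, the dataset page offers best practices for data use and information on related challenge competitions to help users apply the datasets more effectively.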
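Because the entries follow a regular "- Name: URL" pattern under plain category headings, the list can be turned into a lookup table before following any link. The snippet below is a minimal sketch under that assumption; `parse_entry` and `parse_catalog` are hypothetical helper names, not part of the awesome-public-datasets project, and entries whose URL is truncated are simply skipped.

```python
import re

# Matches http, https, and ftp links up to the next whitespace.
URL_RE = re.compile(r"(?:https?|ftp)://\S+")


def parse_entry(line):
    """Split one '- Name: URL' line into (name, [urls]); return None for non-entries."""
    if not line.startswith("- "):
        return None
    urls = URL_RE.findall(line)
    if not urls:
        # Covers entries whose link is missing or cut off.
        return None
    # The name is everything between the leading "- " and the first URL.
    name = line[2:line.index(urls[0])].rstrip().rstrip(":").strip()
    return name, urls


def parse_catalog(text):
    """Group entries under the most recent non-list line, treated as the category heading."""
    catalog = {}
    category = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        entry = parse_entry(line)
        if entry is not None:
            if category is not None:
                catalog[category].append(entry)
        elif not line.startswith("- "):
            category = line
            catalog.setdefault(category, [])
    return catalog


if __name__ == "__main__":
    sample = (
        "Physics\n"
        "- NASA: http://nssdc.gsfc.nasa.gov/nssdc/obtaining_data.html\n"
        "- CERN Open Data Portal: http://opendata.cern.ch/\n"
    )
    for category, entries in parse_catalog(sample).items():
        for name, urls in entries:
            print(category, "|", name, "|", urls[0])
```

Some entries list two links joined by "and" or "or", which is why each name maps to a list of URLs rather than a single string.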
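Background and challenges
Background overview
awesome-public-datasets is a list of public datasets compiled by GitHub user caesar0301, collected and organized from blogs, answers, and user responses. It covers climate, economics, energy, finance, biology, agriculture, physics, healthcare, geographic information systems, transportation, government, sports, machine learning, natural language processing, image processing, time series analysis, social sciences, complex networks, computer networks, museums, data sharing platforms, and public-domain data, among other fields. The list was created to give researchers convenient access to data resources, and since its release it has had a positive impact on research in these fields.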
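Current challenges
The main challenges in building awesome-public-datasets include: 1) dataset diversity and quality assurance: each dataset's source must be reliable, its format consistent, and its research value reasonable; 2) updating and maintenance: because the datasets come from many sources, they must be tracked and updated continuously to stay current and accurate; 3) copyright and usage rights: every listed dataset must comply with the applicable copyright and usage agreements to avoid infringement.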
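For the maintenance challenge, one low-cost first pass is to flag links that look fragile before anyone checks them by hand. The sketch below makes no network requests and is not a link checker; `audit_url` is a hypothetical helper, and the four flags (FTP, plain HTTP, raw IP host, query string) are heuristics assumed here for illustration, not criteria stated by the list's maintainers.

```python
import ipaddress
from urllib.parse import urlsplit


def audit_url(url):
    """Return heuristic flags for one URL; an empty list means nothing was flagged."""
    parts = urlsplit(url)
    flags = []
    if parts.scheme == "ftp":
        flags.append("ftp")        # FTP endpoints are often retired
    elif parts.scheme == "http":
        flags.append("no-tls")     # the site may have moved to https
    host = parts.hostname or ""
    try:
        ipaddress.ip_address(host)
        flags.append("raw-ip")     # no domain name, so the address may be reassigned
    except ValueError:
        pass                       # host is a regular domain name (or empty)
    if parts.query:
        flags.append("query-string")  # deep links into dynamic pages tend to change
    return flags


if __name__ == "__main__":
    for url in (
        "https://data.gov.au/",
        "ftp://ftp.cmdl.noaa.gov/",
        "http://137.189.35.203/WebUI/CatDatabase/catData.html",
    ):
        print(url, audit_url(url))
```

Flagged entries are only candidates for manual review; an unflagged link can still be dead, and a flagged one can still work.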
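Common scenarios
Typical use cases
As a list of public dataset resources, awesome-public-datasets is typically used to give researchers a rich and varied set of data sources for data analysis and mining tasks. For example, social science researchers can use its social network data to analyze patterns of user behavior, while bioinformaticians can access gene expression data to explore biological phenomena.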
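Practical applications
In practice, the datasets in awesome-public-datasets are widely used in market analysis, public health, transportation planning, and other fields. For example, governments and city planners can use the transportation datasets to optimize traffic flow management, and businesses can use market datasets to analyze consumer behavior and make more effective business decisions.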
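Derived work
A large body of related research and applications has been derived from the datasets in awesome-public-datasets. For example, machine learning researchers have used them to develop new algorithms and models, while data visualization specialists have used them to build rich interactive visualizations that help the public better understand complex data.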
The content above was collected and summarized by 遇见数据集.



