diff --git a/README.rst b/README.rst
index a40998d..3487c63 100644
--- a/README.rst
+++ b/README.rst
@@ -11,357 +11,357 @@ Other amazingly awesome lists can be found in the

 Agriculture
 ------------

-* U.S. Department of Agriculture's PLANTS Database: http://www.plants.usda.gov/dl_all.html
+* `U.S. Department of Agriculture's PLANTS Database `_

 Biology
 -------

-* 1000 Genomes: http://www.1000genomes.org/data
-* CRCNS: http://crcns.org/data-sets
-* Gene Expression Omnibus: http://www.ncbi.nlm.nih.gov/geo/
-* Human Microbiome Project: http://www.hmpdacc.org/reference_genomes/reference_genomes.php
-* MIT Cancer Genomics Data: http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi
-* NIH Microarray data: ftp://ftp.ncbi.nih.gov/pub/geo/DATA/supplementary/series/GSE6532/
-* Protein Data Bank: http://pdb.org/
-* Protein structure: http://www.infobiotic.net/PSPbenchmarks/
-* PubChem Project: https://pubchem.ncbi.nlm.nih.gov/
-* Public Gene Data: http://www.pubgene.org/
-* Stanford Microarray Data: http://smd.stanford.edu/
-* The Personal Genome Project: http://www.personalgenomes.org/ or https://my.pgp-hms.org/public_genetic_data
-* UCSC Public Data: http://hgdownload.soe.ucsc.edu/downloads.html
-* UniGene: http://www.ncbi.nlm.nih.gov/unigene
+* `1000 Genomes `_
+* `Collaborative Research in Computational Neuroscience (CRCNS) `_
+* `Gene Expression Omnibus (GEO) `_
+* `Human Microbiome Project (HMP) `_
+* `ICOS PSP Benchmark `_
+* `MIT Cancer Genomics Data `_
+* `NIH Microarray data `_
+* `Protein Data Bank `_
+* `PubChem Project `_
+* `PubGene (now Coremine Medical) `_
+* `Stanford Microarray Data `_
+* `The Personal Genome Project `_
+* `UCSC Public Data `_
+* `UniGene `_

 Climate/Weather
 ---------------

-* Australian Weather: http://www.bom.gov.au/climate/dwo/
-* Canadian Meteorological Centre: https://weather.gc.ca/grib/index_e.html
-* Climate Data: http://www.cru.uea.ac.uk/cru/data/temperature/#datter and ftp://ftp.cmdl.noaa.gov/
-* Global Climate Data Since 1929: http://www.tutiempo.net/en/Climate
-* NOAA Bering Sea Climate: http://www.beringclimate.noaa.gov/
-* NOAA Climate Datasets: http://ncdc.noaa.gov/data-access/quick-links
-* NOAA Realtime Weather Models: http://www.ncdc.noaa.gov/data-access/model-data/model-datasets/numerical-weather-prediction
-* WU Historical Weather Worldwide: http://www.wunderground.com/history/index.html
+* `Australian Weather `_
+* `Canadian Meteorological Centre `_
+* `Climate Data from UEA (updated at roughly monthly intervals) `_
+* `Global Climate Data Since 1929 `_
+* `NOAA Bering Sea Climate `_
+* `NOAA Climate Datasets `_
+* `NOAA Realtime Weather Models `_
+* `WU Historical Weather Worldwide `_

 Complex Networks
 ----------------

-* CrossRef DOI URLs: https://archive.org/details/doi-urls
-* DBLP Citation dataset: https://kdl.cs.umass.edu/display/public/DBLP
-* NBER Patent Citations: http://nber.org/patents/
-* NIST complex networks data collection: http://math.nist.gov/~RPozo/complex_datasets.html
-* Protein-protein interaction network: http://vlado.fmf.uni-lj.si/pub/networks/data/bio/Yeast/Yeast.htm
-* PyPI and Maven Dependency Network: http://ogirardot.wordpress.com/2013/01/31/sharing-pypimaven-dependency-data/
-* Scopus Citation Database: http://www.elsevier.com/online-tools/scopus
-* Stanford GraphBase (Steven Skiena): http://www3.cs.stonybrook.edu/~algorith/implement/graphbase/implement.shtml
-* Stanford Large Network Dataset Collection: http://snap.stanford.edu/data/
-* The Koblenz Network Collection: http://konect.uni-koblenz.de/
-* The Laboratory for Web Algorithmics (UNIMI): http://law.di.unimi.it/datasets.php
-* UCI Network Data Repository: http://networkdata.ics.uci.edu/resources.php
-* UFL sparse matrix collection: http://www.cise.ufl.edu/research/sparse/matrices/
-* WSU Graph Database: http://www.eecs.wsu.edu/mgd/gdb.html
+* `CrossRef DOI URLs `_
+* `DBLP Citation dataset `_
+* `NBER Patent Citations `_
+* `NIST complex networks data collection `_
+* `Protein-protein interaction network `_
+* `PyPI and Maven Dependency Network `_
+* `Scopus Citation Database `_
+* `Stanford GraphBase (Steven Skiena) `_
+* `Stanford Large Network Dataset Collection `_
+* `The Koblenz Network Collection `_
+* `The Laboratory for Web Algorithmics (UNIMI) `_
+* `UCI Network Data Repository `_
+* `UFL sparse matrix collection `_
+* `WSU Graph Database `_

 Computer Networks
 -----------------

-* 3.5B Web Pages: http://www.bigdatanews.com/profiles/blogs/big-data-set-3-5-billion-web-pages-made-available-for-all-of-us
-* 53.5B Web clicks: http://cnets.indiana.edu/groups/nan/webtraffic/click-dataset
-* CAIDA Internet Datasets: http://www.caida.org/data/overview/
-* ClueWeb09: http://lemurproject.org/clueweb09/
-* ClueWeb12: http://lemurproject.org/clueweb12/
-* CommonCrawl Web Data: http://commoncrawl.org/the-data/get-started/
-* Dartmouth CRAWDAD Wireless datasets: http://crawdad.cs.dartmouth.edu/
-* OpenMobileData (MobiPerf): https://console.developers.google.com/storage/openmobiledata_public/
-* UCSD Network Telescope: http://www.caida.org/projects/network_telescope/
+* `3.5B Web Pages `_
+* `53.5B Web clicks `_
+* `CAIDA Internet Datasets `_
+* `ClueWeb09 `_
+* `ClueWeb12 `_
+* `CommonCrawl Web Data `_
+* `Dartmouth CRAWDAD Wireless datasets `_
+* `OpenMobileData (MobiPerf) `_
+* `UCSD Network Telescope `_

 Data Challenges
 ---------------

-* Challenges in Machine Learning: http://www.chalearn.org/
-* DrivenData Competitions for Social Good: http://www.drivendata.org/
-* ICWSM Data Challenge (since 2009): http://icwsm.cs.umbc.edu/
-* Kaggle Competition Data: http://www.kaggle.com/
-* KDD Cup by Tencent 2012: https://www.kddcup2012.org/
-* Localytics Data Visualization Challenge: https://github.com/localytics/data-viz-challenge
-* Netflix Prize: http://www.netflixprize.com/leaderboard
-* Yelp Dataset Challenge: http://www.yelp.com/dataset_challenge
+* `Challenges in Machine Learning `_
+* `DrivenData Competitions for Social Good `_
+* `ICWSM Data Challenge (since 2009) `_
+* `Kaggle Competition Data `_
+* `KDD Cup by Tencent 2012 `_
+* `Localytics Data Visualization Challenge `_
+* `Netflix Prize `_
+* `Yelp Dataset Challenge `_

 Economics
 ---------

-* American Economic Ass. (AEA): http://www.aeaweb.org/RFE/toc.php?show=complete
-* EconData (UMD): http://inforumweb.umd.edu/econdata/econdata.html
-* Internet Product Code Database: http://www.upcdatabase.com/
-* World bank: http://data.worldbank.org/indicator
+* `American Economic Ass. (AEA) `_
+* `EconData from UMD `_
+* `Internet Product Code Database `_

 Energy
 ------

-* AMPds: http://ampds.org/
-* BLUEd: http://nilm.cmubi.org/
-* COMBED: http://combed.github.io/
-* Dataport: https://dataport.pecanstreet.org/
-* ECO: http://www.vs.inf.ethz.ch/res/show.html?what=eco-data
-* EIA: http://www.eia.gov/electricity/data/eia923/
-* HFED: http://hfed.github.io/
-* iAWE: http://iawe.github.io/
-* Plaid: http://plaidplug.com/
-* REDD: http://redd.csail.mit.edu/
-* UK-Dale: http://www.doc.ic.ac.uk/~dk3810/data/
+* `AMPds `_
+* `BLUEd `_
+* `COMBED `_
+* `Dataport `_
+* `ECO `_
+* `EIA `_
+* `HFED `_
+* `iAWE `_
+* `Plaid `_
+* `REDD `_
+* `UK-Dale `_

 Finance
 -------

-* CBOE Futures Exchange: http://cfe.cboe.com/Data/
-* Google Finance: https://www.google.com/finance
-* Google Trends: http://www.google.com/trends?q=google&ctab=0&geo=all&date=all&sort=0
-* NASDAQ: https://data.nasdaq.com/
-* OANDA: http://www.oanda.com/
-* OSU Financial data: http://fisher.osu.edu/fin/osudata.htm or http://fisher.osu.edu/fin/fdf/osudata.htm
-* Quandl: http://www.quandl.com/
-* St Louis Federal: http://research.stlouisfed.org/fred2/
-* Yahoo Finance: http://finance.yahoo.com/
+* `CBOE Futures Exchange `_
+* `Google Finance `_
+* `Google Trends `_
+* `NASDAQ `_
+* `OANDA `_
+* `OSU Financial data `_
+* `Quandl `_
+* `St Louis Federal `_
+* `Yahoo Finance `_

 GeoSpace/GIS
 ------------

-* BODC (marine data of nearly 22,000 oceanographic vars): http://www.bodc.ac.uk/data/where_to_find_data/
-* EOSDIS: http://sedac.ciesin.columbia.edu/data/sets/browse
-* Factual Global Location Data: http://www.factual.com/
-* GADM (Global Administrative Areas database): http://www.gadm.org/
-* Geo Spatial Data: http://geodacenter.asu.edu/datalist/
-* GeoNames (over eight million placenames): http://www.geonames.org/
-* Natural Earth (vectors and rasters of the world): http://www.naturalearthdata.com/
-* OpenStreetMap (a free map worldwide): http://wiki.openstreetmap.org/wiki/Downloading_data
-* TIGER/Line (official United States boundaries and roads): http://www.census.gov/geo/maps-data/data/tiger-line.html
-* twofishes (Foursquare's coarse geocoder): https://github.com/foursquare/twofishes
-* tz_world (timezone polygons): http://efele.net/maps/tz/world/
+* `BODC (marine data of nearly 22,000 oceanographic vars) `_
+* `EOSDIS `_
+* `Factual Global Location Data `_
+* `GADM (Global Administrative Areas database) `_
+* `Geo Spatial Data from ASU `_
+* `GeoNames (over eight million placenames) `_
+* `Natural Earth (vectors and rasters of the world) `_
+* `OpenStreetMap (a free map worldwide) `_
+* `TIGER/Line (official United States boundaries and roads) `_
+* `twofishes (Foursquare's coarse geocoder) `_
+* `tz_world (timezone polygons) `_

 Government
 ----------

-* Archive-it: : https://www.archive-it.org/explore?show=Collections
-* Australia: http://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/3301.02009?OpenDocument
-* Australia: https://data.gov.au/
-* Canada: http://www.data.gc.ca/default.asp?lang=En&n=5BCD274E-1
-* Chicago: https://data.cityofchicago.org/
-* EU: http://ec.europa.eu/eurostat/data/database
-* FDA: https://open.fda.gov/index.html
-* Fed Stats: http://www.fedstats.gov/cgi-bin/A2Z.cgi
-* Germany: https://www-genesis.destatis.de/genesis/online
-* Glasgow, Scotland, UK: http://data.glasgow.gov.uk/
-* Guardian world governments: http://www.guardian.co.uk/world-government-data
-* HUD: http://www.huduser.org/portal/datasets/pdrdatas.html
-* London Datastore, U.K: http://data.london.gov.uk/dataset
-* Netherlands: https://data.overheid.nl/
-* New Zealand: http://www.stats.govt.nz/browse_for_stats.aspx
-* NYC betanyc: http://betanyc.us/
-* NYC Open Data: http://nycplatform.socrata.com/
-* OECD: http://www.oecd.org/document/0,3746,en_2649_201185_46462759_1_1_1_1,00.html
-* Open Government Data (OGD) Platform India: http://www.data.gov.in/
-* RITA: http://www.transtats.bts.gov/OT_Delay/OT_DelayCause1.asp
-* San Francisco Data sets: http://datasf.org/
-* South Africa: http://beta2.statssa.gov.za/
-* The World Bank: http://wdronline.worldbank.org/
-* U.K. Government Data: http://data.gov.uk/data
-* U.S. American Community Survey: http://www.census.gov/acs/www/data_documentation/data_release_info/
-* U.S. Census Bureau: http://www.census.gov/data.html
-* U.S. Federal Government Agencies: http://www.data.gov/metric
-* U.S. Federal Government Data Catalog: http://catalog.data.gov/dataset
-* U.S. Open Government: http://www.data.gov/open-gov/
-* UK 2011 Census Open Atlas Project: http://www.alex-singleton.com/2011-census-open-atlas-project/
-* United Nations: http://data.un.org/
-* US CDC Public Health datasets: http://www.cdc.gov/nchs/data_access/ftp_data.htm
+* `Australia `_
+* `Australia `_
+* `Canada `_
+* `Chicago `_
+* `EuroStat `_
+* `FedStats `_
+* `Germany `_
+* `Glasgow, Scotland, UK `_
+* `Guardian world governments `_
+* `London Datastore, U.K `_
+* `Netherlands `_
+* `New Zealand `_
+* `NYC betanyc `_
+* `NYC Open Data `_
+* `OECD `_
+* `Open Government Data (OGD) Platform India `_
+* `RITA `_
+* `San Francisco Data sets `_
+* `South Africa `_
+* `The World Bank `_
+* `U.K. Government Data `_
+* `U.S. American Community Survey `_
+* `U.S. CDC Public Health datasets `_
+* `U.S. Census Bureau `_
+* `U.S. Department of Housing and Urban Development (HUD) `_
+* `U.S. Federal Government Agencies `_
+* `U.S. Federal Government Data Catalog `_
+* `U.S. Food and Drug Administration (FDA) `_
+* `U.S. Open Government `_
+* `UK 2011 Census Open Atlas Project `_
+* `United Nations `_

 Healthcare
 ----------

-* EHDP Large Health Data Sets: http://www.ehdp.com/vitalnet/datasets.htm
-* Gapminder: http://www.gapminder.org/data/
-* Medicare Data File: http://go.cms.gov/19xxPN4
+* `EHDP Large Health Data Sets `_
+* `Gapminder `_
+* `Medicare Data File `_

 Image Processing
 ----------------

-* 2GB of photos of cats: http://137.189.35.203/WebUI/CatDatabase/catData.html
-* Face Recognition Benchmark: http://www.face-rec.org/databases/
-* ImageNet: http://www.image-net.org/
+* `2GB of photos of cats `_
+* `Face Recognition Benchmark `_
+* `ImageNet `_

 Machine Learning
 ----------------

-* eBay Online Auctions: http://www.modelingonlineauctions.com/datasets
-* IMDb database: http://www.imdb.com/interfaces
-* Keel Repository: http://sci2s.ugr.es/keel/datasets.php
-* Lending Club Loan Data: https://www.lendingclub.com/info/download-data.action
-* Machine Learning Data Set Repository: http://mldata.org/
-* Million Song Dataset: http://blog.echonest.com/post/3639160982/million-song-dataset
-* More Song Datasets: http://labrosa.ee.columbia.edu/millionsong/pages/additional-datasets
-* MovieLens Data Sets: http://datahub.io/dataset/movielens
-* RDataMining R and Data Mining ebook data: http://www.rdatamining.com/data
-* Registered meteorites on Earth: http://www.analyticbridge.com/profiles/blogs/registered-meteorites-that-has-impacted-on-earth-visualized
-* SF restaurants dataset: http://missionlocal.org/san-francisco-restaurant-health-inspections/
-* UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/
-* University of Toronto Delve Datasets: http://www.cs.toronto.edu/~delve/data/datasets.html
-* Yahoo Ratings and Classification Data: http://webscope.sandbox.yahoo.com/catalog.php?datatype=r
+* `eBay Online Auctions `_
+* `IMDb database `_
+* `Keel Repository `_
+* `Lending Club Loan Data `_
+* `Machine Learning Data Set Repository `_
+* `Million Song Dataset `_
+* `More Song Datasets `_
+* `MovieLens Data Sets `_
+* `RDataMining R and Data Mining ebook data `_
+* `Registered meteorites on Earth `_
+* `SF restaurants dataset `_
+* `UCI Machine Learning Repository `_
+* `University of Toronto Delve Datasets `_
+* `Yahoo Ratings and Classification Data `_

 Museums
 -------

-* Cooper-Hewitt's Collection Database: https://github.com/cooperhewitt/collection
-* Minneapolis Institute of Arts metadata: https://github.com/artsmia/collection
-* Tate Collection metadata: https://github.com/tategallery/collection
-* The Getty vocabularies: http://vocab.getty.edu
+* `Cooper-Hewitt's Collection Database `_
+* `Minneapolis Institute of Arts metadata `_
+* `Tate Collection metadata `_
+* `The Getty vocabularies `_

 Music
 -----
-* Discogs Data: http://www.discogs.com/data/
+
+* `Discogs Data `_

 Natural Language
 ----------------

-* 40 Million Entities in Context: https://code.google.com/p/wiki-links/downloads/list
-* ClueWeb09 FACC: http://lemurproject.org/clueweb09/FACC1/
-* ClueWeb12 FACC: http://lemurproject.org/clueweb12/FACC1/
-* DBpedia: http://wiki.dbpedia.org/Datasets
-* Flickr personal taxonomies: http://www.isi.edu/~lerman/downloads/flickr/flickr_taxonomies.html
-* Google Books Ngrams: http://aws.amazon.com/datasets/8172056142375670
-* Google Web 5gram, 2006 (1T): https://catalog.ldc.upenn.edu/LDC2006T13
-* Gutenberg eBooks List: http://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs
-* Hansards: http://www.isi.edu/natural-language/download/hansard/
-* Machine Translation: http://statmt.org/wmt11/translation-task.html#download
-* SMS Spam Collection: http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/
-* USENET corpus: http://www.psych.ualberta.ca/~westburylab/downloads/usenetcorpus.download.html
-* Wikidata: https://www.wikidata.org/wiki/Wikidata:Database_download
-* WordNet: http://wordnet.princeton.edu/wordnet/download/
+* `40 Million Entities in Context `_
+* `ClueWeb09 FACC `_
+* `ClueWeb12 FACC `_
+* `DBpedia `_
+* `Flickr personal taxonomies `_
+* `Google Books Ngrams `_
+* `Google Web 5gram, 2006 (1T) `_
+* `Gutenberg eBooks List `_
+* `Hansards `_
+* `Machine Translation `_
+* `SMS Spam Collection `_
+* `USENET corpus `_
+* `Wikidata `_
+* `WordNet `_

 Physics
 -------

-* CERN Open Data Portal: http://opendata.cern.ch/
-* NASA: http://nssdc.gsfc.nasa.gov/nssdc/obtaining_data.html
+* `CERN Open Data Portal `_
+* `NASA `_

 Public Domains
 --------------

-* Amazon: http://aws.amazon.com/datasets
-* Archive.org Datasets: https://archive.org/details/datasets
-* CMU JASA data archive: http://lib.stat.cmu.edu/jasadata/
-* CMU StatLab collections: http://lib.stat.cmu.edu/datasets/
-* Data360: http://www.data360.org/index.aspx
-* Datamob.org: http://datamob.org/datasets
-* Google: http://www.google.com/publicdata/directory
-* infochimps: http://www.infochimps.com/
-* KDNuggets Data Collections: http://www.kdnuggets.com/datasets/index.html
-* Numbray: http://numbrary.com/
-* RevolutionAnalytics Collection: http://www.revolutionanalytics.com/subscriptions/datasets/
-* Sample R data sets: http://stat.ethz.ch/R-manual/R-patched/library/datasets/html/00Index.html
-* Stats4Stem R data sets: http://www.stats4stem.org/data-sets.html
-* StatSci.org: http://www.statsci.org/datasets.html
-* The Washington Post List: http://www.washingtonpost.com/wp-srv/metro/data/datapost.html
-* UCLA SOCR data collection: http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data
-* UFO Reports: http://www.nuforc.org/webreports.html
-* Wikileaks 911 pager intercepts: http://911.wikileaks.org/files/index.html
-* Yahoo Webscope: http://webscope.sandbox.yahoo.com/catalog.php
+* `Amazon `_
+* `Archive.org Datasets `_
+* `CMU JASA data archive `_
+* `CMU StatLab collections `_
+* `Data360 `_
+* `Datamob.org `_
+* `Google `_
+* `Infochimps `_
+* `KDNuggets Data Collections `_
+* `Numbray `_
+* `Reddit Datasets `_
+* `RevolutionAnalytics Collection `_
+* `Sample R data sets `_
+* `Stats4Stem R data sets `_
+* `StatSci.org `_
+* `The Washington Post List `_
+* `UCLA SOCR data collection `_
+* `UFO Reports `_
+* `Wikileaks 911 pager intercepts `_
+* `Yahoo Webscope `_

 Search Engines
 --------------

-* Academic Torrents: http://academictorrents.com/
-* Datahub.io: http://datahub.io/dataset
-* DataMarket: https://datamarket.com/data/list/?q=all
-* Freebase: http://www.freebase.com/
-* Harvard Dataverse: http://thedata.harvard.edu/dvn/
-* Statista: http://www.statista.com/
+* `Academic Torrents `_
+* `Archive-it `_
+* `Datahub.io `_
+* `DataMarket.com `_
+* `Freebase.com `_
+* `Harvard Dataverse `_
+* `Statista.com `_

 Social Sciences
 ---------------

-* CMU Enron Email: http://www.cs.cmu.edu/~enron/
-* Facebook Social Networks (since 2007): http://law.di.unimi.it/datasets.php
-* Facebook100 (2005): https://archive.org/details/oxford-2005-facebook-matrix
-* Foursquare (2010,2011): http://www.public.asu.edu/~hgao16/dataset.html
-* Foursquare (UMN/Sarwat, 2013): https://archive.org/details/201309_foursquare_dataset_umn
-* General Social Survey (GSS): http://www3.norc.org/GSS+Website/
-* GetGlue (users rating TV shows): http://bit.ly/1aL8XS0
-* GitHub Archive: http://www.githubarchive.org/
-* ICPSR: http://www.icpsr.umich.edu/icpsrweb/ICPSR/index.jsp
-* Mobile Social Networks (UMASS): https://kdl.cs.umass.edu/display/public/Mobile+Social+Networks
-* PewResearch Internet Project: http://www.pewinternet.org/datasets/pages/2/
-* Social Networking: http://www.cs.cmu.edu/~jelsas/data/ancestry.com/
-* SourceForge Graph: http://www.nd.edu/~oss/Data/data.html
-* Stack Exchange Network (Data Explorer): http://data.stackexchange.com/help
-* Titanic Survival Data Set: http://bit.do/dataset-titanic-csv-zip
-* Twitter Graph: http://an.kaist.ac.kr/traces/WWW2010.html
-* UC Berkeley's D-Lab Achive: http://ucdata.berkeley.edu/
-* UCLA Social Sciences Data Archive: http://dataarchives.ss.ucla.edu/Home.DataPortals.htm
-* UNIMI Social Network Datasets: http://law.di.unimi.it/datasets.php
-* Universities Worldwide: http://univ.cc/
-* UPJOHN for Employment Research: http://www.upjohn.org/erdc/erdc.html
-* Yahoo Graph and Social Data: http://webscope.sandbox.yahoo.com/catalog.php?datatype=g
-* Youtube Graph (2007,2008): http://netsg.cs.sfu.ca/youtubedata/
+* `CMU Enron Email `_
+* `Facebook Social Networks (since 2007) `_
+* `Facebook100 (2005) `_
+* `Foursquare (2010,2011) `_
+* `Foursquare (UMN/Sarwat, 2013) `_
+* `General Social Survey (GSS) `_
+* `GetGlue (users rating TV shows) `_
+* `GitHub Archive `_
+* `ICPSR `_
+* `Mobile Social Networks (UMASS) `_
+* `PewResearch Internet Project `_
+* `Social Networking `_
+* `SourceForge Graph `_
+* `Stack Exchange Network (Data Explorer) `_
+* `Titanic Survival Data Set `_
+* `Twitter Graph `_
+* `UC Berkeley's D-Lab Achive `_
+* `UCLA Social Sciences Data Archive `_
+* `UNIMI Social Network Datasets `_
+* `Universities Worldwide `_
+* `UPJOHN for Employment Research `_
+* `Yahoo Graph and Social Data `_
+* `Youtube Graph (2007,2008) `_

 Sports
 ------

-* Betfair (betting exchange) Event Results: http://data.betfair.com/
-* Cricsheet (cricket): http://cricsheet.org/
-* Ergast Formula 1 (API available): http://ergast.com/mrd/db
-* Football/Soccer data and APIs: http://www.jokecamp.com/blog/guide-to-football-and-soccer-data-and-apis/
-* Lahman's Baseball Database: http://www.seanlahman.com/baseball-archive/statistics/
-* Retrosheet (baseball): http://www.retrosheet.org/game.htm
+* `Betfair (betting exchange) Event Results `_
+* `Cricsheet (cricket) `_
+* `Ergast Formula 1 (API available) `_
+* `Football/Soccer data and APIs `_
+* `Lahman's Baseball Database `_
+* `Retrosheet (baseball) `_

 Time Series
 -----------

-* Time Series data Library: https://datamarket.com/data/list/?q=provider:tsdl
-* UC Riverside Time Series: http://www.cs.ucr.edu/~eamonn/time_series_data/
+* `Time Series data Library `_
+* `UC Riverside Time Series `_

 Transportation
 --------------

-* Airlines Data (2009 ASA Challenge): http://stat-computing.org/dataexpo/2009/the-data.html
-* Bike Share Data Systems: https://github.com/BetaNYC/Bike-Share-Data-Best-Practices/wiki/Bike-Share-Data-Systems
-* Edge data for US domestic flights 1990 to 2009: http://data.memect.com/?p=229
-* Half a million Hubway rides: http://hubwaydatachallenge.org/trip-history-data/
-* Marine Traffic - ship tracks, port calls and more: https://www.marinetraffic.com/de/p/api-services
-* NYC Taxi Trip Data 2013 (FOIA/FOIL): https://archive.org/details/nycTaxiTripData2013
-* OpenFlights (airport, airline and route data): http://openflights.org/data.html
-* RITA Airline On-Time Performance Data: http://www.transtats.bts.gov/Tables.asp?DB_ID=120
-* RITA transport data collection: http://www.transtats.bts.gov/DataIndex.asp
-* Transport for London: http://www.tfl.gov.uk/info-for/open-data-users/our-feeds
-* U.S. Freight Analysis Framework: http://ops.fhwa.dot.gov/freight/freight_analysis/faf/index.htm
+* `Airlines Data (2009 ASA Challenge) `_
+* `Bike Share Data Systems `_
+* `Edge data for US domestic flights 1990 to 2009 `_
+* `Half a million Hubway rides `_
+* `Marine Traffic - ship tracks, port calls and more `_
+* `NYC Taxi Trip Data 2013 (FOIA/FOIL) `_
+* `OpenFlights (airport, airline and route data) `_
+* `RITA Airline On-Time Performance Data `_
+* `RITA transport data collection `_
+* `Transport for London `_
+* `U.S. Freight Analysis Framework `_

 Complementary Collections
 -------------------------

-* DataWrangling: http://www.datawrangling.com/some-datasets-available-on-the-web
-* Inside-r: http://www.inside-r.org/howto/finding-data-internet
-* Quora: http://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public
-* Reddit: http://www.reddit.com/r/datasets
-* RS Collection 100+ : http://rs.io/2014/05/29/list-of-data-sets.html
-* StaTrek: http://hsiamin.com/posts/2014/10/23/leveraging-open-data-to-understand-urban-lives/
+* DataWrangling: `Some Datasets Available on the Web `_
+* Inside-r: `Finding Data on the Internet `_
+* Quora: `Where can I find large datasets open to the public? `_
+* RS.io: `100+ Interesting Data Sets for Statistics `_
+* StaTrek: `Leveraging open data to understand urban lives `_