From 3ac95867c623390a4fdafdc11c0b0fd3cd7abc54 Mon Sep 17 00:00:00 2001
From: Alexandre Pinto
Date: Tue, 6 Jan 2015 00:28:31 +0000
Subject: [PATCH 01/16] New image processing data sets

---
 README.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/README.rst b/README.rst
index d4d2ce8..120f212 100644
--- a/README.rst
+++ b/README.rst
@@ -195,6 +195,11 @@ Image Processing
 * `2GB of photos of cats `_
 * `Face Recognition Benchmark `_
 * `ImageNet `_
+* `SUN database `_
+* `10k US Adult Faces Database `_
+* `Affective Image Classification `_
+* `International Affective Picture System `_
+* `Massive Visual Memory Stimuli `_
 Machine Learning

From 2f651a452a3f617a9a9cff4ee8f8dfd4c4fbf35a Mon Sep 17 00:00:00 2001
From: EngineerEmily
Date: Fri, 23 Jun 2017 21:35:34 -0700
Subject: [PATCH 02/16] Adding local data portals

---
 Government.rst | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/Government.rst b/Government.rst
index 1df8d04..7c7758a 100644
--- a/Government.rst
+++ b/Government.rst
@@ -51,20 +51,24 @@ Government
 * `London, ON, Canada `_
 * `Los Angeles Open Data `_
 * `MassGIS, Massachusetts, U.S. `_
+* `Metropolitain Transportation Commission (MTC), California, US `_
 * `Mexico `_
 * `Missisauga, ON, Canada `_
 * `Moldova `_
 * `Moncton, NB, Canada `_
+* `Mountain View, California, US (GIS) `_
 * `Montreal, QC, Canada `_
 * `Netherlands `_
 * `New Zealand `_
 * `NYC betanyc `_
 * `NYC Open Data `_
+* `Oakland, California, US `_
 * `OECD `_
 * `Oklahoma `_
 * `Open Government Data (OGD) Platform India `_
 * `Oregon `_
 * `Ottawa, ON, Canada `_
+* `Palo Alto, California, US `_
 * `Portland, Oregon `_
 * `Portugal - Pordata organization `_
 * `Puerto Rico Government `_
@@ -75,6 +79,8 @@ Government
 * `Romania `_
 * `Russia `_
 * `San Francisco Data sets `_
+* `San Jose, California, US `_
+* `San Mateo County, California, US `_
 * `Saskatchewan, Province of Canada `_
 * `Seattle `_
 * `Singapore Government Data `_
@@ -102,6 +108,7 @@ Government
 * `UK 2011 Census Open Atlas Project `_
 * `United Nations `_
 * `Uruguay `_
+* `Valley Transportation Authority (VTA), California, US `_
 * `Vancouver, BC Open Data Catalog `_
 * `Victoria, BC, Canada `_
 * `Vienna, Austria `_

From 0bde4fd8edcf044131d5669fd22a1ac10f1b2ee3 Mon Sep 17 00:00:00 2001
From: Ryan Barrett
Date: Thu, 29 Jun 2017 07:36:48 -0700
Subject: [PATCH 03/16] Add Indie Map

---
 README.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.rst b/README.rst
index edab464..b169fcb 100755
--- a/README.rst
+++ b/README.rst
@@ -472,6 +472,7 @@ Social Networks
 * `GitHub Collaboration Archive `_
 * `Google Scholar citation relations `_
 * `High-Resolution Contact Networks from Wearable Sensors `_
+* `Indie Map: social graph and crawl of top IndieWeb sites `_
 * `Mobile Social Networks from UMASS `_
 * `Network Twitter Data `_
 * `Reddit Comments `_

From 1c57e245bd11f2f6d650ad07a4c3b4d92bc6d087 Mon Sep 17 00:00:00 2001
From: Tom Morris
Date: Tue, 11 Jul 2017 10:37:39 -0400
Subject: [PATCH 04/16] Datamob is gone

---
 README.rst | 1 -
 1 file changed, 1 deletion(-)

diff --git a/README.rst b/README.rst
index edab464..1a33385 100755
--- a/README.rst
+++ b/README.rst
@@ -422,7 +422,6 @@ Public Domains
 * `CMU StatLab collections `_
 * `Data.World `_
 * `Data360 `_
-* `Datamob.org `_
 * `Google `_
 * `Infochimps `_
 * `KDNuggets Data Collections `_

From 76ee6a0012c8d5d835581928e15b3f8416b71383 Mon Sep 17 00:00:00 2001
From: Xiaming Chen
Date: Thu, 10 Aug 2017 10:54:22 +0800
Subject: [PATCH 05/16] Fix #308

---
 README.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/README.rst b/README.rst
index 1a33385..f631ee5 100755
--- a/README.rst
+++ b/README.rst
@@ -269,6 +269,7 @@ Healthcare
 * `EHDP Large Health Data Sets `_
 * `Gapminder World demographic databases `_
+* `GDC supports several cancer genome programs for CCG, TCGA, TARGET etc. `_
 * `Medicare Coverage Database (MCD), U.S. `_
 * `Medicare Data Engine of medicare.gov Data `_
 * `Medicare Data File `_
@@ -276,7 +277,7 @@ Healthcare
 * `Number of Ebola Cases and Deaths in Affected Countries (2014) `_
 * `Open-ODS (structure of the UK NHS) `_
 * `OpenPaymentsData, Healthcare financial relationship data `_
-* `The Cancer Genome Atlas project (TCGA) `_ and `BigQuery table `_
+* The Cancer Genome Atlas project (TCGA) (refer to `GDC `_ and `BigQuery table `_)
 * `World Health Organization Global Health Observatory `_

From a12a3b41693047128bda88552ad1543950c4bb32 Mon Sep 17 00:00:00 2001
From: Xiaming Chen
Date: Thu, 10 Aug 2017 10:55:40 +0800
Subject: [PATCH 06/16] Fix #307

---
 README.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.rst b/README.rst
index f631ee5..8155e1e 100755
--- a/README.rst
+++ b/README.rst
@@ -349,7 +349,7 @@ Museums
 Natural Language
 ----------------
-* `Automatic Keyphrase Extracttion `_
+* `Automatic Keyphrase Extraction `_
 * `Blogger Corpus `_
 * `CLiPS Stylometry Investigation Corpus `_
 * `ClueWeb09 FACC `_

From 853dbff93781b301cc4af8249927c505192d1d41 Mon Sep 17 00:00:00 2001
From: Xiaming Chen
Date: Thu, 10 Aug 2017 11:06:01 +0800
Subject: [PATCH 07/16] #306

---
 README.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/README.rst b/README.rst
index 8155e1e..9472dc3 100755
--- a/README.rst
+++ b/README.rst
@@ -4,7 +4,7 @@ Awesome Public Datasets
 :alt: Awesome
 :target: https://github.com/sindresorhus/awesome
-`This list of public data sources `_
+`This list of a topic-centric public data sources `_
 in high quality. They are collected and tidied from blogs, answers, and user responses.
 Most of the data sets listed below are free, however, some are not.
 Other amazingly awesome lists can be found in the
@@ -270,6 +270,7 @@ Healthcare
 * `EHDP Large Health Data Sets `_
 * `Gapminder World demographic databases `_
 * `GDC supports several cancer genome programs for CCG, TCGA, TARGET etc. `_
+* `PhysioBank Databases - a large and growing archive of physiological data `_
 * `Medicare Coverage Database (MCD), U.S. `_
 * `Medicare Data Engine of medicare.gov Data `_
 * `Medicare Data File `_

From 15d70df85e958cec172ddd7c39ef5183b9fa2b38 Mon Sep 17 00:00:00 2001
From: Fabio D'Elia
Date: Mon, 21 Aug 2017 10:59:02 +0200
Subject: [PATCH 08/16] changed Registered Meteorites on Earth to new link

---
 README.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.rst b/README.rst
index f6b6bde..5ea3cc0 100755
--- a/README.rst
+++ b/README.rst
@@ -328,7 +328,7 @@ Machine Learning
 * `MovieLens Data Sets `_
 * `New Yorker caption contest ratings `_
 * `RDataMining - "R and Data Mining" ebook data `_
-* `Registered Meteorites on Earth `_
+* `Registered Meteorites on Earth `_
 * `Restaurants Health Score Data in San Francisco `_
 * `UCI Machine Learning Repository `_
 * `Yahoo! Ratings and Classification Data `_

From 39dab15b605b1c93a77a185ab019e6348264b39f Mon Sep 17 00:00:00 2001
From: Muhammad Faheem Akhtar
Date: Sat, 26 Aug 2017 17:34:12 +0500
Subject: [PATCH 09/16] Fixed a broken link

The link to "Caltech Pedestrian Detection Benchmark" was broken - issue 315 by sentientmachine

---
 README.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.rst b/README.rst
index f6b6bde..ef7fc93 100755
--- a/README.rst
+++ b/README.rst
@@ -290,7 +290,7 @@ Image Processing
 * `Adience Unfiltered faces for gender and age classification `_
 * `Affective Image Classification `_
 * `Animals with attributes `_
-* `Caltech Pedestrian Detection Benchmark `_
+* `Caltech Pedestrian Detection Benchmark `_
 * `Chars74K dataset, Character Recognition in Natural Images (both English and Kannada are available) `_
 * `Face Recognition Benchmark `_
 * `Flickr: 32 Class Brand Logos `_

From 0822a7840965d68e4ed773fd02fe2768f7c8c3ac Mon Sep 17 00:00:00 2001
From: Leonardo Taccari
Date: Thu, 31 Aug 2017 11:35:11 +0200
Subject: [PATCH 10/16] Broken link

The link is broken. The pages http://www.draftexpress.com/stats/nba,http://www.draftexpress.com/stats/ncaa, http://www.draftexpress.com/stats/euroleague exist, but it looks like there's no downloadable dataset.

---
 README.rst | 1 -
 1 file changed, 1 deletion(-)

diff --git a/README.rst b/README.rst
index f6b6bde..8054e7c 100755
--- a/README.rst
+++ b/README.rst
@@ -543,7 +543,6 @@ Software
 Sports
 ------
-* `Basketball (NBA/NCAA/Euro) Player Database and Statistics `_
 * `Betfair Historical Exchange Data `_
 * `Cricsheet Matches (cricket) `_
 * `Ergast Formula 1, from 1950 up to date (API) `_

From 713e56ad6c83e73c0716a85c907af82391043adc Mon Sep 17 00:00:00 2001
From: Keith Stolte
Date: Mon, 16 Oct 2017 21:24:22 -0400
Subject: [PATCH 11/16] Update of a few US Gov Links

Looks like some of the pages may have been moved around since this was started. Updated a few.

---
 Government.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Government.rst b/Government.rst
index 1df8d04..7b7f26d 100644
--- a/Government.rst
+++ b/Government.rst
@@ -89,8 +89,8 @@ Government
 * `Toronto, ON, Canada `_
 * `Tunisia `_
 * `U.K. Government Data `_
-* `U.S. American Community Survey `_
-* `U.S. CDC Public Health datasets `_
+* `U.S. American Community Survey `_
+* `U.S. CDC Public Health datasets `_
 * `U.S. Census Bureau `_
 * `U.S. Department of Housing and Urban Development (HUD) `_
 * `U.S. Federal Government Agencies `_

From 1de47f3ed06b1362b9d8f9e38c168ad09468540c Mon Sep 17 00:00:00 2001
From: Kostas Christidis
Date: Tue, 31 Oct 2017 19:23:37 -0400
Subject: [PATCH 12/16] Fix Dataport URL

Closes #331.

Signed-off-by: Kostas Christidis

---
 README.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.rst b/README.rst
index 0c556f0..7f28f55 100755
--- a/README.rst
+++ b/README.rst
@@ -199,7 +199,7 @@ Energy
 * `AMPds `_
 * `BLUEd `_
 * `COMBED `_
-* `Dataport `_
+* `Dataport `_
 * `DRED `_
 * `ECO `_
 * `EIA `_

From f6381e21f3457b2f9035363efe6af2087ff250d6 Mon Sep 17 00:00:00 2001
From: Kostas Christidis
Date: Fri, 3 Nov 2017 05:37:40 -0400
Subject: [PATCH 13/16] Remove Dataport URL

Dataport no longer offers public datasets.

Closes #331.

Signed-off-by: Kostas Christidis

---
 README.rst | 1 -
 1 file changed, 1 deletion(-)

diff --git a/README.rst b/README.rst
index 7f28f55..60b10b0 100755
--- a/README.rst
+++ b/README.rst
@@ -199,7 +199,6 @@ Energy
 * `AMPds `_
 * `BLUEd `_
 * `COMBED `_
-* `Dataport `_
 * `DRED `_
 * `ECO `_
 * `EIA `_

From 1c1bd03b4d4de1a93d34f0b923a2962288f38e31 Mon Sep 17 00:00:00 2001
From: Tom Morris
Date: Fri, 10 Nov 2017 17:29:24 -0500
Subject: [PATCH 14/16] Remove commercial marinetraffic.com - fixes #333

---
 README.rst | 1 -
 1 file changed, 1 deletion(-)

diff --git a/README.rst b/README.rst
index 0c556f0..740d59a 100755
--- a/README.rst
+++ b/README.rst
@@ -575,7 +575,6 @@ Transportation
 * `GeoLife GPS Trajectory from Microsoft Research `_
 * `German train system by Deutsche Bahn `_
 * `Hubway Million Rides in MA `_
-* `Marine Traffic - ship tracks, port calls and more `_
 * `Montreal BIXI Bike Share `_
 * `NYC Taxi Trip Data 2009- `_
 * `NYC Taxi Trip Data 2013 (FOIA/FOILed) `_

From 7e881ea669743f4095b24151a5800e271f834c9d Mon Sep 17 00:00:00 2001
From: Xiaming Chen
Date: Sun, 26 Nov 2017 19:13:09 +0800
Subject: [PATCH 15/16] Fix #333. Remove Marine Traffic

It turns non-open any more

---
 README.rst | 1 -
 1 file changed, 1 deletion(-)

diff --git a/README.rst b/README.rst
index 60b10b0..0296190 100755
--- a/README.rst
+++ b/README.rst
@@ -574,7 +574,6 @@ Transportation
 * `GeoLife GPS Trajectory from Microsoft Research `_
 * `German train system by Deutsche Bahn `_
 * `Hubway Million Rides in MA `_
-* `Marine Traffic - ship tracks, port calls and more `_
 * `Montreal BIXI Bike Share `_
 * `NYC Taxi Trip Data 2009- `_
 * `NYC Taxi Trip Data 2013 (FOIA/FOILed) `_

From 23b406d5370b3032df09a4e9b5869be0688bc3b9 Mon Sep 17 00:00:00 2001
From: Min
Date: Mon, 18 Dec 2017 14:13:25 +1300
Subject: [PATCH 16/16] Added Stanford Question Answering Dataset (SQuAD)

In right alphabetical order.

---
 README.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.rst b/README.rst
index 0296190..c34ccea 100755
--- a/README.rst
+++ b/README.rst
@@ -373,6 +373,7 @@ Natural Language
 * `Personae Corpus `_
 * `SaudiNewsNet Collection of Saudi Newspaper Articles (Arabic, 30K articles) `_
 * `SMS Spam Collection in English `_
+* `Stanford Question Answering Dataset (SQuAD) `_
 * `Universal Dependencies `_
 * `USENET postings corpus of 2005~2011 `_
 * `Webhose - News/Blogs in multiple languages `_