# libreCATASTRO
An open-source, MIT-licensed application that scrapes the official Spanish
Cadaster registry and stores the information in Elasticsearch.

![libreCatastro example](https://drive.google.com/uc?export=view&id=1kisisDNmrQ5ZBWNzqnSzF0AsHu6-zS-P "libreCatastro example")

**Features**

_Scraping_
* From XML webservices. Check http://www.catastro.meh.es/ws/Webservices_Libres.pdf
* From HTML webpages.
* Scrapes all properties, including houses, flats, garages, storehouses, even buildings in ruins!
* Scrapes all usages and purposes: living, commercial, religious, military...
* Scrapes rural (parcelas) and urban properties.
* Retrieves **the building plan** of every property
* Skips already scraped information
* Can be told to scrape a list of provinces
* Can be told to scrape by a polygon of coordinates
* Can be told to start from a specific city in a province

_Storing_
* Stores in Elasticsearch
* Supports automatic map visualization in Kibana

_Visualization_

Includes a preconfigured Kibana that shows:
1) A heatmap on the map of Spain (World) showing where the properties are
2) All data in tables
3) The picture of the property

**DoS Warning**

The Spanish Cadaster has set restrictions, temporarily banning IPs that make more than 10
queries in 5 seconds. A sleep of 5 seconds has been set where needed; it can be changed
with `--sleep`, at your own risk.

DoS blocking seems to happen more often at night, when even a 5-second sleep can end in a `Connection Reset by Peer` message.
To try to avoid this, add these two cron entries after launching libreCatastro
to suspend it at 23:00 and resume processing at 09:00 every day:
```
0 23 * * * ps aux | grep "[l]ibreCatastro" | awk '{print $2}' | xargs kill -TSTP
0 09 * * * ps aux | grep "[l]ibreCatastro" | awk '{print $2}' | xargs kill -CONT
```
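
The limit described above (a ban after more than 10 queries in 5 seconds) can also be respected from the client side. The following is a minimal sketch of a sliding-window limiter, not libreCatastro's actual code: `Throttle` and its parameters are hypothetical names, and the defaults simply mirror the figures quoted in this README. A lower `max_calls` would leave a safety margin.

```python
import time
from collections import deque


class Throttle:
    """Sliding-window limiter: at most max_calls calls per period seconds.

    Hypothetical helper for illustration; not part of libreCatastro.
    """

    def __init__(self, max_calls=10, period=5.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock    # injectable so the class can be tested
        self.sleep = sleep
        self.calls = deque()  # timestamps of the calls inside the window

    def _forget_old_calls(self, now):
        # Drop timestamps that have already left the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = self.clock()
        self._forget_old_calls(now)
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest call in the window expires.
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self._forget_old_calls(now)
        self.calls.append(now)


if __name__ == "__main__":
    throttle = Throttle(max_calls=2, period=1.0)
    for i in range(4):
        throttle.wait()  # call this before every request to the Cadaster
        print("request", i)
```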
**Installation**

Having Docker and Docker Compose installed, first run:
```
docker-compose up -d
```
Then configure the Elasticsearch index:
```
python3 initialize_elasticsearch.py
```
And finally, execute libreCatastro as described in the next section.

**Execution**
```
$ python3 libreCatastro.py --help

usage: libreCatastro.py [-h] [--coords]
                        [--filenames FILENAMES [FILENAMES ...]]
                        [--provinces PROVINCES [PROVINCES ...]]
                        [--sleep SLEEP] [--html] [--scale SCALE] [--pictures]
                        [--startcity STARTCITY] [--listprovinces]
                        [--listcities LISTCITIES] [--health]

Runs libreCatastro

optional arguments:
  -h, --help show this help message and exit
  --coords (scrapping by coordinates. By default, if not set, it's by provinces)
  --filenames FILENAMES [FILENAMES ...] (for files with polygon coordinates)
  --provinces PROVINCES [PROVINCES ...] (for a list of provinces to scrap)
  --sleep SLEEP (time to sleep to avoid Cadaster DoS)
  --html (if you prefer to scrap HTML or if XML servers are down)
  --scale SCALE (for scrapping by coordinates, how big is the step)
  --pictures (scrap also the plan of the house)
  --startcity STARTCITY (start from a specific city in a province, in alphabetic order)
  --listprovinces (just list all provinces in alphabetic order)
  --listcities PROVINCE (just list all cities of a province in alphabetic order)
  --health (check if Cadaster servers are up)
```
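
The `--coords`, `--filenames` and `--scale` options suggest that coordinate-based scraping walks a polygon in fixed steps. As an illustration only (the real traversal and the polygon file format are not documented here), the sketch below generates the points of a regular grid that fall inside a polygon, with the grid spacing playing the role that `--scale` might play. `point_in_polygon` and `grid_points_in_polygon` are hypothetical helpers, and the polygon is assumed to be a plain list of `(x, y)` pairs.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside polygon.

    polygon is a list of (x, y) vertices; points exactly on an edge
    may be classified either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def grid_points_in_polygon(polygon, scale):
    """Yield grid points spaced `scale` apart that fall inside polygon."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    min_x, min_y = min(xs), min(ys)
    # Step by integer index to avoid accumulating floating-point error.
    steps_x = int((max(xs) - min_x) / scale) + 1
    steps_y = int((max(ys) - min_y) / scale) + 1
    for i in range(steps_x):
        for j in range(steps_y):
            x = min_x + i * scale
            y = min_y + j * scale
            if point_in_polygon(x, y, polygon):
                yield (x, y)


if __name__ == "__main__":
    # Hypothetical square area; each point would become one query.
    square = [(0, 0), (4, 0), (4, 4), (0, 4)]
    for point in grid_points_in_polygon(square, 1.0):
        print(point)
```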
**Health**

Before anything else, I highly recommend running
`python3 libreCatastro.py --health` to check whether the XML and HTML servers are up.

**Time to get the complete DB**

Because the restrictions prevent scraping faster than one page every 5 seconds,
scraping can take a very long time. So:
1) Go directly to the provinces / cities you need the most. Leave the rest for later.
2) Use different IP addresses and query in parallel.
3) Write me an email at jjmcarrascosa@gmail.com to get the full DB.
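
To get a feel for the durations involved, the arithmetic is simple: with a fixed sleep between pages, the total time is roughly the number of pages times the sleep, divided by the number of IP addresses working in parallel. The sketch below is only that arithmetic. `estimated_duration` is a hypothetical helper, and the page count in the example is a made-up figure, not a measurement of the Cadaster.

```python
from datetime import timedelta


def estimated_duration(pages, sleep_seconds=5.0, workers=1):
    """Rough lower bound on scraping time.

    Counts only the sleep between pages; download and parsing time
    are ignored. `workers` is the number of IP addresses in parallel.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return timedelta(seconds=pages * sleep_seconds / workers)


# Hypothetical example: 1,000,000 pages at 5 seconds per page.
print(estimated_duration(1_000_000))             # 57 days, 20:53:20
print(estimated_duration(1_000_000, workers=4))  # 14 days, 11:13:20
```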