Adds information about how to run with a single Elasticsearch instance and multiple machines.

josejuanmartinez 2019-11-08 22:16:38 +01:00
parent 12865d9f75
commit c83a31b907
2 changed files with 20 additions and 3 deletions


@@ -48,7 +48,8 @@ to send to sleep at 23:00 and restart processing at 09:00 everyday
Having Docker and Docker-compose installed, run first:
```
docker-compose up -d
pip install -r requirements.txt
```
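If you want to confirm that the dockerised Elasticsearch is actually up before continuing, a minimal check along these lines should work (just a sketch, assuming the default port 9200 and the official `elasticsearch` Python client, which may need a separate `pip install elasticsearch`):
```
# Quick connectivity check against the Elasticsearch started by docker-compose.
# Assumes the default port 9200; adjust the URL if docker-compose maps a different one.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])
if es.ping():
    print("Elasticsearch is reachable")
else:
    print("Elasticsearch is not reachable yet; wait a few seconds and retry")
```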
Then configure the Elasticsearch index:
@@ -95,4 +96,20 @@ Taking into account that there are restrictions that prevent scraping faster
scraping can take a very long time, so:
1) Go directly to the provinces / cities you need the most. Leave the rest for later.
2) Use different IP addresses and query in parallel.
3) Write me an email at jjmcarrascosa@gmail.com to get the full DB.
**Using additional machines (parallel extraction)**
You won't need to repeat the previous steps, because a single Elasticsearch instance will serve all the machines.
For each additional machine, do the following:
1) Make sure you have successfully run all the previous steps and that Elasticsearch is running on one machine.
2) Copy the public IP address of that machine.
3) On the new machine, clone this repository and run:
```
pip install -r requirements.txt
export ES_HOST="{IP OR HOST OF THE MACHINE RUNNING ELASTICSEARCH}"
export ES_PORT="{PORT OF THE MACHINE RUNNING ELASTICSEARCH. USUALLY 9200}"
```
And finally, run libreCatastro:
```
python libreCatastro.py [....]
```
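How libreCatastro picks up ES_HOST and ES_PORT internally is not shown here, but conceptually each additional machine simply points its Elasticsearch client at the shared instance. A minimal sketch of that idea, reusing the variable names from the export commands above (illustrative only, not the project's actual connection code):
```
# Illustrative only: build an Elasticsearch client from the ES_HOST / ES_PORT
# environment variables exported above, falling back to a local instance.
import os

from elasticsearch import Elasticsearch

es_host = os.environ.get("ES_HOST", "localhost")
es_port = os.environ.get("ES_PORT", "9200")

es = Elasticsearch([f"http://{es_host}:{es_port}"])
print("Connected to shared Elasticsearch:", es.ping())
```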


@@ -57,7 +57,7 @@ if __name__ == "__main__":
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(config)
''' Scrapping / Parsing core functionality'''
parser = ParserHTML if args.html else ParserXML