Adds information about how to run with one Elasticsearch instance and multiple machines.

josejuanmartinez 2019-11-08 22:16:38 +01:00
parent 12865d9f75
commit c83a31b907
2 changed files with 20 additions and 3 deletions


@@ -48,7 +48,8 @@ to send to sleep at 23:00 and restart processing at 09:00 every day
Having Docker and Docker Compose installed, run first:
```
docker-compose up -d
pip install -r requirements.txt
```
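Once the containers are up, you can check that Elasticsearch responds on its default port (9200, assuming the compose file maps it to the host):
```
curl http://localhost:9200
```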
Then configure the Elasticsearch index:
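As a rough illustration only (the index name `cadaster` is hypothetical, not taken from this repository), creating an index through Elasticsearch's REST API looks like:
```
curl -X PUT http://localhost:9200/cadaster
```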
@@ -95,4 +96,20 @@ Taking into account that there are restrictions that prevent scraping faster,
scraping can take a very long time, so:
1) Go directly to the provinces / cities you need the most. Leave the rest for later.
2) Use different IP addresses and query in parallel.
3) Write me an email to jjmcarrascosa@gmail.com to get the full DB.

**Using additional machines (parallel extraction)**
You won't need to repeat the previous steps, because we will use one Elasticsearch instance for all the machines.
For additional machines, do the following:
1) Make sure you have successfully run all the previous steps and Elasticsearch is running on one machine.
2) Copy the public IP address of that machine.
3) On a new machine, clone this repository and do the following:
```
pip install -r requirements.txt
export ES_HOST="{IP OR HOST OF THE MACHINE RUNNING ELASTICSEARCH}"
export ES_PORT="{PORT OF THE MACHINE RUNNING ELASTICSEARCH. USUALLY 9200}"
```
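Before launching the scraper, it is worth checking that the new machine can actually reach the shared Elasticsearch node (assuming plain HTTP and no authentication):
```
curl "http://$ES_HOST:$ES_PORT"
```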
And finally, run libreCatastro:
```
python libreCatastro.py [....]
```
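To check that both machines are writing to the same index, Elasticsearch's `_cat/indices` endpoint shows the document counts growing (again assuming plain HTTP and no authentication):
```
curl "http://$ES_HOST:$ES_PORT/_cat/indices?v"
```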


@@ -57,7 +57,7 @@ if __name__ == "__main__":
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(config)
''' Scraping / Parsing core functionality '''
parser = ParserHTML if args.html else ParserXML