Adds information about how to run with one Elasticsearch instance and multiple machines.

josejuanmartinez 2019-11-08 22:16:38 +01:00
parent 12865d9f75
commit c83a31b907
2 changed files with 20 additions and 3 deletions


@@ -48,7 +48,8 @@ to send to sleep at 23:00 and restart processing at 09:00 every day
Having Docker and Docker Compose installed, run first:
```
docker-compose up -d
pip install -r requirements.txt
```
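Once the containers are up, you can check that Elasticsearch responds on its default port (9200, assuming the compose file maps it to the host):
```
curl http://localhost:9200
```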
Then configure the Elasticsearch index:
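As a rough illustration only (the index name `cadaster` is hypothetical, not taken from this repository), creating an index through Elasticsearch's REST API looks like:
```
curl -X PUT http://localhost:9200/cadaster
```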
@@ -95,4 +96,20 @@ Taking into account that there are restrictions that prevent scraping faster,
scraping can take a very long time, so:
1) Go directly to the provinces / cities you need the most. Leave the rest for later.
2) Use different IP addresses and query in parallel.
3) Write me an email to jjmcarrascosa@gmail.com to get the full DB.

**Using additional machines (parallel extraction)**
You won't need to repeat the previous steps, because we will use one Elasticsearch instance for all the machines.
For additional machines, do the following:
1) Make sure you have successfully run all the previous steps and Elasticsearch is running on one machine.
2) Copy the public IP address of that machine.
3) On a new machine, clone this repository and do the following:
```
pip install -r requirements.txt
export ES_HOST="{IP OR HOST OF THE MACHINE RUNNING ELASTICSEARCH}"
export ES_PORT="{PORT OF THE MACHINE RUNNING ELASTICSEARCH. USUALLY 9200}"
```
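Before launching the scraper, it is worth checking that the new machine can actually reach the shared Elasticsearch node (assuming plain HTTP and no authentication):
```
curl "http://$ES_HOST:$ES_PORT"
```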
And finally, run libreCatastro:
```
python libreCatastro.py [....]
```
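To check that both machines are writing to the same index, Elasticsearch's `_cat/indices` endpoint shows the document counts growing (again assuming plain HTTP and no authentication):
```
curl "http://$ES_HOST:$ES_PORT/_cat/indices?v"
```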


@@ -57,7 +57,7 @@ if __name__ == "__main__":
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(config)
''' Scraping / Parsing core functionality '''
parser = ParserHTML if args.html else ParserXML