#%env/templates/metas.template%# #%env/templates/header.template%# #%env/templates/submenuIndexCreate.template%#

Indexing with Proxy

YaCy can be used to 'scrape' content from pages that pass the integrated caching HTTP proxy. No personal or protected pages are indexed while scraping proxy pages; such pages are detected by properties in the HTTP header (such as cookie use or HTTP authorization) or by POST parameters (either in the URL or sent via the HTTP protocol) and are automatically excluded from indexing.
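
The exclusion rule described above can be sketched roughly as follows. This is only an illustration and not YaCy's actual code: the function name `is_private`, the header set `PRIVATE_HEADERS`, and the reading of "parameters in the URL" as "the URL has a query string" are all assumptions.

```python
from urllib.parse import urlsplit

# Header names treated as signs of personal or protected content.
# The exact set YaCy checks is not stated in the text; this list is an assumption.
PRIVATE_HEADERS = {"cookie", "set-cookie", "authorization", "www-authenticate"}


def is_private(method, url, request_headers, response_headers):
    """Return True if a proxied page should be excluded from indexing."""
    # Header names are case-insensitive in HTTP, so compare in lower case.
    names = {name.lower() for name in request_headers}
    names |= {name.lower() for name in response_headers}
    if names & PRIVATE_HEADERS:
        return True
    # Parameters sent via the HTTP protocol (a POST request).
    if method.upper() == "POST":
        return True
    # Parameters passed in the URL (assumed here to mean a query string).
    return bool(urlsplit(url).query)
```

Under these assumptions, `is_private("GET", "http://example.org/page.html", {}, {})` is false, while the same request carrying a `Cookie` header, or a URL such as `http://example.org/s?q=1`, would be excluded.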

You have to set up the proxy before use.

Proxy Auto Config: this controls the proxy auto-configuration script for browsers at http://localhost:8090/autoconfig.pac and whether the proxy should be used only for .yacy domains.
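
The decision an auto-configuration script makes for each request can be sketched as below. This is a hedged illustration only: the function name `find_proxy_for_host` and its flag are hypothetical, the proxy address is assumed from the autoconfig URL above, and the script YaCy actually generates may differ. The return values `PROXY host:port` and `DIRECT` follow the usual PAC conventions.

```python
# Proxy address assumed from the autoconfig URL; adjust to your own setup.
PROXY = "PROXY localhost:8090"


def find_proxy_for_host(host, yacy_domains_only):
    """Return a PAC-style routing answer for the given host name."""
    host = host.lower().rstrip(".")
    # .yacy domains always go through the YaCy proxy.
    if host.endswith(".yacy"):
        return PROXY
    # Everything else bypasses the proxy when the restriction is switched on.
    return "DIRECT" if yacy_domains_only else PROXY
```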
Proxy pre-fetch setting: this is an automated HTML page loading procedure that takes the URLs actually requested through the proxy as start points for crawling.
A prefetch depth of 0 means no prefetch; a prefetch depth of 1 means that all embedded URLs are prefetched. Since embedded image links are loaded by the browser anyway, this means that only embedded href anchors are prefetched additionally.
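
The meaning of the prefetch depth can be illustrated with a small depth-limited crawl. This is a sketch under stated assumptions, not YaCy's crawler: `prefetch`, `AnchorCollector`, and the `fetch` callable (which takes a URL and returns its HTML as a string) are hypothetical names introduced here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collect the targets of href anchors; image links are ignored."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def prefetch(start_url, depth, fetch):
    """Return the URLs to load in addition to start_url, up to the given depth."""
    seen = {start_url}
    frontier = [start_url]
    queued = []
    for _ in range(depth):  # depth 0: the loop body never runs, so no prefetch
        next_frontier = []
        for url in frontier:
            collector = AnchorCollector(url)
            collector.feed(fetch(url))
            for link in collector.links:
                if link not in seen:
                    seen.add(link)
                    queued.append(link)
                    next_frontier.append(link)
        frontier = next_frontier
    return queued
```

With a depth of 1, only the anchors found on the page requested through the proxy are queued; each further level follows the anchors of the pages queued in the previous level.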
Caching: it is almost always recommended to switch this on. The only exception is when you have another caching proxy running as a secondary proxy and YaCy is configured to use that proxy in proxy-proxy mode.
Local Text Indexing: if this is on, all pages (except private content) that pass the proxy are indexed.
Local Media Indexing: this is the same as Local Text Indexing, but switches on only the indexing of media content.
Remote Indexing: if checked, the crawler will contact other peers and use them as remote indexers for your crawl. If you need your crawling results locally, you should switch this off. Only senior and principal peers can initiate or receive remote crawls. Please note that this setting only takes effect for a prefetch depth greater than 0.
General proxy settings
Cachepath: the path where the pages are stored (max. length 300).
Cachesize: the size of the cache in MB.
 
#(info)# ::

The file DATA/PLASMADB/crawlProfiles0.db is missing or corrupted. Please delete that file and restart.

::

Pre-fetch is now set to depth-#[message]#.

Caching is now #(caching)#off::on#(/caching)#.

Local Text Indexing is now #(indexingLocalText)#off::on#(/indexingLocalText)#.

Local Media Indexing is now #(indexingLocalMedia)#off::on#(/indexingLocalMedia)#.

Remote Indexing is now #(indexingRemote)#off::on#(/indexingRemote)#.

#(path)#::

Cachepath is now set to '#[return]#'. Please move the old data into the new directory.

#(/path)# #(size)#::

Cachesize is now set to #[return]#MB.

#(/size)# #(restart)#::

Changes will take effect only after a restart.

#(/restart)# ::

An error has occurred: #[error]#.

#(/info)#

You can see a snapshot of recently indexed pages on the Proxy Index Monitor Page.

#%env/templates/footer.template%#