data
Fixed display of crawler pending URLs counts in HostBrowser.html page.
2017-01-22 12:31:14 +01:00
retrieval
further avoid setting connect info properties as header values
2017-03-04 22:45:17 +01:00
robots
Added control over Robots.txt active threads maximum number.
2016-11-23 18:13:05 +01:00
Balancer.java
Fixed display of crawler pending URLs counts in HostBrowser.html page.
2017-01-22 12:31:14 +01:00
CrawlStacker.java
Factored code re-implementing DigestURL.hosthash() method.
2017-01-16 10:18:42 +01:00
CrawlStarterFromSraper.java
Advanced Crawl from local file : better processing of large files.
2016-10-21 13:03:31 +02:00
CrawlSwitchboard.java
remove wrong log line in CrawlSwitchboard
2016-07-02 20:33:23 +02:00
FileCrawlStarterTask.java
Crawl from local file : faster task end when manually terminating crawl.
2016-10-22 09:11:20 +02:00
HarvestProcess.java
fix for wrong display of error urls in HostBrowser
2012-12-07 00:31:10 +01:00
HostBalancer.java
Fixed display of crawler pending URLs counts in HostBrowser.html page.
2017-01-22 12:31:14 +01:00
HostQueue.java
prevent the crawler from concurrently accessing and altering the same crawl queue
2016-07-05 23:22:35 +02:00
IllegalCrawlProfileException.java
Crawl from local file : faster task end when manually terminating crawl.
2016-10-22 09:11:20 +02:00
LegacyBalancer.java
use supplied url port to get robots.txt in crawlers hostqueue
2016-03-02 00:12:34 +01:00
RecrawlBusyThread.java
init Recrawl job chunk size to max crawl loader during job start, to use some system preferences
2015-10-16 03:05:39 +02:00