Commit Graph

9 Commits

Author SHA1 Message Date
luccioman
098ee63911 Added a manual performance test for the HostBalancer.
Consequently to the report in mantis 776
(http://mantis.tokeek.de/view.php?id=776).

Running the perfs test with different control parameters seems to reveal
that the YaCy's RowHandleMap used in the balancer depthCache is finally
more efficient than for example the ConcurrentHashMap from JDK 8.
2018-01-28 12:41:56 +01:00
luccioman
46b5249c20 Removed time condition on HostBalancer initialization in JUnit test.
Its initialization in main application usage remains asynchronous.
2018-01-26 17:15:27 +01:00
luccioman
9dd790087d Added HT Cache basic statistics (hit rate) 2017-06-15 09:50:02 +02:00
luccioman
28b451a0b3 Made Cache compression level and lock timeout user configurable 2017-06-14 19:02:08 +02:00
luccioman
a7394b479b Limit the synchronization blocking time on some Cache operations.
Using a Reentrant lock instead of the intrinsic synchronization lock
permits limiting the blocking time to acquire a lock.

Useful on a very busy Cache concurrently accessed by many threads : when
the time to acquire a lock is too high, getting/storing content on the
cache becomes inefficient, and it is then better to fall back to loading
remote resources.

Illustrated by the CacheTest stress test and some traces reported in
mantis 751 ( http://mantis.tokeek.de/view.php?id=751 )
2017-06-14 09:13:50 +02:00
luccioman
aa9ddf3c23 Added control over Robots.txt active threads maximum number.
When starting a crawl from a file containing thousands of links,
configuration setting "crawler.MaxActiveThreads" is effective to prevent
saturating the system with too many outgoing HTTP connections threads
launched by the crawler.
But robots.txt was not affected by this setting and was indefinitely
increasing the number of concurrently loading threads until most ot the
connections timed out.

To improve performance control, added a pool of threads for Robots.txt,
consistently used in its ensureExist() and massCrawlCheck() methods.
The Robots.txt threads pool max size can now be configured in the
/PerformanceQueus_p.html page, or with the new
"robots.txt.MaxActiveThreads" setting, initialized with the same default
value as the crawler.
2016-11-23 18:13:05 +01:00
reger
7b226afc33 fix HostQueueTest - changed open parameter 2016-07-06 23:52:02 +02:00
reger
fcc29c36f0 test case for HostBalancer issue in intranet mode
with file:// protocol, 2 hostqueues accessing same cache file concurrently
http://mantis.tokeek.de/view.php?id=668
Reason seems to be diff. hosthash key of hostqueues on reopen. 
Internal queue key and external representation (directoryname currently hostname.port) must be adjusted to fix it (not done yet).
2016-07-04 02:44:58 +02:00
reger
84c970eaec move test classes to test/java (subdirectory as in Maven standard subdir layout)
because ViewImage*Test.java breaks test run
2016-01-16 19:22:27 +01:00