yacy_search_server/source/net/yacy/crawler
luccioman 3f0446f14b Ensure proper synchronous robots entry retrieval on first check.
Previously, when checking the robots.txt policy for the first time on an
unknown host (not cached in the robots table), the result was always empty
in the /getpageinfo_p.xml API and in the /CrawlCheck_p.html page. Subsequent
calls, however, returned the correct information.
2017-08-16 09:30:33 +02:00
data Added HT Cache basic statistics (hit rate) 2017-06-15 09:50:02 +02:00
retrieval Support parsing gzip files from servers with redundant headers. 2017-07-16 14:46:46 +02:00
robots Ensure proper synchronous robots entry retrieval on first check. 2017-08-16 09:30:33 +02:00
Balancer.java Fixed display of crawler pending URLs counts in HostBrowser.html page. 2017-01-22 12:31:14 +01:00
CrawlStacker.java Factored code re-implementing DigestURL.hosthash() method. 2017-01-16 10:18:42 +01:00
CrawlStarterFromSraper.java Advanced Crawl from local file : better processing of large files. 2016-10-21 13:03:31 +02:00
CrawlSwitchboard.java remove wrong log line in CrawlSwitchboard 2016-07-02 20:33:23 +02:00
FileCrawlStarterTask.java Crawl from local file : faster task end when manually terminating crawl. 2016-10-22 09:11:20 +02:00
HarvestProcess.java fix for wrong display of error urls in HostBrowser 2012-12-07 00:31:10 +01:00
HostBalancer.java Fixed display of crawler pending URLs counts in HostBrowser.html page. 2017-01-22 12:31:14 +01:00
HostQueue.java Prevent crawlers from concurrently accessing and altering the same crawl queue 2016-07-05 23:22:35 +02:00
IllegalCrawlProfileException.java Crawl from local file : faster task end when manually terminating crawl. 2016-10-22 09:11:20 +02:00
LegacyBalancer.java use supplied url port to get robots.txt in crawlers hostqueue 2016-03-02 00:12:34 +01:00
RecrawlBusyThread.java init Recrawl job chunk size to max crawl loader during job start, to use some system preferences 2015-10-16 03:05:39 +02:00