yacy_search_server

mirror of https://github.com/yacy/yacy_search_server.git synced 2024-09-19 00:01:41 +02:00

luccioman 3f0446f14b Ensure proper synchronous robots entry retrieval on first check. Previously, when checking the robots.txt policy for the first time on an unknown host (not cached in the robots table), the result was always empty in the /getpageinfo_p.xml api and in the /CrawlCheck_p.html page. Subsequent calls, however, returned the correct information.		2017-08-16 09:30:33 +02:00
data	Added HT Cache basic statistics (hit rate)	2017-06-15 09:50:02 +02:00
retrieval	Support parsing gzip files from servers with redundant headers.	2017-07-16 14:46:46 +02:00
robots	Ensure proper synchronous robots entry retrieval on first check.	2017-08-16 09:30:33 +02:00
Balancer.java	Fixed display of crawler pending URLs counts in HostBrowser.html page.	2017-01-22 12:31:14 +01:00
CrawlStacker.java	Factored code re-implementing DigestURL.hosthash() method.	2017-01-16 10:18:42 +01:00
CrawlStarterFromSraper.java	Advanced Crawl from local file : better processing of large files.	2016-10-21 13:03:31 +02:00
CrawlSwitchboard.java	remove wrong log line in CrawlSwitchboard	2016-07-02 20:33:23 +02:00
FileCrawlStarterTask.java	Crawl from local file : faster task end when manually terminating crawl.	2016-10-22 09:11:20 +02:00
HarvestProcess.java	fix for wrong display of error urls in HostBrowser	2012-12-07 00:31:10 +01:00
HostBalancer.java	Fixed display of crawler pending URLs counts in HostBrowser.html page.	2017-01-22 12:31:14 +01:00
HostQueue.java	to prevent crawler to concurrently access and alter same crawl queue	2016-07-05 23:22:35 +02:00
IllegalCrawlProfileException.java	Crawl from local file : faster task end when manually terminating crawl.	2016-10-22 09:11:20 +02:00
LegacyBalancer.java	use supplied url port to get robots.txt in crawlers hostqueue	2016-03-02 00:12:34 +01:00
RecrawlBusyThread.java	init Recrawl job chunk size to max crawl loader during job start, to use some system preferences	2015-10-16 03:05:39 +02:00