Mirror of https://github.com/yacy/yacy_search_server.git (synced 2024-09-19 00:01:41 +02:00)

Commit e6a87e0426
A major problem when crawling is long waiting times caused by Crawl-delay values in robots.txt entries. That attribute is not supported by Google and is interpreted by Yandex and Bing in different ways. In large crawls there is always one host that blocks the whole crawl with extremely large values. YaCy now still obeys Crawl-delay but limits it to 10 seconds. Additionally, the blocking logic used when loading new robots.txt files was analyzed and a deadlock was removed. Furthermore, the construction of new queue lists was redesigned to ensure that the loader is always provided with a large list of different hosts for host balancing.
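The clamping described above can be sketched as follows. This is a minimal illustration, not YaCy's actual implementation: the class name `CrawlDelay`, the method `effectiveDelayMs`, and the handling of unparseable values are all assumptions made for this example; only the 10-second cap comes from the commit message.

```java
// Hypothetical sketch of clamping a robots.txt Crawl-delay value.
// Only the 10-second cap is taken from the commit; everything else
// (names, parsing, error handling) is illustrative.
public class CrawlDelay {

    // Maximum delay the crawler will obey, in milliseconds (10 seconds).
    static final long MAX_CRAWL_DELAY_MS = 10_000L;

    // Parse a Crawl-delay directive value (given in seconds in robots.txt,
    // sometimes as a fraction) and clamp it to the cap. Unparseable or
    // negative values are treated as "no delay" here; a real crawler might
    // instead fall back to a default.
    static long effectiveDelayMs(String crawlDelayValue) {
        long delayMs;
        try {
            delayMs = (long) (Double.parseDouble(crawlDelayValue.trim()) * 1000.0);
        } catch (NumberFormatException e) {
            return 0L; // ignore a malformed directive
        }
        if (delayMs < 0) {
            return 0L;
        }
        return Math.min(delayMs, MAX_CRAWL_DELAY_MS);
    }
}
```

With this logic, a host declaring `Crawl-delay: 3600` no longer stalls its queue for an hour; the loader waits at most 10 seconds between requests to that host.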
Changed paths:
- net/yacy
- org/json
- log4j.properties