yacy_search_server/source
Michael Peter Christen e6a87e0426 enhanced crawler
a main problem when crawling is long waiting time cuased by crawl-delay
values from robots.txt entries. that attribute is not supported by
google and interpreted by yandex and bing in different ways. In large
crawls there is always one host which blocks the whole crawl with
extreme large values. YaCy now still obeys crawl-delay but limits them
to 10 seconds.
Additionally the blocking logic when loading new robots.txt was analyzed
and a deadlock was removed. Furthermore the construction of new queue
lists was redesigned and it was ensured that always a large list of
different hosts for host-balancing is provided for the loader.
2021-08-17 15:23:21 +02:00
..
net/yacy enhanced crawler 2021-08-17 15:23:21 +02:00
org/json modified org.json.* library to fit into the YaCy environment 2020-04-24 11:42:06 +02:00
log4j.properties