yacy_search_server/source/net/yacy/crawler
luccioman b712a0671e Added a specific default crawl profile for the recrawl job.
- with only a light constraint on the load date of known indexed documents, as it
can already be controlled by the selection query, and the goal of the
job is indeed to recrawl selected documents now
- using the iffresh cache strategy
2018-01-13 15:46:04 +01:00
data Added optional https support for remote crawl and profile operations 2017-12-21 18:41:32 +01:00
retrieval Do locale-independent case conversion on hosts, schemes, and file exts. 2017-12-19 13:52:05 +01:00
robots Do locale-independent case conversion on hosts, schemes, and file exts. 2017-12-19 13:52:05 +01:00
Balancer.java Fixed display of crawler pending URLs counts in HostBrowser.html page. 2017-01-22 12:31:14 +01:00
CrawlStacker.java More comprehensive log on rejected recrawls caused by date constraint 2018-01-13 12:07:56 +01:00
CrawlStarterFromScraper.java Updated a license header typo. 2017-10-30 07:38:47 +01:00
CrawlSwitchboard.java Added a specific default crawl profile for the recrawl job. 2018-01-13 15:46:04 +01:00
FileCrawlStarterTask.java fix typo 2017-10-27 14:00:30 +02:00
HarvestProcess.java fix for wrong display of error urls in HostBrowser 2012-12-07 00:31:10 +01:00
HostBalancer.java Fixed display of crawler pending URLs counts in HostBrowser.html page. 2017-01-22 12:31:14 +01:00
HostQueue.java to prevent crawler to concurrently access and alter same crawl queue 2016-07-05 23:22:35 +02:00
IllegalCrawlProfileException.java Crawl from local file : faster task end when manually terminating crawl. 2016-10-22 09:11:20 +02:00
LegacyBalancer.java use supplied url port to get robots.txt in crawlers hostqueue 2016-03-02 00:12:34 +01:00
RecrawlBusyThread.java Added a specific default crawl profile for the recrawl job. 2018-01-13 15:46:04 +01:00