yacy_search_server/source/net/yacy/crawler
luccioman 7baa99f26f Fixed stored URL in web cache when redirection(s) occur.
Associate cached content with the last redirection location, instead of
the first URL of a redirection chain:
 - for proper base URL processing in parsers (fixes mantis 636 -
http://mantis.tokeek.de/view.php?id=636)
 - to prevent duplicated content in Solr index when recrawling a
redirected URL
2018-01-20 18:56:40 +01:00
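The commit above changes which URL the web cache uses as the key for fetched content: the last location in a redirection chain instead of the first requested URL. Below is a minimal Java sketch of that idea. It is not YaCy's implementation. The class `RedirectCacheSketch`, its methods, and the in-memory redirect table standing in for HTTP 3xx `Location` headers are all hypothetical. Only `java.net.URI` and the collections used are real JDK APIs.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not YaCy code): cache content under the LAST URL of a
 * redirection chain. The redirect map simulates 3xx responses' Location headers.
 */
public class RedirectCacheSketch {

    /** Simulated server: URL -> Location header value (absolute or relative). */
    private final Map<String, String> redirects = new HashMap<>();

    /** Web cache keyed by the final, post-redirect URL. */
    private final Map<String, byte[]> cache = new HashMap<>();

    public void addRedirect(String from, String location) {
        redirects.put(from, location);
    }

    /** Follows the chain, resolving relative Location values against the current URL. */
    public String resolveFinalUrl(String startUrl, int maxHops) {
        String current = startUrl;
        for (int hop = 0; hop < maxHops; hop++) {
            String location = redirects.get(current);
            if (location == null) {
                return current; // no further redirection: this is the final location
            }
            current = URI.create(current).resolve(location).toString();
        }
        throw new IllegalStateException("Too many redirections from " + startUrl);
    }

    /** Stores content under the final URL, so a parser's base URL is the real document location. */
    public String store(String requestedUrl, byte[] content) {
        String finalUrl = resolveFinalUrl(requestedUrl, 10);
        cache.put(finalUrl, content);
        return finalUrl;
    }

    public byte[] get(String url) {
        return cache.get(url);
    }

    public static void main(String[] args) {
        RedirectCacheSketch sketch = new RedirectCacheSketch();
        sketch.addRedirect("http://example.org/a", "https://example.org/a");
        sketch.addRedirect("https://example.org/a", "/b/index.html");
        String key = sketch.store("http://example.org/a", "<html/>".getBytes(StandardCharsets.UTF_8));
        System.out.println(key);                                  // https://example.org/b/index.html
        System.out.println(sketch.get("http://example.org/a") == null); // true
    }
}
```

In this sketch, recrawling either `http://example.org/a` or any intermediate hop resolves to the same cache key. That mirrors the commit's stated goal of avoiding duplicated content in the Solr index when a redirected URL is recrawled.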
data Added optional https support for remote crawl and profile operations 2017-12-21 18:41:32 +01:00
retrieval Fixed stored URL in web cache when redirection(s) occur. 2018-01-20 18:56:40 +01:00
robots Do locale-independent case conversion on hosts, schemes, and file exts. 2017-12-19 13:52:05 +01:00
Balancer.java Fixed display of crawler pending URLs counts in HostBrowser.html page. 2017-01-22 12:31:14 +01:00
CrawlStacker.java Removed unnecessary reflection usage for workflow tasks. 2018-01-15 10:05:49 +01:00
CrawlStarterFromScraper.java Updated a license header typo. 2017-10-30 07:38:47 +01:00
CrawlSwitchboard.java Added new recrawl job profile to the list of default crawl profiles 2018-01-15 08:30:37 +01:00
FileCrawlStarterTask.java fix typo 2017-10-27 14:00:30 +02:00
HarvestProcess.java fix for wrong display of error urls in HostBrowser 2012-12-07 00:31:10 +01:00
HostBalancer.java Fixed display of crawler pending URLs counts in HostBrowser.html page. 2017-01-22 12:31:14 +01:00
HostQueue.java to prevent the crawler from concurrently accessing and altering the same crawl queue 2016-07-05 23:22:35 +02:00
IllegalCrawlProfileException.java Crawl from local file: faster task end when manually terminating crawl. 2016-10-22 09:11:20 +02:00
LegacyBalancer.java use supplied url port to get robots.txt in crawler's hostqueue 2016-03-02 00:12:34 +01:00
RecrawlBusyThread.java Added a specific default crawl profile for the recrawl job. 2018-01-13 15:46:04 +01:00
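The `robots` entry's commit mentions locale-independent case conversion on hosts, schemes, and file extensions. Below is a minimal Java sketch of why that matters. It is not YaCy's code, and the class `LocaleIndependentLowerCase` and its helpers `normalizeHost` and `fileExtension` are hypothetical. The underlying issue is real JDK behavior: `String.toLowerCase()` without an argument uses the default locale. Under a Turkish locale, uppercase `I` becomes dotless `ı` (U+0131). `toLowerCase(Locale.ROOT)` avoids that.

```java
import java.util.Locale;

/** Hypothetical helpers (not YaCy's API) showing locale-independent lowercasing. */
public class LocaleIndependentLowerCase {

    /** Lowercases a host name the same way regardless of the JVM's default locale. */
    public static String normalizeHost(String host) {
        return host.toLowerCase(Locale.ROOT);
    }

    /** Returns the lowercased extension of the last path segment, or "" if it has none. */
    public static String fileExtension(String path) {
        int dot = path.lastIndexOf('.');
        int slash = path.lastIndexOf('/');
        if (dot < 0 || dot < slash) {
            return ""; // no dot, or the dot belongs to a directory name
        }
        return path.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String host = "WIKI.EXAMPLE.ORG";
        Locale turkish = Locale.forLanguageTag("tr-TR");
        // Locale-sensitive: each 'I' becomes dotless U+0131, so the host no longer matches "wiki.example.org"
        System.out.println(host.toLowerCase(turkish));
        // Locale-independent: always "wiki.example.org"
        System.out.println(normalizeHost(host));
        // Same pitfall for extensions: "GIF" would become "gıf" under the Turkish locale
        System.out.println(fileExtension("/img/LOGO.GIF"));
    }
}
```

Using `Locale.ROOT` for protocol-level identifiers such as hosts, schemes, and extensions keeps comparisons and map lookups stable whatever locale the peer runs under. User-visible text would still use a locale-aware conversion.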