yacy_search_server

mirror of https://github.com/yacy/yacy_search_server.git synced 2024-09-21 00:00:13 +02:00

orbiter 70dd26ec95 added the new crawl scheduling function to the crawl start menu:
- the scheduler extends the option for re-crawl timing. Many people misunderstood the re-crawl timing feature because it was only a criterion for the url double-check and not a scheduler. Now the scheduler setting is combined with the re-crawl setting, and people can choose between no re-crawl, re-crawl as was possible so far, and a scheduled re-crawl. The 'classic' re-crawl time is set automatically when the scheduling function is selected
- removed the bookmark-based scheduler. This scheduler was not able to transport all attributes of a crawl start and therefore did not support special crawl starts, e.g. for forums and wikis
- since the old scheduler was not able to crawl special forums and wikis, the must-not-match filter was statically fixed to all bad pages for these special use cases. Since the new scheduler can handle these filters, it is possible to remove the default settings for the filters
- removed the busy thread that was used to trigger the bookmark-based scheduler
- removed the crontab for the bookmark-based scheduler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7051 6c8d7289-2bf4-0310-a012-ef5d649a1542		2010-08-19 23:52:38 +00:00
retrieval	- better url double check in crawler	2010-08-11 09:54:18 +00:00
AbstractImporter.java	- cleanup, removed unused imports	2010-04-27 21:47:41 +00:00
Balancer.java	redirect uncaught exceptions to logging + small other changes	2010-08-16 12:33:06 +00:00
CrawlProfile.java	added the new crawl scheduling function to the crawl start menu:	2010-08-19 23:52:38 +00:00
CrawlQueues.java	- better url double check in crawler	2010-08-11 09:54:18 +00:00
CrawlStacker.java	fixed crawler bug caused by NPE in logging	2010-08-12 01:29:56 +00:00
CrawlSwitchboard.java	added the new crawl scheduling function to the crawl start menu:	2010-08-19 23:52:38 +00:00
Importer.java
ImporterException.java
ImporterManager.java	*) some minor changes for better code readability	2010-04-05 12:37:33 +00:00
Latency.java	better (and corrected) recognition of intranet and internet addresses. This corrects the isLocal property that is used by network definitions to restrict index ranges to local and global addresses. Address locations (intranet or internet) had been partly identified by the top-level domain of the host address. Since intranet addresses can also be addressed using a host name that is in a country domain, it is necessary to do a DNS resolution for each check. The check is supported by a local DNS cache, so the intranet/internet check should not affect network traffic too much. To ensure that the cache works properly, the cache class was upgraded to better concurrent data structures.	2010-07-18 20:14:20 +00:00
NoticedURL.java	- better url double check in crawler	2010-08-11 09:54:18 +00:00
ResourceObserver.java	allow global search if res. observer disabled index transmission	2010-02-09 17:14:16 +00:00
ResultImages.java	redesign of parser interface:	2010-06-29 19:20:45 +00:00
ResultURLs.java	- more abstraction (HashMap -> Map)	2010-06-01 13:02:11 +00:00
RobotsEntry.java	redesign of remote proxy settings	2010-05-26 00:01:16 +00:00
robotsParser.java	- fixed a bug in robots.txt parser	2010-03-04 11:58:07 +00:00
RobotsTxt.java	... migrating to HttpComponents-Client-4.x ...	2010-08-10 21:22:30 +00:00
SitemapImporter.java	applied code changes that are recommended by PMD	2010-01-10 23:09:48 +00:00
ZURL.java	fixed crawler bug caused by NPE in logging	2010-08-12 01:29:56 +00:00