yacy_search_server/source/de/anomic/crawler
orbiter 70dd26ec95 added the new crawl scheduling function to the crawl start menu:
- the scheduler extends the option for re-crawl timing. Many people misunderstood the re-crawl timing feature because it was only a criterion for the url double-check and not a scheduler. The scheduler setting is now combined with the re-crawl setting, so people can choose between no re-crawl, re-crawl as was possible so far, and a scheduled re-crawl. The 'classic' re-crawl time is set automatically when the scheduling function is selected
- removed the bookmark-based scheduler. This scheduler was not able to transport all attributes of a crawl start and therefore did not support special crawl starts, e.g. for forums and wikis
- since the old scheduler was not able to crawl special forums and wikis, the must-not-match filter was statically fixed to all bad pages for these special use cases. Since the new scheduler can handle these filters, it is possible to remove the default settings for the filters
- removed the busy thread that was used to trigger the bookmark-based scheduler
- removed the crontab for the bookmark-based scheduler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7051 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-08-19 23:52:38 +00:00
retrieval - better url double check in crawler 2010-08-11 09:54:18 +00:00
AbstractImporter.java - cleanup, removed unused imports 2010-04-27 21:47:41 +00:00
Balancer.java redirect uncaught exceptions to logging + small other changes 2010-08-16 12:33:06 +00:00
CrawlProfile.java added the new crawl scheduling function to the crawl start menu: 2010-08-19 23:52:38 +00:00
CrawlQueues.java - better url double check in crawler 2010-08-11 09:54:18 +00:00
CrawlStacker.java fixed crawler bug caused by NPE in logging 2010-08-12 01:29:56 +00:00
CrawlSwitchboard.java added the new crawl scheduling function to the crawl start menu: 2010-08-19 23:52:38 +00:00
Importer.java
ImporterException.java
ImporterManager.java *) some minor changes for better code readability 2010-04-05 12:37:33 +00:00
Latency.java better (and corrected) recognition of intranet and internet addresses. This corrects the isLocal property that is used by network definitions to restrict index ranges to local and global addresses. Address locations (intranet or internet) had been partly identified by the top-level domain of the host address. Since intranet addresses can also be reached through a host name in a country domain, a DNS lookup is necessary for each check. The check is backed by a local DNS cache, so the intranet/internet check should not affect network traffic too much. To ensure that the cache works properly, the cache class was upgraded to data structures with better concurrency support. 2010-07-18 20:14:20 +00:00
NoticedURL.java - better url double check in crawler 2010-08-11 09:54:18 +00:00
ResourceObserver.java allow global search if res. observer disabled index transmission 2010-02-09 17:14:16 +00:00
ResultImages.java redesign of parser interface: 2010-06-29 19:20:45 +00:00
ResultURLs.java - more abstraction (HashMap -> Map) 2010-06-01 13:02:11 +00:00
RobotsEntry.java redesign of remote proxy settings 2010-05-26 00:01:16 +00:00
robotsParser.java - fixed a bug in robots.txt parser 2010-03-04 11:58:07 +00:00
RobotsTxt.java ... migrating to HttpComponents-Client-4.x ... 2010-08-10 21:22:30 +00:00
SitemapImporter.java applied code changes that are recommended by PMD 2010-01-10 23:09:48 +00:00
ZURL.java fixed crawler bug caused by NPE in logging 2010-08-12 01:29:56 +00:00