yacy_search_server/source/de/anomic/crawler
orbiter 2c549ae341 fixed a number of small bugs:
- better crawl start for file paths and smb paths
- added time-out wrapper for dns resolving and reverse resolving to prevent blocking
- fixed intranet scanner result list check boxes
- prevented htcache usage in case of file and smb crawling (not necessary, documents are locally available)
- fixed rss feed loader
- fixed sitemap loader which had not been restricted to single files (crawl-depth must be zero)
- clearing of crawl result lists when a network switch was done
- higher maximum file size for crawler

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7214 6c8d7289-2bf4-0310-a012-ef5d649a1542
2010-09-30 23:57:58 +00:00
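
The commit above mentions a time-out wrapper around dns resolving and reverse resolving. The following is only a minimal sketch of that general idea, not the code from this commit: the class name TimeoutDns and the methods callWithTimeout, resolve and reverse are hypothetical names chosen for illustration, and the approach (running the lookup on a worker thread and giving up after a deadline) is an assumption about how such a wrapper could be built with the standard java.util.concurrent API.

```java
import java.net.InetAddress;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of YaCy: bounds the time a caller waits for a lookup.
final class TimeoutDns {

    // Daemon threads so that a hanging lookup cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "dns-timeout");
        t.setDaemon(true);
        return t;
    });

    // Runs the task on a worker thread; returns fallback on time-out or failure.
    static <T> T callWithTimeout(Callable<T> task, long timeoutMillis, T fallback) {
        Future<T> future = POOL.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // the caller stops waiting; the worker may still run
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            future.cancel(true);
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // e.g. UnknownHostException thrown inside the task
        }
    }

    // Forward lookup: host name to address, or null if it fails or takes too long.
    static InetAddress resolve(String host, long timeoutMillis) {
        return callWithTimeout(() -> InetAddress.getByName(host), timeoutMillis, null);
    }

    // Reverse lookup: address to host name, or null if it takes too long.
    static String reverse(InetAddress address, long timeoutMillis) {
        return callWithTimeout(address::getHostName, timeoutMillis, null);
    }
}
```

A caller would use something like TimeoutDns.resolve("example.org", 500) and treat a null result as "not resolvable in time". Note that cancelling the future only frees the caller: a blocked native lookup may keep its worker thread busy until the operating system resolver returns.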
..
retrieval fixed a number of small bugs: 2010-09-30 23:57:58 +00:00
Balancer.java redesign of crawl profiles data structure. target will be: 2010-08-31 15:47:47 +00:00
CrawlProfile.java fixed a number of small bugs: 2010-09-30 23:57:58 +00:00
CrawlQueues.java enhanced remote crawling: 2010-09-16 09:34:17 +00:00
CrawlStacker.java replaced auto-dom filter with easy-to-understand Site Link-List crawler option 2010-09-30 12:50:34 +00:00
CrawlSwitchboard.java fixed a number of small bugs: 2010-09-30 23:57:58 +00:00
ImporterException.java
Latency.java preparations to move the HTCache into cora: 2010-08-23 12:32:02 +00:00
NoticedURL.java redesign of crawl profiles data structure. target will be: 2010-08-31 15:47:47 +00:00
ResourceObserver.java change in handling of the all-visible home path for storage in YaCy: 2010-09-02 19:24:22 +00:00
ResultImages.java redesign of parser interface: 2010-06-29 19:20:45 +00:00
ResultURLs.java fixed a number of small bugs: 2010-09-30 23:57:58 +00:00
RobotsEntry.java - replaced pdfbox and fontbox version 1.1.0 with 1.2.1 2010-09-07 17:13:47 +00:00
robotsParser.java enhanced computation speed of many replaceAll string operations 2010-09-05 13:19:42 +00:00
RobotsTxt.java - moved yacybot user agent string definition to MultiProtocolURI since there are basic access mechanisms where the bot string is needed 2010-09-27 14:54:32 +00:00
RSSLoader.java - added nice colors to feed indexing state messages 2010-08-27 11:56:51 +00:00
SitemapImporter.java - moved yacybot user agent string definition to MultiProtocolURI since there are basic access mechanisms where the bot string is needed 2010-09-27 14:54:32 +00:00
ZURL.java fixed crawler bug caused by NPE in logging 2010-08-12 01:29:56 +00:00