yacy_search_server/source/de/anomic/crawler
orbiter 1e6d12f146 Major update to BLOB data structures:
- introduced a new BLOB file format: kelondroBLOBHeap. This is a flat file with an index in RAM.
  It is very similar to the eco-tables, but with flexible value sizes. It will replace the kelondroBLOBTree,
  which is based on a kelondroTree, an AVL-tree index data structure stored in a file.
- the HTCACHE header file was replaced by the new blob heap file structure
- the robots.txt file was replaced by the new blob heap file structure
- the robots parser was enhanced (fixed a bug that caused the same robots.txt to be loaded twice)
- other BLOB-dependent data structures were prepared to also use the new BLOB heap
- fixed a bug in the snippet fetch process: the file header was not written to the header index
There should now be less I/O during snippet fetching and during crawling.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4978 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-10 00:47:37 +00:00
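The commit message describes kelondroBLOBHeap only as "a flat file with an index in RAM", similar to the eco-tables but with flexible value sizes. The Java sketch below illustrates that idea under stated assumptions. The class name `SimpleBlobHeap`, the record layout (a 4-byte length prefix, a fixed-length key, then the value bytes) and the append-only update strategy are hypothetical and are not taken from YaCy's source.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical illustration of a "flat file with an index in RAM".
 * Not YaCy's kelondroBLOBHeap implementation.
 * Assumed record layout: [int recordLength][key bytes][value bytes],
 * where recordLength = keyLength + value.length and all keys share one fixed length.
 */
class SimpleBlobHeap implements AutoCloseable {
    private final RandomAccessFile file;
    private final int keyLength;                              // fixed key size, as in eco-tables
    private final Map<String, Long> index = new HashMap<>();  // key -> file offset of its latest record

    SimpleBlobHeap(File f, int keyLength) throws IOException {
        this.file = new RandomAccessFile(f, "rw");
        this.keyLength = keyLength;
        // Rebuild the RAM index with one sequential scan of the flat file.
        // Assumes the file holds only complete records (no torn writes).
        long pos = 0;
        byte[] key = new byte[keyLength];
        while (pos + 4 + keyLength <= file.length()) {
            file.seek(pos);
            int recordLength = file.readInt();
            file.readFully(key);
            index.put(new String(key, StandardCharsets.UTF_8), pos); // later records overwrite earlier ones
            pos += 4 + recordLength;
        }
    }

    /** Appends a record; an older record with the same key stays in the file but is no longer indexed. */
    void put(String key, byte[] value) throws IOException {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        if (k.length != keyLength) {
            throw new IllegalArgumentException("key must be " + keyLength + " bytes");
        }
        long pos = file.length();
        file.seek(pos);
        file.writeInt(keyLength + value.length); // flexible value size stored in the record header
        file.write(k);
        file.write(value);
        index.put(key, pos);
    }

    /** RAM lookup, then one seek: the value size is read from the record header. */
    byte[] get(String key) throws IOException {
        Long pos = index.get(key);
        if (pos == null) {
            return null;
        }
        file.seek(pos);
        int recordLength = file.readInt();
        file.seek(pos + 4 + keyLength);            // skip over the stored key
        byte[] value = new byte[recordLength - keyLength];
        file.readFully(value);
        return value;
    }

    int size() {
        return index.size();
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

Because every lookup goes through the in-RAM map, a read costs one seek into the file instead of a walk down an on-disk tree. The price of this simplified design is that overwritten records remain in the file as garbage; a production heap would need some form of free-space reuse or compaction, which this sketch deliberately omits.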
File | Last commit message | Date
AbstractImporter.java | refactoring: | 2008-05-06 13:44:38 +00:00
Balancer.java | - refactoring of robots parser | 2008-07-05 00:35:20 +00:00
CrawlEntry.java | fix for npe in crawler | 2008-05-08 20:16:19 +00:00
CrawlProfile.java | Major update to BLOB data structures: | 2008-07-10 00:47:37 +00:00
CrawlQueues.java | - refactoring of robots parser | 2008-07-05 00:35:20 +00:00
CrawlStacker.java | - refactoring of robots parser | 2008-07-05 00:35:20 +00:00
ErrorURL.java | refactoring: moved all crawler classes into their own package | 2008-05-06 00:32:41 +00:00
FTPLoader.java | - added option to switch between yacy networks, especially between the two default networks (freeworld and intranet), | 2008-05-14 21:36:02 +00:00
HTTPLoader.java | Major update to BLOB data structures: | 2008-07-10 00:47:37 +00:00
Importer.java | refactoring: | 2008-05-06 13:44:38 +00:00
ImporterException.java | refactoring: | 2008-05-06 13:44:38 +00:00
ImporterManager.java | - organize imports | 2008-06-06 16:01:27 +00:00
IndexingStack.java | added options to configure the 'corporate identity'-icons, the home page link and the greeting line from | 2008-07-03 23:37:04 +00:00
LoaderMessage.java | refactoring: moved all crawler classes into their own package | 2008-05-06 00:32:41 +00:00
NoticedURL.java | - modified and enhanced the crawl balancer: better list export, fixing of damaged crawl queue at start-up, re-sorting at start-up to enhance domain order | 2008-07-03 13:08:37 +00:00
NoticeURLImporter.java | - added option to switch between yacy networks, especially between the two default networks (freeworld and intranet), | 2008-05-14 21:36:02 +00:00
ProtocolLoader.java | more protection against remote shutdown attacks: prevent loading using the crawler | 2008-05-19 23:05:19 +00:00
ResourceObserver.java | - added copyright header of ResourceObserver | 2008-07-07 00:40:45 +00:00
ResultImages.java | refactoring: moved all crawler classes into their own package | 2008-05-06 00:32:41 +00:00
ResultURLs.java | some code-cleanup and possible speed enhancements in different core methods | 2008-06-17 23:56:39 +00:00
robotsParser.java | Major update to BLOB data structures: | 2008-07-10 00:47:37 +00:00
RobotsTxt.java | Major update to BLOB data structures: | 2008-07-10 00:47:37 +00:00
SitemapImporter.java | refactoring: | 2008-05-06 13:44:38 +00:00
ZURL.java | - refactoring of robots parser | 2008-07-05 00:35:20 +00:00