Commit Graph

18 Commits

Author SHA1 Message Date
orbiter
861f41e67e redesigned NURL-handling:
- the general NURL-index for all crawl stack types was split into separate indexes for these stacks
- the new NURL-index is managed by the crawl balancer
- the crawl balancer does not need an internal index any more, it is replaced by the NURL-index
- the NURL.Entry was generalized and is now a new class plasmaCrawlEntry
- the new class plasmaCrawlEntry also replaces the preNURL.Entry class, and will also replace the switchboardEntry class in the future
- the new class plasmaCrawlEntry is more accurate for date entries (holds milliseconds) and can contain larger 'name' entries (anchor tag names)
- the EURL object was replaced by a new ZURL object, which is a container for the plasmaCrawlEntry and some tracking information
- the EURL index is now filled with ZURL objects
- a new index delegatedURL holds ZURL objects about plasmaCrawlEntry objects to track which url is handed over to other peers
- redesigned the handover of plasmaCrawlEntry objects, because one entry object no longer needs to be converted into another
- found and fixed numerous bugs in the context of crawl state handling
- fixed a serious bug in kelondroCache that prevented entries from being removed
- fixed some bugs in the online interface and adapted monitor output to the new entry objects
- adapted the yacy protocol to handle new delegatedURL entries
all old crawl queues will disappear after this update!

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3483 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-16 13:25:56 +00:00
karlchenofhell
bf7a69197d - fix for possible NPE in queues_p
- WatchCrawler_p:
  - display crawler traffic
  - pause/resume local- and global crawler


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3389 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-22 22:26:11 +00:00
allo
0c81bd39d4 XSS-safe put as default.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3217 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-16 14:07:54 +00:00
karlchenofhell
41bc31d2c2 - ConfigAdvanced_p => XHTML (no invalid IDs)
- removed unmappable characters from code

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3133 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-23 13:35:34 +00:00
orbiter
1d2d1854b9 added size of rwi and urls to WatchCrawler
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3112 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-21 21:33:35 +00:00
orbiter
61798f0ae6 added option to distinguish between text crawl and media crawl
- for each crawl start, there is now a flag for text and media
- the localCrawl flag is superfluous
- added new crawl profiles
- if an image search is done, only media links are crawled for the snippets


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3100 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-19 03:10:46 +00:00
orbiter
febe6b114a design update of crawler monitor
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3094 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-18 01:18:28 +00:00
orbiter
109ed0a0bb - cleaned up code; removed methods to write the old data structures
- added an assortment importer. The old database structures can
  be imported with
  java -classpath classes yacy -migrateassortments
- modified wordmigration. The indexes from WORDS are now imported
  to the collection database. The call is
  java -classpath classes yacy -migratewords
  (as it was)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3044 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-05 02:47:51 +00:00
orbiter
df1629b05a - code cleanup
- version 0.471
- moved surftipps to own web page


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2676 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-29 22:27:20 +00:00
orbiter
5015e780c2 - simplified watchCrawler code
- changed display of watchCrawler slightly

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2594 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-15 13:54:10 +00:00
theli
413e6b9855 *) removed direct access to the response headers of sbQueue.Entry to make it more HTTP-independent
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2489 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 15:56:49 +00:00
theli
eb9b138986 *) next step of restructuring for new crawlers
- conversion of the crawler pool into a keyed object pool
   - crawlers are now loaded based on the url protocol (of course, this works only for http for now)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2473 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 06:52:55 +00:00
theli
1395aae742 *) starting restructuring which is needed to add crawlers for additional protocols
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2472 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 06:09:20 +00:00
allo
933a9e02ab fix for broken build
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2284 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-09 14:35:20 +00:00
allo
360056b30c fix ajax bug (no valid xml)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2283 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-09 10:59:55 +00:00
allo
3fd1641893 queuesizes in queues_p.xml
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1714 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-20 22:48:39 +00:00
allo
26d7e8dd0d more escapes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1677 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-16 22:19:04 +00:00
allo
127396436f more queues in the xml backend
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1674 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-02-16 20:26:10 +00:00