1738 Commits

Author SHA1 Message Date
orbiter
96c6e4e322 - enhancements to detailed search page
- enhancements to search ranking computation process
- removed bugs in postranking

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2516 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-08 01:26:06 +00:00
orbiter
9340dbb501 fixed all possible problems with nullpointer exception for LURLs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2513 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 18:24:39 +00:00
theli
a5ed86105b *) bugfix for handling of ResourceInfo object in proxy
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2512 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 15:50:45 +00:00
hermens
ff4362b02d some more fixes for new plasmaCrawlLURL.load behavior
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2511 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 14:32:46 +00:00
hermens
7aeadbe7cc another NullPointerException in http.ResourceInfo
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2510 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 13:13:19 +00:00
orbiter
141f9e5bb4 fix for new plasmaCrawlLURL.load behavior
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2509 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 12:27:32 +00:00
orbiter
1e7fd48afd added size method to ftpc
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2508 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 12:21:41 +00:00
hermens
087f7511f8 prevent NullPointerException in http.ResourceInfo
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2507 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 11:46:22 +00:00
orbiter
a2525072f2 bugfix for kelondroRow - property generation
this bug affected ranking parameters :-(

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2506 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 10:55:34 +00:00
hydrox
59a5511dbb *) added missing static Strings as requested by theli
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2505 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 07:20:28 +00:00
theli
6578564c9a *) Ignore more hop by hop http headers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2504 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 05:38:35 +00:00
theli
b44514242a *) crawler/ftp/CrawlWorker.java: better errorhandling
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2503 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 05:22:35 +00:00
theli
7d7f30139c *) crawler/ftp/CrawlWorker.java: delete old cache file
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2502 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 05:08:35 +00:00
theli
4ae0f122f8 *) ResourceInfo.java: License header added
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2501 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 05:01:07 +00:00
theli
043edfa4d8 *) ftp/ResourceInfo.java ResourceInfo object for ftp resources added
*) ftp/CrawlWorker.java better errorhandling for ftp crawler
*) plasmaCrawlEURL.java: some errorcodes added

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2499 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 04:12:52 +00:00
orbiter
4866868c0e added write cache for LURLs
This was necessary to speed up the index receive process during global search


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2498 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 01:13:03 +00:00
orbiter
8a0e35618b enhancements to search result preparation
- added detailed count on remote search results
- enhanced search sequence during remote searches (doing local search in sequence)
- strict adherence to timeout limits

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2497 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-06 17:51:28 +00:00
theli
5c1bb53d2a Missing description for last commit
*) next step of restructuring for new crawlers
   > HTCaching should now work protocol independent
   -- introduction of new ResourceInfo objects containing protocol-specific metadata
      of a resource. 
   -- the ResourceInfo objects now implement old functions like shallIndexCacheForXXX, 
      shallStoreCacheForXXX in a protocol dependent manner   
   > Indexing should also work protocol independent now

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2496 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-06 14:35:45 +00:00
theli
dae763d8e3 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2495 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-06 14:31:17 +00:00
theli
4825bfaaf3 *) Bugfix for PrintWriter Problem
See: http://www.yacy-forum.de/viewtopic.php?t=2792

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2494 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-05 18:55:45 +00:00
orbiter
d4c5e2af01 html-dirlist can now also be generated from existing connections
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2493 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-05 10:11:07 +00:00
theli
7930839594 *) URL.java: userinfo was not taken over when generating a new url from a base url and a rel. path
*) CrawlWorker.java: using new dirhtml function of ftpc

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2492 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-05 05:17:57 +00:00
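The userinfo fix in URL.java above can be illustrated with the JDK's own `java.net.URI`, which keeps the userinfo of the base when a relative path is resolved against it. This is only a sketch: it does not use YaCy's `de.anomic.net.URL` class, and the class name, host name and paths are made-up examples.

```java
import java.net.URI;

final class UserInfoResolveDemo {
    // Resolves a relative path against a base URL; the scheme and the whole
    // authority part (userinfo, host, port) are taken over from the base.
    static URI resolve(String base, String relative) {
        return URI.create(base).resolve(relative);
    }

    public static void main(String[] args) {
        URI resolved = resolve("ftp://username:pwd@ftp.example.org/pub/", "docs/readme.txt");
        System.out.println(resolved.getUserInfo());
        System.out.println(resolved.getPath());
    }
}
```

An ftp crawler that followed links from a directory listing would lose its login data if the userinfo were dropped at this step, which is presumably why the fix was needed.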
orbiter
17ba468165 added html dirlisting generation in ftpc.java:
ftpc.dirhtml() generates a StringBuffer with a complete web page

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2491 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-05 00:11:59 +00:00
theli
7a35b8e237 *) direct access to responseheaders of sbQueue.Entry removed to make it more http independent
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2487 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 15:36:19 +00:00
theli
ffbf416e76 *) direct access to requestheader of htCache.Entry removed to make it more http independent
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2486 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 15:29:45 +00:00
theli
3870d615e3 *) setting htCache.Entry fields to private
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2485 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 15:06:58 +00:00
theli
393a7d10be *) setting htCache.Entry fields to private
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2484 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 15:03:54 +00:00
theli
ab5a9bee66 *) adding some copyright headers
*) next step of restructuring for new crawlers
   - adding first testversion of ftp crawler class
   -- does not create a htCache entry yet

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2483 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 14:38:29 +00:00
theli
5847492537 *) next step of restructuring for new crawlers
- IndexCreate_p.java: correcting problems with ftp urls
   - URL.java does not cutout the userinfo anymore 
    (needed to transport authentication info in ftp urls, e.g. ftp://username:pwd@ftp.irgendwas.de)
   - plasmaCrawlLoader.java: 
   -- hack to re enable https urls
   -- adding function getSupportedProtocols

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2482 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 13:17:11 +00:00
orbiter
6cce47e217 test of ftp-urls in URL class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2481 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 13:10:40 +00:00
theli
fce9e7741b *) next step of restructuring for new crawlers
- renaming of http specific crawler settings

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2480 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 11:56:47 +00:00
theli
e3f0136606 *) next step of restructuring for new crawlers
- adding function isSupportedProcotol to plasmaCrawlLoader.java
   - disabling robots.txt check for protocols other than http(s)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2479 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 11:46:17 +00:00
theli
9ded4e8d5a *) Bugfix for name resolution in proxy mode
See: http://www.yacy-forum.de/viewtopic.php?p=25241

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2478 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 11:26:53 +00:00
theli
1c8300fcec *) Bugfix for name resolution in proxy mode
See: http://www.yacy-forum.de/viewtopic.php?p=25241

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2477 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 11:23:57 +00:00
theli
4e2a950ac9 *) next step of restructuring for new crawlers
- avoid using the http crawler class directly. Using the interface class instead

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2476 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 09:24:24 +00:00
theli
09b106eb04 *) next step of restructuring for new crawlers
- adding interface class (plasma/crawler/plasmaCrawlWorker.java) for protocol specific crawl-worker threads 
   - moving reusable code into abstract crawl-worker class AbstractCrawlWorker.java
   - the load method of the worker threads should not be called directly anymore (e.g. by the snippet fetcher)
     to crawl a page and wait for the result use function plasmaCrawlLoader.loadSync([...])

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2474 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 09:00:18 +00:00
theli
eb9b138986 *) next step of restructuring for new crawlers
- conversion of the crawler pool into a keyed object pool
   - crawlers are now loaded based on the url protocol (of course works only for http now)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2473 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 06:52:55 +00:00
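A keyed pool of the kind described above might look roughly like the following sketch. It is not the YaCy implementation: the `Worker` interface, the class name and all method names are hypothetical, and a real pool would also need limits and worker lifecycle handling.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

final class KeyedWorkerPoolDemo {
    // Hypothetical worker type; each worker knows which protocol it serves.
    interface Worker { String protocol(); }

    private final Map<String, Supplier<Worker>> factories = new HashMap<>();
    private final Map<String, Deque<Worker>> idle = new HashMap<>();

    // Registers a factory that creates workers for one protocol (the pool key).
    synchronized void register(String protocol, Supplier<Worker> factory) {
        factories.put(protocol, factory);
    }

    synchronized boolean isSupported(String protocol) {
        return factories.containsKey(protocol);
    }

    // Returns an idle worker for the protocol, or creates a new one.
    synchronized Worker borrow(String protocol) {
        Supplier<Worker> factory = factories.get(protocol);
        if (factory == null) {
            throw new IllegalArgumentException("unsupported protocol: " + protocol);
        }
        Deque<Worker> queue = idle.get(protocol);
        Worker worker = (queue == null) ? null : queue.pollFirst();
        return (worker != null) ? worker : factory.get();
    }

    // Puts a worker back into the idle queue of its protocol.
    synchronized void giveBack(Worker worker) {
        idle.computeIfAbsent(worker.protocol(), k -> new ArrayDeque<>()).addLast(worker);
    }
}
```

With only an "http" factory registered, such a pool behaves like the old single-protocol pool; adding a protocol later means registering one more factory.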
theli
1395aae742 *) starting restructuring which is needed to add crawlers for additional protocols
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2472 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 06:09:20 +00:00
theli
b4acbdaa97 *) better handling of server shutdown
See: e.g. http://www.yacy-forum.de/viewtopic.php?p=25234

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2470 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-04 05:17:37 +00:00
theli
f3ac4dbbb9 *) better handling of server shutdown
See: e.g. http://www.yacy-forum.de/viewtopic.php?t=2584

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2468 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-03 14:59:00 +00:00
theli
959b779aba *) avoid performance loss if log level is greater than 'fine'
See: http://www.yacy-forum.de/viewtopic.php?p=25180

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2467 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-03 08:42:46 +00:00
auron_x
57dda1a92c *) fixed the wrong version display again; now works entirely with double instead of float
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2464 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-28 17:54:07 +00:00
auron_x
479b74e1dd *) fix for stupid mistake in new ppm-calc which caused decimal digits being written to seedinfo
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2463 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-28 04:43:28 +00:00
auron_x
348258a557 *) changed PPM-calculation to be much more accurate
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2461 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-27 17:18:34 +00:00
orbiter
18b6876860 new cache flush configuration settings
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2460 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-25 22:31:21 +00:00
hermens
f0278b4092 Bugfix for / by zero when the AssortmentCluster is empty
See: http://www.yacy-forum.de/viewtopic.php?t=2746



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2459 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-25 20:23:04 +00:00
orbiter
14e0bb0dcf allow more references per word for new db
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2458 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-25 12:06:23 +00:00
orbiter
985dcbde7f changed some parameters, which may improve memory usage and indexing speed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2457 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-24 23:39:52 +00:00
orbiter
b7f4a1521b added options to switch on or off the kelondroFlexTable for NURL, EURL and PreNURL
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2456 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-24 22:21:22 +00:00
orbiter
c26da4893b reverted NURL to kelondroTree; kelondroFlexTable still has problems with deleted entries
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2454 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-24 10:03:38 +00:00
orbiter
db1eae0227 * simplified initialization of database objects
* replaced kelondroTree for NURLs by kelondroFlex
* replaced kelondroTree for EURLs by kelondroFlex
take care, may be very buggy
please finish crawls before updating. crawls will be lost.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2452 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-24 02:19:25 +00:00
hermens
0b73f2b132 Repair DNS prefetch during cacheScan
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2451 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-24 01:31:08 +00:00
orbiter
27a159b401 * documentation update
* removed doc from release
* release information in doc/News.html
* release 0.46

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2442 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-23 11:36:09 +00:00
theli
f80f776b89 *) Trying to solve NullpointerException problem in function addURLtoErrorDB
See: http://www.yacy-forum.de/viewtopic.php?t=2705

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2441 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-23 10:23:20 +00:00
orbiter
d78b824e85 fixed problem with default path after first start-up
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2440 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-22 13:35:51 +00:00
hydrox
1c99b5a484 *)fixed logging for urldbcleanup
*)changed exception handling in urldbcleanup so that it shows NullPointerException correctly
*)added more Blacklisting to urlcleaner

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2436 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-21 06:42:42 +00:00
orbiter
135e019883 removed one superfluous line from last commit
(hasnot is included in remove)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2435 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-21 01:59:44 +00:00
orbiter
1591a55963 added object cache miss-cache use for remove method
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2434 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-21 01:51:27 +00:00
orbiter
8f3f4ab0eb enhanced synchronisation in plasmaWordIndex
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2433 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-21 01:29:26 +00:00
orbiter
f933f00f09 another patch to URL protocol handling for 'news', 'nntp' etc:
reject it! (the java.net.URL class rejects them too)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2432 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-21 01:04:04 +00:00
orbiter
4c6e00d80a more bugfixes for URL class, see:
http://www.yacy-forum.de/viewtopic.php?p=24844#24844

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2431 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-21 00:23:39 +00:00
orbiter
23dd972608 fixed memory calculation in performanceMemory web page
fixed also maximum cache size computation

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2429 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-20 01:20:34 +00:00
orbiter
b7dc251948 fixed bugs in url class:
- correct backpath ('..') handling
- correct absolute path handling
- included https


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2428 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-19 22:27:01 +00:00
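Backpath handling as mentioned above usually means collapsing `.` and `..` segments before a path is used. The following is a simplified sketch of that idea for absolute paths, loosely following the dot-segment removal of RFC 3986; it is not the code from YaCy's URL class, and the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class BackpathDemo {
    // Collapses "." and ".." segments of an absolute path.
    // A ".." at the root is ignored instead of escaping above "/".
    static String resolveBackpath(String path) {
        Deque<String> stack = new ArrayDeque<>();
        for (String segment : path.split("/")) {
            if (segment.isEmpty() || segment.equals(".")) {
                continue;                 // nothing to add
            }
            if (segment.equals("..")) {
                stack.pollLast();         // go one level up; no-op at the root
            } else {
                stack.addLast(segment);
            }
        }
        StringBuilder out = new StringBuilder();
        for (String segment : stack) {
            out.append('/').append(segment);
        }
        if (out.length() == 0) {
            return "/";
        }
        // Keep the trailing slash when the input denoted a directory.
        boolean directory = path.endsWith("/") || path.endsWith("/.") || path.endsWith("/..");
        if (directory) {
            out.append('/');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(resolveBackpath("/a/b/../c"));
    }
}
```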
orbiter
1ce3c22761 better memory control:
- added memory monitor for preNURL-db in performanceMemory
- changed default memory assignments

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2427 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-19 13:09:04 +00:00
orbiter
39b4c26bdc more memory control:
- catching of OutOfMemoryError in server threads
- automatic adaptation of word cache size after a Short Mem Cycle

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2426 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-19 00:06:39 +00:00
orbiter
3e9d509c39 some small fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2425 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-18 22:50:05 +00:00
orbiter
276225d79e fix for URL class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2423 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-18 21:33:00 +00:00
orbiter
eb633c0a4f server threads must now supply a method that can be called in case
of short memory. This has been implemented for the indexing thread.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2421 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-18 02:07:03 +00:00
orbiter
f5720cb2fa removed most synchronization in wordIndex (for testing)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2420 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-18 01:35:33 +00:00
orbiter
0187c60010 because of a bug in JRE 1.4.2 there was no memory protection
see http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4686462
this commit works around the bug by using a memory-computation patch.
All uses of Runtime.maxMemory have been replaced by serverMemory.max
The bug is no longer present in Java 1.5

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2419 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-18 01:33:54 +00:00
auron_x
4eca0f8830 *) fixed PPM calculation for multiple indexer-threads
*) fixed totalPPM calculation and added total PPM to Network.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2418 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-17 19:15:30 +00:00
orbiter
cfb51fdef1 less synchronization in plasmaWordIndex
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2416 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-17 00:10:50 +00:00
orbiter
d6a928c2da quickfix for http://www.yacy-forum.de/viewtopic.php?t=2705
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2415 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-16 23:20:21 +00:00
orbiter
6ad471ef96 * applied many compiler warning recommendations
* cleaned up code
* added unit test code
* migrated ranking RCI computation to kelondroFlex and kelondroCollectionIndex


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2414 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-16 19:49:31 +00:00
allo
cf1186597b utf fix from theli
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2412 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-16 15:26:04 +00:00
hydrox
9da3aa74d3 silly me, fix for the fix as advised by theli
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2408 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-13 17:26:32 +00:00
hydrox
bb3d9a5582 *) e.getMessage().indexOf() can only be used if there is actually an ExceptionMessage.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2407 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-13 17:18:09 +00:00
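The point of the commit above is that `getMessage()` may return null, so calling `indexOf()` on it directly can itself throw a NullPointerException. A minimal null-safe check could look like this; the helper and class names are hypothetical and not taken from the YaCy sources.

```java
final class MessageCheckDemo {
    // True only if the exception carries a message that contains the fragment.
    static boolean messageContains(Throwable e, String fragment) {
        String message = e.getMessage(); // null if the exception was created without a message
        return message != null && message.indexOf(fragment) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(messageContains(new RuntimeException("heap space"), "heap"));
        System.out.println(messageContains(new RuntimeException(), "heap"));
    }
}
```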
hydrox
7a54010a9c *) Iterators can't be cast to IndexContainer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2406 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-13 17:08:39 +00:00
theli
5e0b6f8f83 *) sorting peer name list on Blacklist_p.html
*) restructuring of sharedBlacklist_p.java

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2405 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-13 13:29:50 +00:00
orbiter
cd5f7e137c fixed problem with NURL-generation upon first startup
(a new kelondroFlexTable was generated, which should not have happened)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2402 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-12 23:24:50 +00:00
orbiter
8418af141a added several consistency checks and small changes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2400 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-12 15:59:14 +00:00
theli
9d13aeca13 *) removing class. does not work so far
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2399 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-12 14:43:07 +00:00
theli
95a84ae469 *) adding missing classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2398 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-12 14:41:26 +00:00
theli
eee44be602 *) adding an interface for customized blacklist classes
- now it's possible to use a customized blacklist engine
     instead of the default one
   - this can be done by configuring the property BlackLists.class
   See: http://www.yacy-forum.de/viewtopic.php?t=2108

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2397 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-12 14:28:14 +00:00
orbiter
6d2f15971a there is a very strange error that corrupts the kelondroRecords structure.
The cause is that the deleted-records-chain has wrong entries,
and one of the pointers in that chain points to a place beyond the end of the file.
This causes an IndexOutOfBoundsException within an IO operation.
I currently don't know why the deleted-records-chain is
corrupted, but the error can be caught. If this now happens with the
assortment database, the database is deleted.
See also:
http://www.yacy-forum.de/viewtopic.php?p=24586#24586

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2396 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-12 13:45:23 +00:00
theli
d2e8e76218 *) now it's possible to configure the yacy blacklist separately for dht, search, proxy, crawler
See: http://www.yacy-forum.de/viewtopic.php?t=2541
        http://www.yacy-forum.de/viewtopic.php?p=24516

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2389 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-12 02:42:10 +00:00
orbiter
9ae9062bd3 * disabled new kelondroFlex table for NURLs
* added new RAM index Class
* fixed possible synchronization problem in kelondroRecords


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2388 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-12 00:58:43 +00:00
orbiter
689bbcf9cd replaced kelondroTree db for NURLs by new kelondroFlexTable
The new database is only created if the old is deleted or does not exist

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2387 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-11 23:36:58 +00:00
orbiter
7fbba41962 synchronization fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2386 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-11 23:04:36 +00:00
orbiter
328f9859a5 more synchronization in plasmaWordIndex
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2385 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-11 22:07:59 +00:00
orbiter
f43c90fa98 fixed handling of null referer in crawlOrder
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2384 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-11 21:46:34 +00:00
orbiter
130e6d4719 generalized index object for eurl, nurl and lurl to prepare move
of these tables to new kelondroFlexTable Object

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2382 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-11 17:37:54 +00:00
orbiter
acdf24877f more synchronization against outOfMemoryError in wordIndex
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2381 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-11 16:27:56 +00:00
orbiter
95160d7f2c fixed size computation of index elements from the collection index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2380 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-11 16:01:18 +00:00
orbiter
26116cabde added missing rowdef assignment
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2379 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-11 15:31:40 +00:00
orbiter
cfbacbbf08 reverted change in robotsParser
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2378 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-11 15:29:29 +00:00
orbiter
abf22f6e60 removed url normalform computation from htmlFilterContentScraper.
This method was implemented in de.anomic.net.URL


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2377 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-11 15:09:22 +00:00
orbiter
740d49751d * strict type and size check in kelondroRow handling
* adapted all code to use the declaration form of kelondroRow
* fixed a bug in kelondroRow which caused wrong parsing of encoding type
* the bug caused bad database behaviour in new indexCollection data structure.
  because of this bug, all test databases are now already void. A new database is created
* the kelondroFlexTable and indexCollection data structures now store a declaration of the row definition
  into a properties file along the database files.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2375 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-11 03:20:44 +00:00
orbiter
314021453f * more logging
* option in yacy.init to set useCollectionIndex usage

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2374 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-10 21:21:50 +00:00
allo
a52f36787f better templatedebugging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2371 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-10 14:02:03 +00:00
allo
3480d36417 added some debug code
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2369 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-09 16:57:36 +00:00
orbiter
61b151b083 * added another auto-fix for collection index inconsistency check
* fixed words size computation for collection index


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2368 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-08 00:52:04 +00:00
orbiter
0bbbd129ef small fix for exception message
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2367 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 23:52:12 +00:00
orbiter
718fbc2dae enhancements in kelondroCollectionIndex:
* synchronized array and index objects
* auto-fix function for slightly corrupted index entries
* generalized internal access methods

also extended kelondroIndex interface to support ordering access
which is used in kelondroCollectionIndex for string comparisons

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2366 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 23:29:26 +00:00
orbiter
f58283def2 better control of index flush
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2364 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 22:07:17 +00:00
orbiter
4be21a3cab ups
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2363 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 21:56:02 +00:00
orbiter
80b6c90d54 enhancements to prevent blocking during dht transfer receive
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2362 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 21:49:39 +00:00
theli
9f298083cd *) adding more urls to the error url
- old error strings were replaced with their corresponding constants
   See: http://www.yacy-forum.de/viewtopic.php?t=2638

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2360 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 15:11:14 +00:00
hermens
d56f06401e - Cache known URLs during indexReceive to avoid getting blocked during loadedURL.exists() whenever possible
- Small logging updates



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2359 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 11:42:00 +00:00
theli
c09f734d06 *) offer router configuration on ConfigBasic.html
- checkbox to allow router configuration is shown if
   - a) the UPnP forwarder is installed
   - b) a UPnP enabled router was found
   - c) no other forwarder was configured
   See: http://www.yacy-forum.de/viewtopic.php?p=24264

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2358 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 11:31:18 +00:00
hermens
dcbb4d0a6b Display the size of HashBlacklistedCache on PerformanceMemory page.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2357 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 11:19:54 +00:00
orbiter
d799622da1 better flush limit for index collections
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2354 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 00:44:43 +00:00
orbiter
d468d665c9 some changes that may help to prevent deadlocks that cause an OutOfMemoryError
as described in
http://www.yacy-forum.de/viewtopic.php?p=24359

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2353 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 00:19:01 +00:00
theli
d54767f634 *) last step of removing embedded html from dir class
- migration finished
*) dir list now sorts the dirlist entries. 
   - directories are listed before files
   - files are sorted alphabetically, case insensitive 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2351 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-06 14:38:07 +00:00
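The ordering described above (directories before files, names compared case-insensitively) can be expressed as a single comparator. This is a sketch under assumptions: the `Entry` type and all names are hypothetical, and unlike the commit message it also sorts the directories alphabetically.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class DirListSortDemo {
    // Hypothetical listing entry: a name plus a directory flag.
    static final class Entry {
        final String name;
        final boolean isDirectory;

        Entry(String name, boolean isDirectory) {
            this.name = name;
            this.isDirectory = isDirectory;
        }
    }

    // Directories first (false sorts before true), then by name ignoring case.
    static final Comparator<Entry> DIRS_FIRST =
            Comparator.comparing((Entry e) -> !e.isDirectory)
                      .thenComparing(e -> e.name, String.CASE_INSENSITIVE_ORDER);

    // Returns the entry names in listing order without modifying the input list.
    static List<String> sortedNames(List<Entry> entries) {
        List<Entry> copy = new ArrayList<>(entries);
        copy.sort(DIRS_FIRST);
        List<String> names = new ArrayList<>();
        for (Entry e : copy) {
            names.add(e.name);
        }
        return names;
    }
}
```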
orbiter
279b1d969d Integrated new indexing data structure 'collections' into the main class
for indexing, the plasmaWordIndex.

The new data structure is ready-to-use, but currently disabled.
It can be activated by setting the static
plasmaWordIndex.useCollectionIndex
to true. This shall be done for testing purposes.

The new index is stored to
DATA/INDEX/PUBLIC/TEXT
The directory PLASMA shall be used only for crawler in the future.

Attention: during testing the data structure in INDEX may change,
and indexes created with the new data structure may become useless.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2348 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-05 22:22:14 +00:00
orbiter
4ff742e42d implemented indexCollectionRI
this is the new database structure that is supposed to replace the
plasmaAssortmentCluster AND the plasmaWordIndexFileCluster
The new structure is not yet active and needs to be integrated into
plasmaWordIndex. This has some migration constraints that are not yet
completely solved.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2347 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-05 19:18:33 +00:00
orbiter
01f95eccd3 re-write of kelondroCollectionIndex. This is the data structure that
shall replace the current assortment files.
* used the kelondroFlexTable to hold the index of collections
* used kelondroRow definitions to declare all data structures
* fixed several bugs that appeared in kelondroRowSet and kelondroRowCollection during testing


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2344 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-04 23:04:03 +00:00
orbiter
ebc2233092 * implemented (finished) class indexRowSetContainer
* replaced indexTreeMapContainer by indexRowSetContainer
* deleted indexTreeMapContainer and abstract class
This is another step to the new database structure

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2343 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-02 23:20:03 +00:00
orbiter
9183d21f25 renamed new index class to old name
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2342 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-02 20:01:59 +00:00
orbiter
c4e922885a replaced indexURLEntry by new class that uses a kelondroRow.Entry object
to store the index entry. This is another step to move to the new database structure.
A side effect of this change is, that index storage uses much less RAM space,
which affects the index RAM cache.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2341 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-02 19:59:28 +00:00
orbiter
0b7112f8b2 fix for missing topLevelClone in indexRAMCacheRI.wordContainerIterator
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2340 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-02 00:43:03 +00:00
orbiter
e357599f92 * fixed problem with indexContainer iteration from RAM:
indexContainers from RAM must be cloned explicitly to prevent
  side-effects on stored indexContainer objects in Cache
* changed behaviour of urlReference deletion from indexContainers:
  deletion does not use retrieval of all Elements from the assortments
* added textual configuration of kelondroRow and kelondroColumn definition
* update of kelondroRow usage in yacyNews
* modified kelondroAttrSeq to use modified kelondroColumn parser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2339 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-01 10:30:55 +00:00
theli
57fe5cc671 *) code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2338 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-30 06:25:40 +00:00
allo
4e9f02c8ec integration of Michael's string-extraction.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2337 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-26 23:11:15 +00:00
orbiter
8b77afd72c some fixes to new container merger
and some code cleanup

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2336 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-26 22:40:11 +00:00
orbiter
830167596a bugfix for
http://www.yacy-forum.de/viewtopic.php?p=24127#24127

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2333 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-26 18:16:33 +00:00
theli
839806a775 *) serverPortForwardingUpnp.java: code cleanup, license header added
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2332 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-26 15:32:35 +00:00
theli
03230cd887 *) removing old port forwarding classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2330 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-26 14:42:12 +00:00
theli
6e676224d0 *) adding support for upnp
A new port forwarding method for upnp was added.
   If this method is enabled, yacy automatically determines an UPnP 
   capable internet gateway and configures the gateway port forwarding
   settings properly. 

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2328 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-26 14:26:45 +00:00
orbiter
417ed5102e redesign of database iterators:
an iteration of key elements in kelondroTree databases is no longer supported.
this is now replaced by an iteration of kelondroRow.Entry objects from the database
Iteration of keys from the database was mostly followed by retrieval of the row
from the database, which caused unnecessary database load.
The index selection was also redesigned to use the new row iteration methods.
This affects many functions; the most important is the DHT selection routine, which is now much faster.



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2327 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-26 11:21:51 +00:00
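
The difference between key iteration and row iteration described in the entry above can be illustrated with a minimal sketch. It uses `java.util.TreeMap` as a stand-in for a kelondroTree database; the class and method names are hypothetical and are not taken from the YaCy sources.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: a TreeMap stands in for a kelondroTree database.
final class RowIterationSketch {

    // Old pattern: iterate the keys, then fetch each row with a second lookup.
    static int sumByKeys(TreeMap<String, Integer> table) {
        int sum = 0;
        for (String key : table.keySet()) {
            sum += table.get(key); // one extra lookup per element
        }
        return sum;
    }

    // New pattern: iterate complete entries, so no second lookup is needed.
    static int sumByRows(TreeMap<String, Integer> table) {
        int sum = 0;
        for (Map.Entry<String, Integer> row : table.entrySet()) {
            sum += row.getValue();
        }
        return sum;
    }
}
```

Both methods return the same result; the second avoids the per-element lookup that the commit message identifies as unnecessary database load.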
theli
0db237467f *) bugfix for URL generation from file
see: http://www.yacy-forum.de/viewtopic.php?p=24116

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2326 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-25 16:18:45 +00:00
orbiter
ad692fc6c7 implemented option to extract nurls from the database
(plus some iteration enhancements for nurls)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2325 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-24 16:40:59 +00:00
orbiter
7fd90ca7c8 * strict handling of NURL entry element generation, storage and stacking
* more space for EURL reason strings (you must delete the EURL db to use this)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2324 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-24 16:04:14 +00:00
orbiter
5f72be2a95 some redesign of EURL storage
* store() is now called explicitly
* more urls are written to the EURL table
* the EURL stack does not store the complete entry any more, now only the URL hash


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2323 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-24 15:25:47 +00:00
orbiter
1ed3e2daef added option to extract domains and/or urls from the eurl database
when extracting from eurl, the html output format is recommended, since
this format also adds the failure reason to the domain/url.
The complete syntax for domain extraction is now
java -Xmx<megabytes>m -classpath classes yacy -domlist [ -source { lurl | eurl } ] [ -format { text  | zip | gzip | html } ] [ <path to DATA folder> ]


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2322 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-24 08:08:33 +00:00
orbiter
7e0a130fb5 new indexURLEntry class 'indexURLEntryNew', to replace old class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2321 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-23 22:43:36 +00:00
orbiter
58df8b7bbf a large collection of different changes
* mainly for the transition to the new indexing database structure
* a bugfix for an endless loop inside kelondroTree iteration
* a bugfix for bulk read inside a kelondroTree iteration; the bug caused some elements to be iterated twice
* very strong speed enhancement for url/domain extraction

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2320 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-23 22:39:41 +00:00
orbiter
e20ff77c10 another bugfix in new url class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2318 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-22 11:37:22 +00:00
orbiter
685430a1b5 bugfix in new URL class, better logging for domain extraction
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2317 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-22 11:33:01 +00:00
orbiter
79af283f6c better debugging in new URL class for wrong port numbers
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2315 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-22 10:21:24 +00:00
allo
1b2ea58ee9 wrong substring invocation.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2313 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-20 13:49:38 +00:00
orbiter
e4f1820b58 protection against too long authentication strings in switchboard
see also: http://www.yacy-forum.de/viewtopic.php?p=23943#23943

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2312 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-20 11:30:10 +00:00
orbiter
b3f7e62e03 better handling of whitespace
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2311 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-19 23:53:27 +00:00
orbiter
4149939c02 better handling of whitespace for gettext quotation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2310 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-19 23:18:06 +00:00
orbiter
97fa6788a1 added gettext support:
automatic replacement of string appearances in html files by
gettext quotes.
see also: http://www.yacy-forum.de/viewtopic.php?p=23901#23901

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2309 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-19 22:35:36 +00:00
theli
b3c569f706 *) renaming of function getTransferedEntitySpeed to getTransferedEntrySpeed to avoid confusion
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2308 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-19 13:52:33 +00:00
orbiter
67edd80884 removed tabs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2305 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-19 11:13:14 +00:00
allo
67c486a023 some example code showing how supertemplates can be used.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2304 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-19 07:08:15 +00:00
orbiter
5214f571cd simplified method call in balancer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2303 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-19 00:42:20 +00:00
allo
7b0e2521bb Support for a supertemplate, which can do everything a normal template can do.
It is a layer under the servlets; this means #[page]# will be replaced by the servlet code, and the rest can be set by you.
(TODO: if we use this for layout, we need to read "TITLE" from the servlet's tp, to set it outside of the servlet.)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2302 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-18 15:51:19 +00:00
orbiter
4bd626572b added hashCode and compareTo to new URL class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2301 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-18 12:00:54 +00:00
orbiter
abb5264929 fix for
http://www.yacy-forum.de/viewtopic.php?p=23868#23868

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2300 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-18 11:52:56 +00:00
theli
a70cbd959b *) further improvements for the anomic.net.url class
- relpath starting with javascript: are ignored now
   - bugfix for concatenation of relpath starting with # or ?
     in this case no slash should be added to the baseURL, otherwise
     we get URLs of the form http://test.de/index.html/?param=value

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2298 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-18 05:12:08 +00:00
theli
8a1f1d96b3 *) Bugfix for url concatenation. Relative urls with / or http:// at the beginning
were not handled correctly on url concatenation via new URL(URL,relPath).
   See: http://www.yacy-forum.de/viewtopic.php?t=2623

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2297 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-18 04:48:18 +00:00
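
The concatenation rules named in the two entries above (relative paths starting with `javascript:`, `#`, `?`, `/` or `http://`) can be sketched as follows. This is a hypothetical helper, not the actual `de.anomic.net.URL` code, and it assumes an absolute base URL without a query or fragment of its own.

```java
// Hypothetical sketch of the joining rules from the commit messages;
// not the de.anomic.net.URL implementation.
final class RelPathJoinSketch {

    // Returns the joined URL, or null when the relative path is ignored.
    static String join(String baseURL, String relPath) {
        // An absolute URL replaces the base entirely.
        if (relPath.startsWith("http://")) return relPath;
        // javascript: pseudo-links are ignored.
        if (relPath.startsWith("javascript:")) return null;
        // Fragment or query: append directly, without inserting a slash.
        if (relPath.startsWith("#") || relPath.startsWith("?")) return baseURL + relPath;

        // Locate the end of "scheme://host[:port]".
        int hostStart = baseURL.indexOf("://") + 3;
        int pathStart = baseURL.indexOf('/', hostStart);
        String root = (pathStart < 0) ? baseURL : baseURL.substring(0, pathStart);

        // A leading slash resolves against the host root.
        if (relPath.startsWith("/")) return root + relPath;

        // Anything else resolves against the directory of the base URL.
        String dir = (pathStart < 0)
                ? baseURL + "/"
                : baseURL.substring(0, baseURL.lastIndexOf('/') + 1);
        return dir + relPath;
    }
}
```

With this sketch, joining `http://test.de/index.html` and `?param=value` yields `http://test.de/index.html?param=value` rather than the faulty form with a slash before the question mark.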
rramthun
ca33eaa442 - Some spelling
- Removed unused init value
- Set default upload value to "none", which avoids a warning on new installations saying that the upload method '' is unknown

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2295 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-17 16:16:41 +00:00
allo
8795875800 dirlisting for all empty directories.
no problem to update dir.java anymore, because it is only needed in htroot/htdocsdefault.
migration to delete old dir.* files in the fileshare

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2294 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-17 15:49:42 +00:00
orbiter
7935f27038 enhanced synchronization in balancer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2291 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-13 01:31:00 +00:00
orbiter
3879a0ecd0 replaced java.net.URL usage by use of new class de.anomic.net.URL
This shall be seen as an experiment to exclude all cases where
there could be a DNS lookup during URL comparison.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2290 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-13 01:21:53 +00:00
orbiter
07900366ac deactivated cache-initialization for file-indexes (files in WORDS)
see also: http://www.yacy-forum.de/viewtopic.php?p=23801#23801

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2289 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-12 09:45:31 +00:00
orbiter
40aa735520 fixed timing problem causing too long delay during initialization of kelondroTree objects
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2288 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-11 23:44:44 +00:00
orbiter
d2bb3f442e fixed timing problem causing a division by zero exception
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2287 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-11 23:43:25 +00:00
allo
6acb6a4d8f tiny performance optimization
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2285 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-09 15:37:45 +00:00
allo
2bdf1fc360 totalPPM
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2282 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-08 22:41:48 +00:00
theli
24a02cbeef *) Bugfix for not parsable application/xhtml+xml resources if
an URL has no extension
   See: http://www.yacy-forum.de/viewtopic.php?p=23687

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2280 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-07 05:36:19 +00:00
orbiter
b0ca5fa784 some correction algorithm for preload time computation during assortment open
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2279 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-05 09:20:59 +00:00
orbiter
e22cbaee97 - extended logging for preload
- reduced preload-time for IndexImport_p.java

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2278 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-05 09:02:58 +00:00
orbiter
671fd9a5c9 work towards new indexing database structure
(no effect on current functionality yet)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2277 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-04 14:47:27 +00:00
orbiter
92f4cb4d73 added option to configure the start-up delay time for kelondro database files.
the start-up delay is used to pre-load the database node cache

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2276 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-03 23:57:33 +00:00
orbiter
ce9dd3e76d some work in the index construction zone (no effect yet)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2275 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-03 15:14:54 +00:00
theli
fe617d7e54 *) adding function to return the protocol type of a ssl connection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2274 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-03 14:16:46 +00:00
orbiter
018b3e0832 added pause option to server threads.
The pause is started by calling intermission(Long.MAX_VALUE)
and can be stopped by calling intermission(0)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2272 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-03 13:20:14 +00:00
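
A minimal sketch of the pause semantics described in the entry above, where `intermission(Long.MAX_VALUE)` starts an open-ended pause and `intermission(0)` ends it. Only the method name `intermission` and its two argument values come from the commit message; the class, the field and the `isPaused` helper are assumptions made for illustration.

```java
// Hypothetical sketch; only the intermission(long) call pattern is taken
// from the commit message.
final class PausableWorkerSketch {

    // Absolute time in milliseconds until which the worker is paused; 0 means "not paused".
    private long intermissionUntil = 0;

    // Pause for the given number of milliseconds; 0 ends a running pause.
    synchronized void intermission(long pause) {
        if (pause == 0) {
            intermissionUntil = 0;
        } else {
            long now = System.currentTimeMillis();
            // Guard against overflow when pause is Long.MAX_VALUE.
            intermissionUntil = (pause > Long.MAX_VALUE - now) ? Long.MAX_VALUE : now + pause;
        }
    }

    // A worker loop would check this before doing its next unit of work.
    synchronized boolean isPaused() {
        return System.currentTimeMillis() < intermissionUntil;
    }
}
```

The overflow guard matters because adding `Long.MAX_VALUE` to the current time would otherwise wrap to a negative value and end the pause immediately.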
orbiter
e1a52bea22 added a class stub for the new database structure:
a reverse word index based on a collection index,
which is an index for a set of array files containing
row collections.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2271 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-02 22:24:13 +00:00
orbiter
3b69b35bf2 added pre-load of node cache entries to kelondroRecords
this gives the kelondroTree data structure a start-up
behaviour similar to the kelondroFlexTable: the cache is filled with
routing data in such a way that is more performant than
reading node records during normal operation.
The pre-load phase stops automatically after a time-out of 500 milliseconds
or if the cache is full.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2270 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-02 22:20:55 +00:00
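
The stop condition of the pre-load phase described in the entry above (a time-out or a full cache, whichever comes first) can be sketched like this. The class, the method signature and the use of `java.util.Map` as a cache are assumptions for illustration and do not reflect the kelondroRecords code.

```java
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch of a bounded pre-load loop; not the kelondroRecords code.
final class PreloadSketch {

    // Copies nodes into the cache until the source is exhausted, the cache
    // holds cacheCapacity entries, or timeoutMillis has elapsed.
    // Returns the number of nodes that were read from the source.
    static int preload(Iterator<Map.Entry<String, byte[]>> nodes,
                       Map<String, byte[]> cache,
                       int cacheCapacity,
                       long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        int loaded = 0;
        while (nodes.hasNext()
                && cache.size() < cacheCapacity
                && System.currentTimeMillis() < deadline) {
            Map.Entry<String, byte[]> node = nodes.next();
            cache.put(node.getKey(), node.getValue());
            loaded++;
        }
        return loaded;
    }
}
```

A caller following the commit message would pass 500 as `timeoutMillis`; a later entry in this log makes that start-up delay configurable.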
orbiter
85d575e928 enhancements to kelondroRow and kelondroColumn
these are changes towards a better indexURLEntry implementation
which are needed for the new database structures

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2268 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-02 01:26:06 +00:00
orbiter
ab1ed053f5 another small correction
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2267 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-02 00:57:20 +00:00
orbiter
b92561fb67 removed unused code
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2266 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-02 00:44:38 +00:00
orbiter
eadbd56fc5 small adjustment to last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2265 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-02 00:39:23 +00:00
orbiter
e9765ac4e6 introduced bulk read for node iterator in kelondroRecords
this speeds up the iterator by a factor of 2

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2264 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-02 00:31:20 +00:00
orbiter
6643da3fbd bugfix for http://www.yacy-forum.de/viewtopic.php?p=23463#23463
(affected URL DB Cleaner)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2263 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-01 09:51:00 +00:00
orbiter
866d53ed70 fix for DNS block bug
see http://www.yacy-forum.de/viewtopic.php?p=23458#23458

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2262 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-30 22:39:33 +00:00
orbiter
6af70febef - added kelondroTree index option to kelondroFlexTable
- automatic generation of index file when index is too large for RAM


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2261 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-30 12:54:19 +00:00
orbiter
dd2865178a major bugfix (searched a whole week for the bug) for
the kelondroRowBuffer, which mostly affects the
kelondroFlexTable but also all other database functions

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2260 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-30 08:52:39 +00:00
orbiter
f9b9d085c4 just changed testing code
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2259 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-29 23:01:42 +00:00
theli
b594ee9a5a *) Adding possibility to configure if the http proxy should send the
X-forwarded-for header (requested by TeeSee)
   See: http://www.yacy-forum.de/viewtopic.php?t=2577

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2257 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-29 16:01:03 +00:00
orbiter
ef84fc4956 added IOException to size() and row()
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2256 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-29 07:56:27 +00:00
orbiter
84dfd76a6a kelondroFlex bugfix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2254 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-28 16:14:47 +00:00
hydrox
8ba8e2b7d9 *) added cache for blacklisted urlhashes received by DHT. DHT does not request URLs listed in this cache.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2251 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-28 08:51:34 +00:00
hermens
53cbcc6d6e Implement emergency break in index receive when the limit of the ramCache is exceeded by more than cacheLimit
See: http://www.yacy-forum.de/viewtopic.php?p=22911#22911



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2248 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-27 11:14:30 +00:00
orbiter
e40987ecab removed default memory reservation for testing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2247 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-27 09:16:07 +00:00
orbiter
4cc6e6551f bugfix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2245 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-26 09:52:23 +00:00
orbiter
66964dc015 removed high/med/low from kelondroRecords cache control.
this was done because testing showed that cache-delete operations
slowed down record access most, even more than actual IO operations.
Cache-delete operations appeared when entries were shifted from low-priority
positions to high-priority positions. During a fill of x entries to a database,
x/2 delete situations happened, which caused two or more delete operations.
removing the cache control means that these delete operations are not
necessary any more, but it is more difficult to decide which cache elements
shall be removed in case the cache is full. There is not yet a stable
solution for this case, but the advantage of a faster cache is more important
than the flush problem.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2244 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-25 10:31:38 +00:00
allo
6866bc2758 be quiet!
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2243 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-24 17:40:55 +00:00
borg-0300
4c6083b264 network picture;
back to old version

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2242 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-24 02:52:24 +00:00
borg-0300
955915385a network picture;
small changes;

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2241 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-23 15:37:59 +00:00
borg-0300
027fa8ab1c network picture;
bigger; 
more dot steps; 
small other;

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2240 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-23 13:53:29 +00:00
orbiter
41c4641612 added some profiling to kelondro caching classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2239 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-23 12:49:42 +00:00
orbiter
dd560e4b2f fine-tuning
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2238 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-22 23:11:40 +00:00
orbiter
5b1d77cd4b some enhancements to caching
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2236 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-22 15:42:09 +00:00
theli
b20496e42b *) make DHT DoS check configurable (requested by KoH)
- check can be disabled via property indexDistribution.dhtReceiptLimitEnabled
   - upper bound can be configured via indexDistribution.dhtReceiptLimit

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2234 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-21 19:28:42 +00:00
orbiter
650c7e9e55 some enhancements to caching
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2233 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-06-21 16:05:31 +00:00