2125 Commits

Author SHA1 Message Date
orbiter
432d7d4e9c better catch
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3468 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-10 23:38:08 +00:00
orbiter
8f7e8b6ee2 auto-delete for not-fixable db error in crawl stacker.
see also http://www.yacy-forum.de/viewtopic.php?p=32906#32906

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3467 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-10 23:31:36 +00:00
orbiter
7a52b07fcc better memory protection during freemen cycle
see also http://www.yacy-forum.de/viewtopic.php?p=32903#32903

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3466 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-10 23:22:37 +00:00
orbiter
6faa262259 fix for NURL-fix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3465 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-09 14:30:53 +00:00
orbiter
909d7a8ae9 fixed wrongly implemented row iterator in kelondroFlexSplitTables
this has had no effect so far, because until now this iterator was only used on
the Index Administration page.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3464 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-09 13:55:26 +00:00
orbiter
a1fb8358b2 let's make a well-formed http link so that other crawlers don't have a problem following this link :-)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3463 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-09 12:35:54 +00:00
orbiter
4edb70f68b added yacybot info-page from Roland
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3462 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-09 12:26:31 +00:00
orbiter
3ef77d2030 fix for http://www.yacy-forum.de/viewtopic.php?p=29878#29878
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3461 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-09 12:14:25 +00:00
orbiter
3bb3df3fc0 fix for http://www.yacy-forum.de/viewtopic.php?p=32298#32298
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3460 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-09 12:03:53 +00:00
orbiter
243a2f831b fixed problem with NURL-hashes that are not found
The cause of this problem has still not been found, but the effect
is handled much better. The NURL-pop will continue automatically until
it finds a hash that can be resolved.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3458 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-09 11:07:20 +00:00
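
The retry behaviour described in the NURL-pop commit above can be pictured with a minimal sketch. It is written in modern Java, not the Java 1.4 the project used at the time, and every name in it (`NurlPopSketch`, `popFirstResolvable`, the `Deque` and `Map` stand-ins for the stack and the URL store) is a hypothetical illustration, not taken from the YaCy source.

```java
import java.util.Deque;
import java.util.Map;

// Hypothetical sketch; these names do not come from the YaCy code base.
final class NurlPopSketch {

    /**
     * Pops hashes from the stack until one resolves to an entry in the store.
     * Hashes without an entry are discarded. Returns null if the stack runs empty.
     */
    static String popFirstResolvable(Deque<String> hashStack, Map<String, String> urlStore) {
        while (!hashStack.isEmpty()) {
            String hash = hashStack.pop();
            String url = urlStore.get(hash);
            if (url != null) {
                return url; // this hash could be resolved
            }
            // hash not found in the store: skip it and try the next one
        }
        return null;
    }
}
```

The point of the loop is that a missing entry no longer aborts the pop; it only costs one extra iteration.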
orbiter
6ad39bae1e fixed shutdown problem
this fixes the 'inconsistency' messages during start-up

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3457 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-09 08:48:47 +00:00
orbiter
38b93f8cb8 bugfix for my last commit:
iterator did not consider secondary start point in case of rotation

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3456 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-08 22:07:17 +00:00
karlchenofhell
264a82eec8 - fix for http://www.yacy-forum.de/viewtopic.php?t=3657
- fix for http://www.yacy-forum.de/viewtopic.php?p=32758#32758
- Diff takes any objects now, not only strings

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3455 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-08 22:04:15 +00:00
orbiter
d755a8026d - better OOM protection
- better memory allocation for FlexTable indexes
- splitting between static index and dynamic index (only the dynamic part must grow)
- to enable a merge-iteration of the new split index, a huge number of classes needed to be adapted for new iterator classes
- added new iterator classes that support cloneable iterators
- adapted all iterator classes to implement cloneable iterators

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3453 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-08 16:15:40 +00:00
orbiter
23338d2070 small fix for RAM computation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3447 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-07 23:55:52 +00:00
orbiter
33f97cff7a changed startup initialization sequence slightly
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3446 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-07 23:24:16 +00:00
orbiter
4e8eb1dbe3 some minor changes here and there
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3441 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-07 14:22:10 +00:00
karlchenofhell
03c5906ae7 - minor bugfixes for url-fetcher & http://www.yacy-forum.de/viewtopic.php?t=3646
- PerformanceMemory_p.html is valid XHTML again

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3440 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-07 11:50:03 +00:00
orbiter
3499a364ef a little bit better memory protection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3439 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-07 09:38:14 +00:00
orbiter
313f6a7680 fix for http://www.yacy-forum.de/viewtopic.php?p=31553#31553
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3438 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-07 09:26:01 +00:00
orbiter
958ebea5c5 fix for http://www.yacy-forum.de/viewtopic.php?p=32470#32470
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3437 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-07 09:08:13 +00:00
orbiter
5d5e6ebfcc fix for http://www.yacy-forum.de/viewtopic.php?p=32631#32631
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3436 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-07 08:54:07 +00:00
orbiter
1cba31de43 redesigned ram organization for database caches
- each cache can now allocate as much memory as is available
- no more fixed limits
- replaced old performance memory monitor by new one
- added supervision methods as static functions into the classes that provide cache functionality
- steering of RAM allocation is done with two simple limits that are relative to RAM availability


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3434 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-06 22:43:32 +00:00
theli
26450a1d9a *) avoid nullpointerException on seed.getAddress() (reported by netbude)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3431 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-06 16:11:36 +00:00
orbiter
db235f2d61 added some memory protection in collection index multiple merge
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3429 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-04 22:54:04 +00:00
theli
c72605ecab *) adding a function to determine if a given URL is bookmarked
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3428 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-03 11:57:49 +00:00
theli
bd03c6b874 *) bugfix in bookmarksDB:
   - NullpointerException when trying to get an unknown bookmark
   - bookmarks can either start with http or https

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3427 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-03 11:56:46 +00:00
orbiter
b466baa574 added some memory protection
collection arrays that are too large are now avoided. By default, the biggest
collection index is 7. Larger collections are dumped into a commons
directory, but cannot yet be used. Before doing a dump, the collection
is split: the part which has only root-references is stored back
to the collection; the remaining part goes to commons

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3426 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-03 00:55:51 +00:00
low012
ce360ef43e *) no more HTML in plasmaCrawlProfile.java
*) <br> will not be displayed in items in Auto Filter Content on WatchCrawler_p.html anymore
*) removed unnecessary replaceHTML()


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3425 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-02 21:09:28 +00:00
karlchenofhell
88245e44d8 - improved version of robots.txt (delete your old htroot/robots.txt before updating):
- robots.txt is a servlet now
  - no need to rewrite the whole file each time a section is added or removed
  - user-defined disallows, added manually, won't be overwritten anymore
- new config-setting: httpd.robots.txt, holding names of the disallowed sections

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3423 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-02 01:19:38 +00:00
karlchenofhell
9623bf7bbe - removed call of java 1.5 method
- added config servlet for local robots.txt
- removed YPStats_p as it is of no use anymore
- supertemplates use XHTML now
- quick-fix for http://www.yacy-forum.de/viewtopic.php?p=32296#32296

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3422 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-01 13:54:14 +00:00
orbiter
51e12049fa third generation of R/W head path optimization
- data from collection arrays are read in order
- merged data is written in order

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3419 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-28 11:13:23 +00:00
karlchenofhell
a1d68fe092 - use .class rather than Class.forName for classes in class-path
- added Bost's patch for Diff.findDiagonale() from: http://www.yacy-forum.de//files/patch_685.txt
- fixed minor bugs in Blog

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3416 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-27 22:52:22 +00:00
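
As a brief illustration of the first item in the commit above: a class literal is resolved at compile time, while `Class.forName` looks the class up by name at run time and can fail with a checked exception. The sketch below uses a JDK class as a stand-in; the wrapper name `ClassLiteralSketch` is hypothetical and nothing here is taken from the YaCy source.

```java
// Illustrative only; uses a JDK class rather than a YaCy class.
final class ClassLiteralSketch {

    // Class literal: checked by the compiler, no string lookup, no checked exception.
    static Class<?> viaLiteral() {
        return java.util.ArrayList.class;
    }

    // Lookup by name: resolved at run time, may throw ClassNotFoundException.
    static Class<?> viaForName() throws ClassNotFoundException {
        return Class.forName("java.util.ArrayList");
    }
}
```

Both methods return the same `Class` object here; the literal form simply moves the failure case (a misspelled or missing class) from run time to compile time.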
orbiter
10a3c20b8d some more enhancements to R/W Head path optimization
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3415 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-27 15:54:02 +00:00
orbiter
f4cfd19835 second Generation of collection R/W head path optimization:
- permanent cache flush is switched off. The optimized cache flush
  works better if it is a large number of collections that is flushed
  together
- the flush size can be configured instead of the flush divisor. There is
  only one size for all flushes
- collection records that shall be removed during collection transition
  (jump from one collection file to another) are now not really removed
  but only marked in RAM. add-operations to the collection use these
  marked collection spaces
- index bulk write operations are now separated for each file of a kelondroFlex


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3414 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-27 13:01:22 +00:00
orbiter
1fda50fd3c correct R/W head positioning in kelondroFlex
and some enhancements

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3409 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-26 22:25:39 +00:00
orbiter
304412a049 first generation of collection index R/W head path optimization
- collections are now handed over as collection lists to the collection index for merge operations
- collection index lists are separated into 'new' and 'extend' lists
- lists are written separately
- write operations are done into array sets and array indexes. These are now serialized
- write operations into index files are sorted by index;
  that means that a R/W head does not need to go forward
  and backward, only forward
More enhancements are possible

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3407 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-26 15:49:23 +00:00
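
The idea in the commit above of sorting write operations so that the R/W head only moves forward can be sketched as follows. This is a minimal illustration in modern Java under stated assumptions: `SortedWriteSketch`, `PendingWrite`, `inOffsetOrder` and `flush` are hypothetical names, and the real kelondro classes are not shown in this log.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; not the kelondro implementation.
final class SortedWriteSketch {

    /** One pending record write: target byte offset plus payload. */
    static final class PendingWrite {
        final long offset;
        final byte[] data;

        PendingWrite(long offset, byte[] data) {
            this.offset = offset;
            this.data = data;
        }
    }

    /** Returns a copy of the writes ordered by ascending file offset. */
    static List<PendingWrite> inOffsetOrder(List<PendingWrite> writes) {
        List<PendingWrite> sorted = new ArrayList<>(writes);
        sorted.sort(Comparator.comparingLong((PendingWrite w) -> w.offset));
        return sorted;
    }

    /** Writes all records in a single forward pass over the file. */
    static void flush(RandomAccessFile file, List<PendingWrite> writes) throws IOException {
        for (PendingWrite w : inOffsetOrder(writes)) {
            file.seek(w.offset); // offsets only increase, so seeks never go backward
            file.write(w.data);
        }
    }
}
```

Collecting writes first and sorting them trades a little RAM for far fewer backward seeks, which is the effect the commit message describes.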
hydrox
54fef3574f *) missing files for last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3406 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-26 14:38:34 +00:00
hydrox
cb89c74d52 *) added blog-comments
*) removed debug-output when deleting news

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3405 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-26 14:36:01 +00:00
karlchenofhell
6fbe31425a - some code-cleanup (no more syntax-warnings here)
- added deletion from loadedURLs of URLs to be blacklisted in IndexControl_p

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3404 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-26 12:56:50 +00:00
orbiter
32867580ee update to kelondroRecords needed for last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3403 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-26 11:55:36 +00:00
orbiter
e3480d4ad3 fix for warning in crawl balancer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3402 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-26 11:54:43 +00:00
orbiter
8668ac5d91 preparations for collection index cache flush optimization
(hand-over commit, no functional change to current code)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3399 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-25 21:06:26 +00:00
karlchenofhell
39a2000d8b - added support for [[Bookmark:$bookmarkTag|description]]-link-listings (requested by theli) to wiki-parser
- added support for <pre>-tags to wiki-parser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3393 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-24 21:26:48 +00:00
karlchenofhell
619653c054 - fix for last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3392 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-24 15:40:56 +00:00
karlchenofhell
26f5757b40 - added support for multiple paths per domain to default-blacklist
warning: an interface-change was necessary:
- remove(String, String) has been renamed to removeAll(String, String), because it removes all path-entries for the specified host
- remove(String, String, String) has been added to delete only a path-entry
- geBlacklistType(String) has been renamed to getBlacklistType(String)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3391 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-24 13:56:32 +00:00
karlchenofhell
a5a36d9252 - hopefully last fix for 1.5 methods (sorry for that, eclipse isn't that helpful in identifying those methods)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3387 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-22 08:04:09 +00:00
karlchenofhell
e97b6f0458 - we still use Java 1.4 ...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3386 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-21 22:43:31 +00:00
karlchenofhell
0c7b8cf632 - added first version of new wiki-parser
- added blacklist support to manual URLFetcher stack fill
- fix for NPE: http://www.yacy-forum.de/viewtopic.php?t=3559

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3385 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-21 22:31:36 +00:00
orbiter
f7803a6ce4 enhanced crawl balancer
- new domains now get a chance to get crawled early
- less IO operations
- new balancing method
- better dump order at shutdown time
- bugfixes regarding not found url hashes (no more superfluous cache kill)
- domain access time is now shared over all balancer stacks
- viewing the stack no longer disturbs the balancing algorithm that much
- intelligent selection of best next domain using domain access times
- extra double-check (to double-check the double-check)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3384 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-21 16:23:31 +00:00
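
One possible reading of "selection of best next domain using domain access times" in the commit above is to pick the domain that has gone longest without being accessed. The sketch below shows only that reading; the actual balancer logic is not visible in this log, and `DomainPickSketch`, `pickNextDomain` and the map layout are assumptions made for illustration.

```java
import java.util.Map;

// Hypothetical sketch; not the YaCy crawl balancer.
final class DomainPickSketch {

    /**
     * Returns the domain whose last access lies furthest in the past.
     * lastAccess maps a domain name to its last access time in milliseconds;
     * a domain that was never accessed is assumed to be stored with time 0.
     * Returns null if the map is empty.
     */
    static String pickNextDomain(Map<String, Long> lastAccess) {
        String best = null;
        long bestTime = Long.MAX_VALUE;
        for (Map.Entry<String, Long> e : lastAccess.entrySet()) {
            if (e.getValue() < bestTime) {
                bestTime = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }
}
```

Under this assumption a new domain (time 0) is always chosen before any domain that has already been visited, which matches the first bullet of the commit message.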