Commit Graph

2096 Commits

Author SHA1 Message Date
karlchenofhell
88245e44d8 - improved version of robots.txt (delete your old htroot/robots.txt before updating):
- robots.txt is a servlet now
  - no need to rewrite the whole file each time a section is added or removed
  - user-defined disallows, added manually, won't be overwritten anymore
- new config-setting: httpd.robots.txt, holding names of the disallowed sections

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3423 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-02 01:19:38 +00:00
karlchenofhell
9623bf7bbe - removed call of java 1.5 method
- added config servlet for local robots.txt
- removed YPStats_p as it is of no use anymore
- supertemplates use XHTML now
- quick-fix for http://www.yacy-forum.de/viewtopic.php?p=32296#32296

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3422 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-01 13:54:14 +00:00
orbiter
51e12049fa third generation of R/W head path optimization
- data from collection arrays are read in order
- merged data is written in order

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3419 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-28 11:13:23 +00:00
karlchenofhell
a1d68fe092 - use .class rather than Class.forName for classes in class-path
- added Bost's patch for Diff.findDiagonale() from: http://www.yacy-forum.de//files/patch_685.txt
- fixed minor bugs in Blog

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3416 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-27 22:52:22 +00:00
orbiter
10a3c20b8d some more enhancements to R/W Head path optimization
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3415 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-27 15:54:02 +00:00
orbiter
f4cfd19835 second Generation of collection R/W head path optimization:
- permanent cache flush is switched off. The optimized cache flush
  works better if it is a large number of collections that is flushed
  together
- the flush size can be configured instead the flush divisor. There is
  only one size for all flushes
- collection records that shall be removed during collection transition
  (jump from one collection file to another) are now not really removed
  but only marked in RAM. add-operations to the collection use these
  marked collection spaces
- index bulk write operations are now separated for each file of a kelondroFlex


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3414 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-27 13:01:22 +00:00
orbiter
1fda50fd3c correct R/W head positioning in kelondroFlex
and some enhancements

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3409 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-26 22:25:39 +00:00
orbiter
304412a049 first generation of collection index R/W head path optimization
- collections are now hand-over as collection lists to collection index for merge opertations
- collection index lists are separated into 'new' and 'extend' lists
- lists are written separately
- write operations are done into array sets and array indexes. These are now serialized
- write operations into index files are sorted by index;
  that means that a R/W head does not need to go forward
  and backward, only forward
More enhancements are possible

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3407 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-26 15:49:23 +00:00
hydrox
54fef3574f *) missing files for last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3406 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-26 14:38:34 +00:00
hydrox
cb89c74d52 *) added blog-comments
*) removed debug-output when deleting news

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3405 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-26 14:36:01 +00:00
karlchenofhell
6fbe31425a - some code-cleanup (no more syntax-warnings here)
- added deletion from loadedURLs of URLs to be blacklisted in IndexControl_p

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3404 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-26 12:56:50 +00:00
orbiter
32867580ee update to kelondroRecords needed fo last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3403 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-26 11:55:36 +00:00
orbiter
e3480d4ad3 fix for warning in crawl balancer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3402 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-26 11:54:43 +00:00
orbiter
8668ac5d91 preparations for collection index cache flush optimization
(hand-over commit, no functional change to current code)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3399 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-25 21:06:26 +00:00
karlchenofhell
39a2000d8b - added support for [[Bookmark:$bookmarkTag|description]]-link-listings (requested by theli) to wiki-parser
- added support for <pre>-tags to wiki-parser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3393 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-24 21:26:48 +00:00
karlchenofhell
619653c054 - fix for last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3392 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-24 15:40:56 +00:00
karlchenofhell
26f5757b40 - added support for multiple paths per domain to default-blacklist
warning: an interface-change had been neccessary:
- remove(String, String) has been renamed to removeAll(String, String), because it removes all path-entries for the specified host
- remove(String, String, String) has been added to delete only a path-entry
- geBlacklistType(String) has been renamed to getBlacklistType(String)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3391 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-24 13:56:32 +00:00
karlchenofhell
a5a36d9252 - hopefully last fix fo 1.5 methods (sorry for that, eclipse isn't that helpful in identifying those methods)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3387 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-22 08:04:09 +00:00
karlchenofhell
e97b6f0458 - we still use Java 1.4 ...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3386 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-21 22:43:31 +00:00
karlchenofhell
0c7b8cf632 - added first version of new wiki-parser
- added blacklist support to manual URLFetcher stack fill
- fix for NPE: http://www.yacy-forum.de/viewtopic.php?t=3559

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3385 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-21 22:31:36 +00:00
orbiter
f7803a6ce4 enhanced crawl balancer
- new domains now get a chance to get crawled early
- less IO operations
- new balancing method
- better dump order at shutdown time
- bugfixes regarding not found url hashes (no more superfluous cache kill)
- domain access time is now shared over all balancer stacks
- viewing the stack does no more disturbish the balancing algorithm that much
- intelligent selection of best next domain using domain access times
- extra double-check (to double-check the double-check)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3384 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-21 16:23:31 +00:00
low012
801eea8849 *) Fixed bug where pairReplace() got caught in infinite recursion. (http://www.yacy-forum.de/viewtopic.php?t=3466)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3383 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-20 22:07:59 +00:00
orbiter
c3e8c23f5d fix for 'CANNOT FETCH ENTRY: hash is null' bug
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3380 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-20 13:53:21 +00:00
orbiter
badab8d924 fixed some more bugs in new db handling
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3379 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-20 12:29:12 +00:00
orbiter
e72d253577 fixed problem with initial cache load
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3378 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-20 11:20:48 +00:00
orbiter
2d8e472cfd emergeny bugfix for last commit
(kelondroTree should work again)
the cache prefill is broken and will be fixed later

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3377 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-20 10:25:17 +00:00
orbiter
dc0c06e43d PLEASE MAKE A BACK-UP OF YOUR COMPLETE DATA DIRECTORY BEFORE USING THIS
redesign for better IO performance
enhanced database seek-time by avoiding write operations at distant
positions of a database file. until now, a USEDC counter was written
at the head-section of a kelondroRecords database file (which is the
basic data structure of all kelondro database files) to store the
actual number of records that are contained in the database. Now, this
value is computed from the database file size. This is either done
only once at start-time, or continuously when run in asserts enabled.
The counter is then updated only in RAM, and written at close of the
file. If the close fails, the correct number can be computed from the
file size, and if this is not equal to the stored number it is a strong
evidence that YaCY was not shut down properly.
To preserve consistency, the complete storage-routine had to be re-written.
Another change enhances read of nodes in some cases, where the data-tail
can be read together with the data-head. This saves another IO lookup during
each DB node fetch.
Includes also many small bugfixes.
IF ANYTHING GOES WRONG, ALL YOUR DATA IS LOST: PLEASE MAKE A BACK-UP

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3375 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-20 08:35:51 +00:00
karlchenofhell
c016fcb10f - added streaming-support to CrawlURLFetchStack_p servlet
- bug for NPE in list.java
- use more constants

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3373 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-19 12:47:46 +00:00
karlchenofhell
d114a0136e - crawl profile: don't add null-values
- added some settings and statistics for url-fetcher 'server'-mode
- added own stack for fetchable URLs
- added possibility to fill stack via shift from peer's queues, via POST (addurls=$count and url$num=$url) or via file-upload
- added "htroot" to classpath of linux start-script

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3370 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-17 19:16:53 +00:00
karlchenofhell
b2a9d32f29 why do I always forget some lines? sorry...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3368 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-14 15:11:03 +00:00
theli
e1edb23689 *) Bugfix for IllegalMonitorStateException
See: http://www.yacy-forum.de/viewtopic.php?t=3522

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3358 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-09 19:32:49 +00:00
orbiter
bf69a721cb more protection against mis-use of YaCyHop interface:
- target must not be at port 80
- target access not more than every 3 seconds
- requester may not access more than every 10 seconds

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3357 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-09 15:25:10 +00:00
orbiter
a15963ff98 better balancing: if element from top would force a busy waiting,
an element from the bottom of the stack is used instead.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3356 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-09 10:32:58 +00:00
orbiter
dda24fcb85 ups
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3355 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-09 09:55:21 +00:00
orbiter
8c1d2e0227 protection against crawl balancer failure:
a minimum of 500 milliseconds distance between two acesses
to the same domain is now ensured

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3354 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-09 09:48:23 +00:00
orbiter
1f1f398bfa enhanced speed of RAM cache flush by factor 20 (twenty times faster)
- the speed was doubled by avoiding read access during the dump
- the speed was dramatically increased at least by factor 10
   by using a temporary ram-file where the structures are flushed to
   before it is dumped then as a whole byte-chunk to the file system.
The speed enhancements also affects some other parts of the database.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3353 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-08 23:21:46 +00:00
orbiter
30d79d69a6 fix for wrong display of search statistics
see http://www.yacy-forum.de/viewtopic.php?p=31242#31242

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3352 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-08 10:42:35 +00:00
orbiter
daf2e15f59 some storage process enhancements (write without preceding read)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3348 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-07 23:23:24 +00:00
orbiter
9c2101a852 small enhancement to cache dump
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3346 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-07 00:02:54 +00:00
orbiter
c464157a6e replaced some toString()
see http://www.yacy-forum.de/viewtopic.php?p=31151#31151

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3345 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-06 16:26:56 +00:00
orbiter
7673f0869b minor enhancements
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3344 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-06 16:01:03 +00:00
orbiter
b4aa195c27 added user-agent check for yacy-hop proxy authentication
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3343 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-06 09:53:02 +00:00
orbiter
2d7f7da7ce fix for null pointer exception
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3342 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-06 09:50:24 +00:00
orbiter
d25caa07bf redesigned some parts of http authentication
added another access check for peer hops

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3340 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-05 19:46:50 +00:00
hydrox
9184113284 *) fixed News deletion. News are now removed if they are no longer in a news-stack. This does not effect News-entires in the news-db that have no stack-entries.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3336 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-05 13:35:36 +00:00
(no author)
e218940293 The copyright sign "\u00A9" is already replaced by "&copy;". String "(C)" is not a unicode sequence!
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3334 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-04 18:16:27 +00:00
low012
1bc4d8d470 *) If there is more than one pair of patterns in a line, all of them (and not only one pair) will be replaced.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3333 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-04 15:53:40 +00:00
low012
ea7a8cf7aa *) <hr> and <br> tags are XHTML compliant now.
*) Avoid superflous trailing blank in non-proportional sections.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3332 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-04 15:03:13 +00:00
orbiter
d03cd41266 fix for http://www.yacy-forum.de/viewtopic.php?t=3411
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3331 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-04 04:39:47 +00:00
karlchenofhell
f2e6f19b90 - added versioning to Wiki
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3327 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-03 15:20:12 +00:00