
1250 Commits

Author SHA1 Message Date
orbiter
a738b57b31 added author tag to indexing content
enhanced composition of title tag
TODO: insert author information for external parsers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3488 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-17 01:18:34 +00:00
orbiter
6be57983a8 another update to the crawl balancer
can now alternate between top and bottom of the crawl stack

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3487 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-16 16:54:54 +00:00
orbiter
4783a30910 - fixed a flush problem in balancer
- return to idle divisor in RWI RAM cache flush

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3485 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-16 15:16:26 +00:00
orbiter
861f41e67e redesigned NURL-handling:
- the general NURL-index for all crawl stack types was split into separate indexes for these stacks
- the new NURL-index is managed by the crawl balancer
- the crawl balancer does not need an internal index any more, it is replaced by the NURL-index
- the NURL.Entry was generalized and is now a new class plasmaCrawlEntry
- the new class plasmaCrawlEntry also replaces the preNURL.Entry class, and will also replace the switchboardEntry class in the future
- the new class plasmaCrawlEntry is more accurate for date entries (holds milliseconds) and can contain larger 'name' entries (anchor tag names)
- the EURL object was replaced by a new ZURL object, which is a container for the plasmaCrawlEntry and some tracking information
- the EURL index is now filled with ZURL objects
- a new index delegatedURL holds ZURL objects about plasmaCrawlEntry objects to track which URL is handed over to other peers
- redesigned handling of plasmaCrawlEntry handover, because there is no longer any need to convert one entry object into another
- found and fixed numerous bugs in the context of crawl state handling
- fixed a serious bug in kelondroCache which prevented entries from being removed
- fixed some bugs in online interface and adapted monitor output to new entry objects
- adapted yacy protocol to handle new delegatedURL entries
All old crawl queues will disappear after this update!

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3483 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-16 13:25:56 +00:00
orbiter
581db87237 more debug code for
http://www.yacy-forum.de/viewtopic.php?p=33009#33009

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3479 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-14 15:04:06 +00:00
orbiter
81c4cc6bf7 better debugging of balancer failure
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3478 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-14 12:02:56 +00:00
orbiter
96b79bf86d redesigned remove method in kelondroRowSet
This should also fix numerous bugs such as
http://www.yacy-forum.de/viewtopic.php?p=31077#31077
(java.lang.ArrayIndexOutOfBoundsException in kelondroRowCollection.removeShift)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3476 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-14 08:55:05 +00:00
orbiter
9f929b5438 better snippet handling in case of snippet load fail
see also http://www.yacy-forum.de/viewtopic.php?p=31096#31096

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3475 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-13 22:18:36 +00:00
auron_x
d451ad48d3 *) improved peerloadgraphic:
- unnecessary (0 %) pieces are removed
- percent values of each thread are displayed in the legend

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3474 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-12 19:08:17 +00:00
orbiter
a5d668c0c6 added speed-buttons for easy performance setting
appears in crawl start and on indexing monitor page

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3473 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-12 16:24:28 +00:00
orbiter
5b0a84ce09 fix for synchronization deadlock with flushMissNameCache.
see also: http://www.yacy-forum.de/viewtopic.php?p=32939#32939

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3472 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-12 09:06:57 +00:00
karlchenofhell
e2ac5f62bd - Make code prettier [from NN's TODO]
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3471 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-11 19:53:14 +00:00
allo
f04097c3dd integrated tor-patch for crawling, if yacyDebugMode is set.
(replaces: http://yacy.deruwe.de/overlay/net-misc/yacy-tor/files/disable_dns_checks-svn3132.patch)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3470 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-11 18:43:11 +00:00
auron_x
22fe14f292 *) first version of Peerload-graphic
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3469 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-11 17:04:11 +00:00
orbiter
432d7d4e9c better catch
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3468 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-10 23:38:08 +00:00
orbiter
8f7e8b6ee2 auto-delete for not-fixable db error in crawl stacker.
see also http://www.yacy-forum.de/viewtopic.php?p=32906#32906

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3467 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-10 23:31:36 +00:00
orbiter
7a52b07fcc better memory protection during freemen cycle
see also http://www.yacy-forum.de/viewtopic.php?p=32903#32903

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3466 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-10 23:22:37 +00:00
orbiter
6faa262259 fix for NURL-fix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3465 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-09 14:30:53 +00:00
orbiter
243a2f831b fixed problem with not-found NURL hashes
The cause of this problem has still not been found, but the effect
is now handled much better. The NURL pop continues automatically until
it finds a hash that exists.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3458 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-09 11:07:20 +00:00
orbiter
6ad39bae1e fixed shutdown problem
this fixes the 'inconsistency' messages during start-up

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3457 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-09 08:48:47 +00:00
orbiter
38b93f8cb8 bugfix for my last commit:
iterator did not consider secondary start point in case of rotation

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3456 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-08 22:07:17 +00:00
orbiter
d755a8026d - better OOM protection
- better memory allocation for FlexTable indexes
- splitting between static index and dynamic index (only the dynamic part must grow)
- to enable a merge-iteration of the new split index, a huge number of classes needed to be adapted to new iterator classes
- added new iterator classes that support cloneable iterators
- adapted all iterator classes to implement cloneable iterators

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3453 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-08 16:15:40 +00:00
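A minimal Java sketch of the merge-iteration idea from the commit above, assuming both index parts (static and dynamic) deliver keys in ascending order. The class name MergeIterator is hypothetical, emitting equal keys only once is an assumption, and the cloneable-iterator aspect is left out.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/**
 * Hypothetical sketch: walks two sorted iterators as one sorted sequence.
 * Keys present in both sources are emitted once (an assumption).
 * Null elements are not supported.
 */
final class MergeIterator<T extends Comparable<T>> implements Iterator<T> {
    private final Iterator<T> a, b;
    private T nextA, nextB; // look-ahead element of each source, null when exhausted

    MergeIterator(Iterator<T> a, Iterator<T> b) {
        this.a = a;
        this.b = b;
        this.nextA = a.hasNext() ? a.next() : null;
        this.nextB = b.hasNext() ? b.next() : null;
    }

    @Override
    public boolean hasNext() {
        return nextA != null || nextB != null;
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        T result;
        if (nextB == null || (nextA != null && nextA.compareTo(nextB) < 0)) {
            result = nextA;                       // smaller key comes from source a
            nextA = a.hasNext() ? a.next() : null;
        } else if (nextA == null || nextA.compareTo(nextB) > 0) {
            result = nextB;                       // smaller key comes from source b
            nextB = b.hasNext() ? b.next() : null;
        } else {
            result = nextA;                       // equal keys: emit once, advance both
            nextA = a.hasNext() ? a.next() : null;
            nextB = b.hasNext() ? b.next() : null;
        }
        return result;
    }
}
```

Merging [1, 3, 5] with [2, 3, 6] yields 1, 2, 3, 5, 6.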
orbiter
33f97cff7a changed startup initialization sequence slightly
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3446 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-07 23:24:16 +00:00
orbiter
4e8eb1dbe3 some minor changes here and there
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3441 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-07 14:22:10 +00:00
karlchenofhell
03c5906ae7 - minor bugfixes for url-fetcher & http://www.yacy-forum.de/viewtopic.php?t=3646
- PerformanceMemory_p.html is valid XHTML again

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3440 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-07 11:50:03 +00:00
orbiter
313f6a7680 fix for http://www.yacy-forum.de/viewtopic.php?p=31553#31553
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3438 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-07 09:26:01 +00:00
orbiter
958ebea5c5 fix for http://www.yacy-forum.de/viewtopic.php?p=32470#32470
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3437 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-07 09:08:13 +00:00
orbiter
1cba31de43 redesigned ram organization for database caches
- each cache can now allocate as much memory as is available
- no more fixed limits
- replaced old performance memory monitor by new one
- added supervision methods as static functions into the classes that provide cache functionality
- steering of RAM allocation is done with two simple limits that are relative to RAM availability


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3434 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-06 22:43:32 +00:00
orbiter
db235f2d61 added some memory protection in collection index multiple merge
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3429 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-04 22:54:04 +00:00
orbiter
b466baa574 added some memory protection
Too large collection arrays are now avoided. By default, the biggest
collection index is 7. Larger collections are dumped into a commons
directory, but cannot yet be used. Before doing a dump, the collection
is split into a part which has only root-references, which is stored back
to the collection; the remaining part goes to commons

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3426 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-03 00:55:51 +00:00
low012
ce360ef43e *) no more HTML in plasmaCrawlProfile.java
*) <br> will not be displayed in items in Auto Filter Content on WatchCrawler_p.html anymore
*) removed unnecessary replaceHTML()


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3425 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-02 21:09:28 +00:00
karlchenofhell
88245e44d8 - improved version of robots.txt (delete your old htroot/robots.txt before updating):
- robots.txt is a servlet now
  - no need to rewrite the whole file each time a section is added or removed
  - user-defined disallows, added manually, won't be overwritten anymore
- new config-setting: httpd.robots.txt, holding names of the disallowed sections

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3423 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-02 01:19:38 +00:00
orbiter
51e12049fa third generation of R/W head path optimization
- data from collection arrays are read in order
- merged data is written in order

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3419 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-28 11:13:23 +00:00
orbiter
10a3c20b8d some more enhancements to R/W Head path optimization
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3415 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-27 15:54:02 +00:00
orbiter
f4cfd19835 second Generation of collection R/W head path optimization:
- permanent cache flush is switched off. The optimized cache flush
  works better if it is a large number of collections that is flushed
  together
- the flush size can be configured instead of the flush divisor. There is
  only one size for all flushes
- collection records that shall be removed during collection transition
  (jump from one collection file to another) are no longer actually removed
  but only marked in RAM. Add operations to the collection reuse these
  marked collection spaces
- index bulk write operations are now separated for each file of a kelondroFlex


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3414 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-27 13:01:22 +00:00
orbiter
1fda50fd3c correct R/W head positioning in kelondroFlex
and some enhancements

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3409 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-26 22:25:39 +00:00
orbiter
304412a049 first generation of collection index R/W head path optimization
- collections are now handed over as collection lists to the collection index for merge operations
- collection index lists are separated into 'new' and 'extend' lists
- lists are written separately
- write operations are done into array sets and array indexes. These are now serialized
- write operations into index files are sorted by index;
  that means that a R/W head does not need to go forward
  and backward, only forward
More enhancements are possible

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3407 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-26 15:49:23 +00:00
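The forward-only write ordering described in the commit above can be sketched in Java as follows, assuming fixed-size records addressed by a record index. The class ForwardOnlyWriter and its methods are hypothetical, not kelondro code: pending writes are buffered, sorted by index, and then applied in one ascending pass so the file offset only grows.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical sketch: buffer record writes and apply them sorted by
 * position, so that a R/W head only has to move forward.
 */
final class ForwardOnlyWriter {
    static final class PendingWrite {
        final long index;   // record number inside the file
        final byte[] data;  // exactly recordSize bytes
        PendingWrite(long index, byte[] data) { this.index = index; this.data = data; }
    }

    private final List<PendingWrite> pending = new ArrayList<>();
    private final int recordSize;

    ForwardOnlyWriter(int recordSize) { this.recordSize = recordSize; }

    void add(long index, byte[] data) {
        if (data.length != recordSize) throw new IllegalArgumentException("wrong record size");
        pending.add(new PendingWrite(index, data));
    }

    /** Returns the buffered writes in ascending index order and clears the buffer. */
    List<PendingWrite> drainSorted() {
        List<PendingWrite> sorted = new ArrayList<>(pending);
        sorted.sort(Comparator.comparingLong(p -> p.index));
        pending.clear();
        return sorted;
    }

    /** Writes all buffered records in one forward pass (error recovery omitted). */
    void flush(RandomAccessFile file) throws IOException {
        for (PendingWrite p : drainSorted()) {
            file.seek(p.index * recordSize); // offsets grow monotonically
            file.write(p.data);
        }
    }
}
```

Buffering more writes before a flush gives the sort more to work with, which matches the later note that the optimized flush works better when many collections are flushed together.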
hydrox
cb89c74d52 *) added blog-comments
*) removed debug-output when deleting news

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3405 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-26 14:36:01 +00:00
karlchenofhell
6fbe31425a - some code-cleanup (no more syntax-warnings here)
- IndexControl_p now also deletes URLs that are to be blacklisted from loadedURLs

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3404 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-26 12:56:50 +00:00
orbiter
e3480d4ad3 fix for warning in crawl balancer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3402 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-26 11:54:43 +00:00
karlchenofhell
619653c054 - fix for last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3392 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-24 15:40:56 +00:00
karlchenofhell
26f5757b40 - added support for multiple paths per domain to default-blacklist
warning: an interface change was necessary:
- remove(String, String) has been renamed to removeAll(String, String), because it removes all path-entries for the specified host
- remove(String, String, String) has been added to delete only a path-entry
- geBlacklistType(String) has been renamed to getBlacklistType(String)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3391 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-24 13:56:32 +00:00
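A small Java sketch of what "multiple paths per domain" could look like, mirroring the removeAll/remove split from the commit above. Only the method names and parameter counts come from the commit message; the class MultiPathBlacklist, the nested map layout, and the assumption that the first argument is the blacklist type are illustrative.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch: blacklist type -> host -> set of path patterns.
 */
final class MultiPathBlacklist {
    private final Map<String, Map<String, Set<String>>> lists = new HashMap<>();

    void add(String type, String host, String path) {
        lists.computeIfAbsent(type, t -> new HashMap<>())
             .computeIfAbsent(host, h -> new HashSet<>())
             .add(path);
    }

    /** Removes every path entry of the host (the renamed two-argument remove). */
    void removeAll(String type, String host) {
        Map<String, Set<String>> hosts = lists.get(type);
        if (hosts != null) hosts.remove(host);
    }

    /** Removes a single path entry; drops the host once no paths remain. */
    void remove(String type, String host, String path) {
        Map<String, Set<String>> hosts = lists.get(type);
        if (hosts == null) return;
        Set<String> paths = hosts.get(host);
        if (paths == null) return;
        paths.remove(path);
        if (paths.isEmpty()) hosts.remove(host);
    }

    /** Read-only view of the path patterns stored for a host. */
    Set<String> paths(String type, String host) {
        Map<String, Set<String>> hosts = lists.get(type);
        Set<String> paths = (hosts == null) ? null : hosts.get(host);
        return (paths == null) ? Collections.<String>emptySet() : Collections.unmodifiableSet(paths);
    }
}
```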
orbiter
f7803a6ce4 enhanced crawl balancer
- new domains now get a chance to get crawled early
- less IO operations
- new balancing method
- better dump order at shutdown time
- bugfixes regarding not found url hashes (no more superfluous cache kill)
- domain access time is now shared over all balancer stacks
- viewing the stack no longer disturbs the balancing algorithm as much
- intelligent selection of best next domain using domain access times
- extra double-check (to double-check the double-check)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3384 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-21 16:23:31 +00:00
orbiter
c3e8c23f5d fix for 'CANNOT FETCH ENTRY: hash is null' bug
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3380 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-20 13:53:21 +00:00
orbiter
dc0c06e43d PLEASE MAKE A BACK-UP OF YOUR COMPLETE DATA DIRECTORY BEFORE USING THIS
redesign for better IO performance
enhanced database seek time by avoiding write operations at distant
positions of a database file. Until now, a USEDC counter was written
at the head section of a kelondroRecords database file (which is the
basic data structure of all kelondro database files) to store the
current number of records contained in the database. Now, this
value is computed from the database file size. This is done either
only once at start time, or continuously when running with assertions enabled.
The counter is then updated only in RAM, and written when the
file is closed. If the close fails, the correct number can be computed from the
file size, and if it does not equal the stored number, that is strong
evidence that YaCy was not shut down properly.
To preserve consistency, the complete storage-routine had to be re-written.
Another change enhances read of nodes in some cases, where the data-tail
can be read together with the data-head. This saves another IO lookup during
each DB node fetch.
Includes also many small bugfixes.
IF ANYTHING GOES WRONG, ALL YOUR DATA IS LOST: PLEASE MAKE A BACK-UP

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3375 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-20 08:35:51 +00:00
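The counter derivation described in the commit above can be sketched in Java, assuming a file layout of a fixed-size header followed by fixed-size records. The class RecordCountCheck and the header/record sizes are hypothetical; kelondroRecords' real layout is not reproduced here.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/**
 * Hypothetical sketch: derive the record count from the file length
 * instead of persisting a counter in the file header.
 */
final class RecordCountCheck {
    /** Number of complete records after the header; a trailing partial record is ignored. */
    static long recordsFromFileSize(long fileSize, long headerSize, long recordSize) {
        if (fileSize < headerSize) throw new IllegalArgumentException("file shorter than header");
        return (fileSize - headerSize) / recordSize;
    }

    /** A stored counter that disagrees with the computed one hints at an unclean shutdown. */
    static boolean looksCleanlyClosed(long storedCount, long fileSize, long headerSize, long recordSize) {
        return storedCount == recordsFromFileSize(fileSize, headerSize, recordSize);
    }

    /** Reads the current file length and derives the record count from it. */
    static long recordsInFile(File f, long headerSize, long recordSize) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            return recordsFromFileSize(raf.length(), headerSize, recordSize);
        }
    }
}
```

With a 1024-byte header and 64-byte records, a file of 1664 bytes holds 10 records; a stored counter of 9 would then point to an improper shutdown.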
karlchenofhell
d114a0136e - crawl profile: don't add null-values
- added some settings and statistics for url-fetcher 'server'-mode
- added own stack for fetchable URLs
- added possibility to fill stack via shift from peer's queues, via POST (addurls=$count and url$num=$url) or via file-upload
- added "htroot" to classpath of linux start-script

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3370 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-17 19:16:53 +00:00
theli
e1edb23689 *) Bugfix for IllegalMonitorStateException
See: http://www.yacy-forum.de/viewtopic.php?t=3522

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3358 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-09 19:32:49 +00:00
orbiter
a15963ff98 better balancing: if element from top would force a busy waiting,
an element from the bottom of the stack is used instead.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3356 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-09 10:32:58 +00:00
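A minimal Java sketch of the top/bottom choice from the commit above, assuming stack entries are reduced to host names and a fixed per-host delay decides whether the top entry would cause busy waiting. The class TopBottomBalancer is hypothetical and is not YaCy's plasmaCrawlBalancer.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: pop from the top unless the top host was accessed
 * too recently and the bottom host was not; then take the bottom instead.
 */
final class TopBottomBalancer {
    private final Deque<String> stack = new ArrayDeque<>();
    private final Map<String, Long> lastAccess = new HashMap<>();
    private final long minDeltaMillis;

    TopBottomBalancer(long minDeltaMillis) { this.minDeltaMillis = minDeltaMillis; }

    void push(String host) { stack.addFirst(host); } // new entries go on top

    private boolean mustWait(String host, long now) {
        Long last = lastAccess.get(host);
        return last != null && now - last < minDeltaMillis;
    }

    /** Returns the next host to crawl, or null if the stack is empty. */
    String pop(long now) {
        if (stack.isEmpty()) return null;
        String host;
        if (mustWait(stack.peekFirst(), now) && !mustWait(stack.peekLast(), now)) {
            host = stack.pollLast();  // bottom entry avoids busy waiting
        } else {
            host = stack.pollFirst(); // default: top entry (caller may still have to wait)
        }
        lastAccess.put(host, now);
        return host;
    }
}
```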
orbiter
dda24fcb85 oops
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3355 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-09 09:55:21 +00:00
orbiter
8c1d2e0227 protection against crawl balancer failure:
a minimum of 500 milliseconds distance between two accesses
to the same domain is now ensured

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3354 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-09 09:48:23 +00:00
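The 500 ms protection from the commit above amounts to a per-domain access guard. A hedged Java sketch follows; the class DomainAccessGuard is hypothetical, and timestamps are passed in explicitly to keep it testable.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: a host may only be accessed again once at least
 * minDeltaMillis have passed since its previous access.
 */
final class DomainAccessGuard {
    private final long minDeltaMillis;
    private final Map<String, Long> lastAccess = new HashMap<>();

    DomainAccessGuard(long minDeltaMillis) { this.minDeltaMillis = minDeltaMillis; }

    /** Milliseconds the caller still has to wait before accessing host at time now. */
    synchronized long waitTime(String host, long now) {
        Long last = lastAccess.get(host);
        if (last == null) return 0;
        return Math.max(0, last + minDeltaMillis - now);
    }

    /** Records the access and returns true if allowed; returns false if it comes too early. */
    synchronized boolean tryAccess(String host, long now) {
        if (waitTime(host, now) > 0) return false;
        lastAccess.put(host, now);
        return true;
    }
}
```

With a guard of 500 ms, a second access to the same host 200 ms after the first is refused with 300 ms left to wait, while other hosts are not affected.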
orbiter
30d79d69a6 fix for wrong display of search statistics
see http://www.yacy-forum.de/viewtopic.php?p=31242#31242

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3352 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-08 10:42:35 +00:00
orbiter
daf2e15f59 some storage process enhancements (write without preceding read)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3348 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-07 23:23:24 +00:00
orbiter
d25caa07bf redesigned some parts of http authentication
added another access check for peer hops

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3340 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-05 19:46:50 +00:00
orbiter
b2f4087400 redesign of last-seen field inside seed:
the field now contains a time in UTC+0 (instead of relative to the local UTC offset)
this fixes a bug in peer selection, where an iteration over all seeds
ordered by lastseen did not work correctly.
Problems may occur because the new meaning of this field may mix with
the different meaning of that field in older peers

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3322 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-02 23:54:27 +00:00
orbiter
819ff21c92 fixed QPM output
QPM is temporarily called QPH (until more search requests are present?)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3313 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-01 00:17:35 +00:00
auron_x
89e7af037a *) used more switchboard-vars instead of config-vars
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3310 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-31 17:05:15 +00:00
orbiter
306c50ac40 QPM (queries per minute) statistic stub
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3308 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-31 15:39:11 +00:00
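A possible shape for the QPM statistic stub above is a sliding one-minute window. This Java sketch is hypothetical (class QueryRateCounter is not YaCy's statistics code) and assumes timestamps arrive in non-decreasing order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch: counts queries seen within the last 60 seconds.
 */
final class QueryRateCounter {
    private static final long WINDOW_MILLIS = 60_000L;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    synchronized void recordQuery(long now) {
        timestamps.addLast(now);
        evict(now);
    }

    synchronized int queriesPerMinute(long now) {
        evict(now);
        return timestamps.size();
    }

    // drops queries that are at least one window old
    private void evict(long now) {
        while (!timestamps.isEmpty() && now - timestamps.peekFirst() >= WINDOW_MILLIS) {
            timestamps.removeFirst();
        }
    }
}
```

Changing the window to 3,600,000 ms would give a per-hour count, which fits the temporary QPH display mentioned in a later commit.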
karlchenofhell
9f74b128dd - added many more commented constants (please use constants rather than, e.g., config-setting strings directly)
- not all constants may be located correctly in the switchboard. Please relocate if you know the appropriate place for them

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3303 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-30 14:18:35 +00:00
orbiter
9c05e2a820 redesign of kelondroMap
- this class is replaced by an object that can hold any type of object
- this object must be defined as a class that implements kelondroObjectsEntry
- the kelondroMap is now implemented as kelondroMapObjects

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3297 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-29 23:51:10 +00:00
orbiter
f25c0e98d1 - replaced String by StringBuffer in condenser
- added CamelCase parser in condenser
- added option to switch on or off indexing for proxy

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3292 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-29 01:11:22 +00:00
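The CamelCase parser added to the condenser could work along the lines of this Java sketch, which splits a word at every lowercase-to-uppercase transition. The class CamelCaseSplitter is hypothetical; runs of capitals such as "HTML" stay together in this version, and StringBuilder is used as the unsynchronized counterpart of the StringBuffer mentioned in the commit.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: "CamelCaseParser" -> [Camel, Case, Parser].
 */
final class CamelCaseSplitter {
    static List<String> split(String word) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            // an uppercase letter directly after a lowercase one starts a new part
            if (Character.isUpperCase(c) && current.length() > 0
                    && Character.isLowerCase(current.charAt(current.length() - 1))) {
                parts.add(current.toString());
                current.setLength(0);
            }
            current.append(c);
        }
        if (current.length() > 0) parts.add(current.toString());
        return parts;
    }
}
```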
karlchenofhell
d311e258f8 - adjusted LogStatistics to nano-seconds
- removed patches of SVNs 3184/3185 preventing fast DHT

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3252 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-19 10:39:22 +00:00
orbiter
f3f99b19c6 extended search statistics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3249 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-19 01:45:29 +00:00
orbiter
c0851ee943 refactoring: moved and renamed de.anomic.data.searchResults to plasma package
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3248 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-19 00:38:03 +00:00
allo
c39dda2374 finished refactoring of searchtemplates.
now plasmaSwitchboard.searchFromLocal calculates a searchResults structure,
which is parsed in the yacysearch/detailedSearch Servlets.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3244 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-18 10:42:36 +00:00
allo
35039982da refactoring of search process: store results in a searchResults structure. At the moment, it's just stored in it, and read from it again.
Next step: return searchResults instead of serverObjects, and parse the results in the servlets.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3241 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-18 07:41:15 +00:00
orbiter
76fab83395 fixed bugs in search statistics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3240 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-18 00:26:16 +00:00
orbiter
d07b132a0d - fixed colors of network graphic
- added option to activate write cache for seed-db
- did not activate write cache because it did not work

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3236 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-17 19:39:31 +00:00
allo
29aa7031d3 workaround for the snippets
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3225 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-16 21:35:25 +00:00
karlchenofhell
aea199cb7b - IndexTransfer is working again
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3220 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-16 17:18:25 +00:00
orbiter
5515571950 redesign of ymage classes
- less memory usage
- better usage of awt classes
- drawing abstractions: preparations for movable objects for animation class
- test applet for animations
- known bugs: wrong colours for network picture

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3214 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-15 23:31:50 +00:00
orbiter
52c6461e6b some bugfix for statistics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3211 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-15 16:03:00 +00:00
(no author)
fe72b772cf added a monitor page for search requests
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3206 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-15 01:50:57 +00:00
karlchenofhell
b873ad51ab - fix for http://www.yacy-forum.de/viewtopic.php?t=3369
- merged netBude's alternative for tables in yacysearch.html; search results are now valid
- added statistic info to index.html as proposed here: http://www.yacy-forum.de/viewtopic.php?p=29762#29762
- fixed error-log in httpTemplate

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3189 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-12 00:52:38 +00:00
borg-0300
1aa74bbd2b update for last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3185 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-09 23:57:04 +00:00
borg-0300
23e613b2ab CPU & IO reduce (Index Distribution)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3184 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-09 17:23:29 +00:00
(no author)
c67d22116e added exists-check based only on RAM index lookup:
- faster double-check during crawling
- less IO

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3179 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-08 13:13:30 +00:00
(no author)
37e53b4a6a replaced tree database structure for seed db by flex data structure
I don't know if this helps, we will find out...

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3177 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-07 23:34:13 +00:00
karlchenofhell
35fb671721 - updated DetailedSearch and ViewFile
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3173 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-06 12:21:13 +00:00
theli
d157201e08 *) IfesL for "Unexpected end of ZLIB" error message
See: http://www.yacy-forum.de/viewtopic.php?t=3327

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3169 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-05 13:45:31 +00:00
hydrox
2c01508ada *) fix for http://www.yacy-forum.de/viewtopic.php?p=29575#29575
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3162 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-04 11:36:38 +00:00
borg-0300
d2be3c674d wrong cache values fixed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3159 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-03 13:41:41 +00:00
karlchenofhell
df6281ba1f - removed JS from DetailedSearch => valid
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3151 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-02 21:43:21 +00:00
hydrox
fb1d8b91af *) changed Startpoints of IndexCleaner and IndexTransfer from ------------ to AAAAAAAAAAAA.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3150 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-01-01 12:27:05 +00:00
orbiter
9b726ac366 release 0.50
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3132 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-23 04:26:05 +00:00
orbiter
036a0c828e fix for auto-configuration of crawler thread memory
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3131 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-23 03:04:58 +00:00
orbiter
a4e90bc1dc fix + debug-code for http://www.yacy-forum.de/viewtopic.php?p=29126#29126
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3128 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-23 01:39:00 +00:00
borg-0300
6b5f28b746 answer for last commit: no
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3126 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-22 19:26:01 +00:00
borg-0300
d98ba7bc33 fix for memory limit computation ?
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3125 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-22 19:17:23 +00:00
orbiter
c48374d14a new memory limit computation for indexing queue
should better prevent OutOfMemory errors

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3118 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-22 12:54:56 +00:00
orbiter
08ac4c5ed0 bugfix for http://www.yacy-forum.de/viewtopic.php?p=29045#29045
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3110 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-21 10:43:18 +00:00
orbiter
8e3bd17554 adapted DetailedSearch page to new ranking options
- fixed bug http://www.yacy-forum.de/viewtopic.php?t=3265
- more attributes on page
- attributes can be set as default for main search page
- option to re-set the attributes to built-in values

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3109 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-21 03:09:46 +00:00
orbiter
93a7e88245 more ranking parameter usage
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3108 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-20 23:51:55 +00:00
orbiter
2dbea612c9 fixed display bug for image search preview
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3107 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-20 23:31:57 +00:00
orbiter
0a050bc043 enhanced ranking
- redesign of data storage in plasmaSearchRankingProfile
- profiles are extended by new ranking parameters
- new RWI ranking parameters are considered during ranking
- appearance attributes (e.g. emphasized text) are now considered
- faster ranking
- some attributes that had been checked during post-ranking can now be
  checked during pre-ranking phase
- removed old ranking parameter on index.html page (will be replaced by profiles in the future)
- ranking can now consider appearances of media content
- snippet-loading for media types now works correctly (fetches only from the wanted media)
- ranking-profiles can be handed over to remote peers and also apply there
- re-search of same query with different domain now also re-triggers remote search

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3105 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-20 15:44:29 +00:00
orbiter
61798f0ae6 added option to distinguish between text crawl and media crawl
- for each crawl start, there is now a flag for text and media
- the localCrawl flag is superfluous
- added new crawl profiles
- if an image search is done, only media links are crawled for the snippets


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3100 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-19 03:10:46 +00:00
orbiter
febe6b114a design update of crawler monitor
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3094 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-18 01:18:28 +00:00
allo
782db9099d version independent name for commons-pool lib
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3082 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-15 07:12:33 +00:00
orbiter
7ff86d6ba6 - image search now shows thumbnails (in bad order, but it works)
- repaired DHT selection

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3081 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-14 02:48:37 +00:00
orbiter
ee3d91cb6b print-out of links that result from constraint filtering
in search results

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3078 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-13 01:39:34 +00:00
orbiter
e4570bffaf - implemented a specialized snippet-fetch for media content
- changed search result preparation for media search presentation

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3073 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-12 02:09:25 +00:00
low012
694a6e4f44 *) better text snippets: any possible search word (welt, linux, tag) in welt-linux-tag will now be marked correctly
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3072 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-11 15:19:35 +00:00
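The snippet improvements for compounds like welt-linux-tag (and, in an earlier commit, GNU/Linux) can be sketched by treating '-' and '/' as word separators when marking search words. The class SnippetMarker, the <b> markup, and the assumption that search words arrive in lowercase are illustrative, not YaCy's snippet code.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: marks each search word inside compounds split
 * at whitespace, '/' and '-'. Search words are expected in lowercase.
 */
final class SnippetMarker {
    private static final Pattern WORD = Pattern.compile("[^\\s/\\-]+");

    static String mark(String text, Set<String> searchWords) {
        StringBuilder out = new StringBuilder();
        Matcher m = WORD.matcher(text);
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start()); // copy separators unchanged
            String w = m.group();
            if (searchWords.contains(w.toLowerCase(Locale.ROOT))) {
                out.append("<b>").append(w).append("</b>");
            } else {
                out.append(w);
            }
            last = m.end();
        }
        out.append(text.substring(last));
        return out.toString();
    }
}
```

Note that trailing punctuation (as in "Linux.") stays part of the token in this simple version and would prevent a match.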
orbiter
bddc197453 restored a change from low012/SVN 3068 that had been removed by mistake
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3070 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-11 11:07:36 +00:00
orbiter
1377c53aa3 extraction of media links from search results
these links are mixed into the snippets for testing purposes
(a final version will handle this differently)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3069 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-11 01:31:23 +00:00
low012
586add4c6c *) Better snippets: words like GNU/Linux will not prevent Linux or GNU from being marked if they are search words (see http://www.yacy-forum.de/viewtopic.php?t=2891)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3068 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-10 23:13:53 +00:00
borg-0300
8b7c543885 NullPointer fix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3061 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-09 15:51:34 +00:00
orbiter
937ccd4e76 fix for snippet-generation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3060 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-09 02:13:43 +00:00
auron_x
c086c71f17 *) fixed ArrayIndexOutOfBoundsException
--> http://www.yacy-forum.de/viewtopic.php?t=3210

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3058 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-08 14:56:18 +00:00
orbiter
c93cfdc23a fix for http://www.yacy-forum.de/viewtopic.php?p=28564#28564
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3057 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-08 13:03:03 +00:00
orbiter
93a5ace330 fix for http://www.yacy-forum.de/viewtopic.php?p=28544#28544
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3056 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-08 12:57:17 +00:00
orbiter
bf0d820659 - added correct flagging of word properties
- added self-healing to the database in case wrong free-pointers exist
- added presentation of media links in snippets (does not yet work correctly)
- code cleanup

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3055 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-08 02:14:56 +00:00
orbiter
10d888e70c - added a media search for images, audio, video and applications
- new search options on search page
- new option in ViewInfo to display all links of a file
- enhanced collection data structure

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3054 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-07 02:40:57 +00:00
orbiter
a603c4d5e8 more code simplifications
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3052 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-06 13:13:55 +00:00
orbiter
9a85f5abc3 cleanup
- removed 'deleteComplete' flag; this was used especially for WORDS indexes
- shifted methods from plasmaSwitchboard to plasmaWordIndex

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3051 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-06 12:51:46 +00:00
borg-0300
fbe1ee402b plasmaCrawlLURL$kiter cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3050 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-06 12:49:13 +00:00
orbiter
773ba1e91a - generalized object order handling
- controlled object order for all database tables
- migrated DHT position computation to correct base64-decoded values
  this also closed the 'gaps' in the dht positions

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3049 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-06 03:02:57 +00:00
borg-0300
15381cbf73 other bugfix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3048 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-05 23:45:51 +00:00
borg-0300
ad65cc9d2f NullPointer fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3047 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-05 18:43:05 +00:00
borg-0300
d33745a7ea NullPointer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3046 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-05 17:48:01 +00:00
orbiter
3a4933b63c bugfix for
http://www.yacy-forum.de/viewtopic.php?p=28493#28493

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3045 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-05 12:32:19 +00:00
orbiter
109ed0a0bb - cleaned up code; removed methods to write the old data structures
- added an assortment importer. The old database structures can
  be imported with
  java -classpath classes yacy -migrateassortments
- modified wordmigration. The indexes from WORDS are now imported
  to the collection database. The call is
  java -classpath classes yacy -migratewords
  (as it was)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3044 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-05 02:47:51 +00:00
orbiter
052f28312a removed assortments from indexing data structures
removed options to switch on assortments

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3041 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-02 19:34:59 +00:00
orbiter
2372b4fe0c release 0.49
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3040 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-02 01:57:49 +00:00
orbiter
f8efb3c948 fixed a null pointer exception problem reported in the forum.
I can't find the forum entry any more because my girlfriend switched
off the power while the forum window was open.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3039 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-01 22:36:32 +00:00
orbiter
ad1e4aa88e added selection of audio, video, image and application resources
to the search procedure. This function currently cannot be used through the
search interface, only through remote search.

added accumulation of search attributes to enable the audio, video,
image and application selection.

fixed a problem with external URL representation generation


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3036 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-01 16:21:17 +00:00
orbiter
7cc4cec9c9 bugfix for assertion bugs documented in
http://www.yacy-forum.de/viewtopic.php?p=28261#28261

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3030 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-01 01:30:05 +00:00
orbiter
7dbcd358b4 fix for http://www.yacy-forum.de/viewtopic.php?p=28231#28231
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3021 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-29 23:09:56 +00:00
orbiter
86394e7a56 fix for cache-delete problem:
- better synchronization
- files are only deleted if they have been in the cache for 5 minutes
- hash-path for the HTCACHE is now default

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3018 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-29 00:34:25 +00:00
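Commit 86394e7a56 makes HTCACHE deletion conditional on a file having been in the cache for at least 5 minutes, with better synchronization. A minimal Java sketch of such an age check follows. It assumes the file's modification time approximates the moment it entered the cache; the class and method names (`CacheAgeCheck`, `mayDelete`, `deleteIfOldEnough`) are hypothetical and do not reproduce YaCy's cache code.

```java
import java.io.File;
import java.io.IOException;

public class CacheAgeCheck {
    // Minimum time (ms) a file must stay in the cache before deletion; 5 minutes as in the commit message.
    static final long MIN_AGE_MILLIS = 5L * 60L * 1000L;

    // True if a file last modified at lastModified is old enough to delete at time now.
    static boolean mayDelete(long lastModified, long now) {
        return now - lastModified >= MIN_AGE_MILLIS;
    }

    // Deletes f only if it is a regular file and old enough.
    // The method is synchronized so concurrent callers cannot interleave check and delete.
    static synchronized boolean deleteIfOldEnough(File f, long now) {
        if (!f.isFile() || !mayDelete(f.lastModified(), now)) return false;
        return f.delete();
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("htcache", ".tmp");
        long now = System.currentTimeMillis();
        System.out.println(deleteIfOldEnough(tmp, now));                          // false: just created
        System.out.println(deleteIfOldEnough(tmp, now + MIN_AGE_MILLIS + 1000L)); // true: simulated later time
    }
}
```

Passing `now` explicitly keeps the rule testable without waiting five real minutes.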
orbiter
ceb9e3aa17 - enhanced parser: collection of audio, video, image and application links
- enhanced condenser: better handling of utf-8 and pre-formatted texts


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3017 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-28 15:00:15 +00:00
orbiter
0b9370a9dc fix for http://www.yacy-forum.de/viewtopic.php?p=28108#28108
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3013 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-25 23:33:28 +00:00
orbiter
b5a29e9651 - fix for snippets that are too short
- added keyword to snippet fetch to suppress removal of not-found snippet words (for debugging)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3009 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-25 00:38:09 +00:00
orbiter
f1528672b1 filtering of non-index pages during index-of search
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3004 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-24 02:46:02 +00:00
orbiter
8e7215475b - extended ViewFile so it can be used as a debugging tool: you can now use the
POST parameter url to submit a URL directly
- fixed some bugs in text parser (not all parts had been analysed)
- fixed a bug in remote search interface (could not handle constraints)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3001 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-23 15:47:19 +00:00
orbiter
30888e7a2f implementation of search constraints
Such constraints may formulate specific restrictions on web searches.
This is implemented by scraping information for constraints from a web
page during parsing, and storing flags to the pages within the web index.

In this first step, only information for index pages ("index of", directory listings)
is scraped and stored in flags
- added new flag class kelondroBitfield
- added scraper method in condenser
- added bitfield structure for all scrape types (see also condenser)
- added bitfield structure for appearance locations (see RWIEntry)
- added handover protocol for remote search and index distribution
- extended kelondroColumn class to hold bitfield types
- added another search attribute on search page (index.html)
- extended search-filter to enable filtering of non-matching constraints
- set all new database types to be default
- refactoring: moved word hash generation to condenser class

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2999 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-23 02:16:30 +00:00
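Commit 30888e7a2f introduces the flag class kelondroBitfield to store scraped constraint flags per page. The following is a minimal sketch of a byte-array-backed bitfield in Java; the class `Bitfield`, its methods and the flag position used in `main` are hypothetical and do not reproduce the kelondroBitfield API.

```java
public class Bitfield {
    private final byte[] bb; // bit i lives in byte i / 8 at position i % 8

    public Bitfield(int byteLength) {
        this.bb = new byte[byteLength];
    }

    // Sets or clears the bit at position pos.
    public void set(int pos, boolean value) {
        checkRange(pos);
        int slot = pos / 8;
        int mask = 1 << (pos % 8);
        if (value) {
            bb[slot] |= (byte) mask;
        } else {
            bb[slot] &= (byte) ~mask;
        }
    }

    // Returns true if the bit at position pos is set.
    public boolean get(int pos) {
        checkRange(pos);
        return (bb[pos / 8] & (1 << (pos % 8))) != 0;
    }

    // Number of addressable bits.
    public int length() {
        return bb.length * 8;
    }

    private void checkRange(int pos) {
        if (pos < 0 || pos >= length()) throw new IndexOutOfBoundsException("bit " + pos);
    }

    public static void main(String[] args) {
        Bitfield flags = new Bitfield(4);   // 32 flag bits per page
        final int FLAG_INDEX_OF = 0;        // hypothetical position of the "index of" flag
        flags.set(FLAG_INDEX_OF, true);
        System.out.println(flags.get(FLAG_INDEX_OF)); // true
        System.out.println(flags.get(1));             // false
    }
}
```

A fixed byte length keeps the flags at a constant size, which suits storage as a fixed-width column in an index row.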
orbiter
49a83f99d9 - fix for wrong DHT ordering in DHT selection
- fix for http://www.yacy-forum.de/viewtopic.php?t=3112&highlight=

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2995 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-21 00:36:41 +00:00
orbiter
f4b547dc13 limited index transfer to peers with version 0.486 or later
this protects peers with versions below 0.486 from new RWI objects
(which they cannot handle)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2988 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-20 02:46:53 +00:00
orbiter
10a4ab5195 disabled some (more) write caches
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2987 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-20 00:27:02 +00:00
orbiter
09bcc10344 bugfix for some problems with assortments caused by the last change
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2986 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-19 23:10:58 +00:00
orbiter
e3d75f42bd final version of collection entry type definition
- the test phase of the new collection data structure is finished
- test data that had been generated is void. There will be no migration
- the new collection files are located in DATA/INDEX/PUBLIC/TEXT/RICOLLECTION
- the index dump is void. There will be no migration
- the new index dump is in DATA/INDEX/PUBLIC/TEXT/RICACHE

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2983 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-19 20:05:25 +00:00
orbiter
c9364246cc introduced new RWI-Object.
This will be used for the final version of the collections.
The new object is not yet used.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2966 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-17 14:17:20 +00:00
orbiter
e628d34e16 patches for bad data
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2951 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-11 14:35:36 +00:00
orbiter
497428c8ec refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2949 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-10 01:13:33 +00:00
orbiter
76fceb9997 refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2945 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-09 16:32:34 +00:00
orbiter
eeda881553 bugfix for last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2938 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-08 16:38:19 +00:00
orbiter
bb7d4b5d5e refactoring to prepare new RWI entry object
- moved all url and index(RWI) entries to index package
- better naming to distinguish RWI entries and URL entries


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2937 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-08 16:17:47 +00:00
orbiter
bdc9216366 - more asserts
- some bugfixes
- some patches for bugs that are already in the database

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2935 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-08 02:08:33 +00:00
orbiter
1751a799ac - deactivated all write buffers
- fixed a storage bug


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2933 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-07 10:56:36 +00:00
orbiter
ba967c4875 - bugfixes and debug code
- new generalized index class indexCachedRI

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2930 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-07 01:09:02 +00:00
orbiter
ee4715a21c - more asserts
- bugfix for performanceMemory
- refactoring of index ram cache: renamed indexRAMCacheRI to indexRAMRI, to make space for a cached indexRI, which should be named indexRAMCacheRI

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2925 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-06 10:15:05 +00:00
orbiter
114a76a86e - added flag to urlhash that shows whether the domain is a local domain
- enhanced local domain detection
- bugfixing for memory assignment in kelondroFlexSplit
- automatic memory assignment to caches according to available RAM
- bugfixes for details during search process

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2924 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-06 02:05:39 +00:00
orbiter
b2d51be33c bugfix for latest changes to entry generalization
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2922 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-05 19:07:19 +00:00
hermens
8385557672 Small fix for the Cache Monitor when using proxyCacheLayout=hash
see: http://www.yacy-forum.de/viewtopic.php?p=27394#27394

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2916 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-05 14:35:35 +00:00
orbiter
f1ed55a5fc bugfix for last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2913 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-05 02:23:32 +00:00
orbiter
8fdefd5c68 generalization of payload definition of index storage
this is one step forward to the migration to a new collection data format

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2912 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-05 02:10:40 +00:00
theli
ad248d61ca *) more verbose exception
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2901 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-03 14:37:12 +00:00
hydrox
7e8669b15c *) added possibility to "recycle" a DHTChunk that failed to transfer.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2898 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-02 21:32:59 +00:00
low012
4feaa91890 *) Added additional MIME-Type.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2895 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-02 13:32:04 +00:00
low012
89af433879 *) Deleted parts of WebCat that were not needed for parsing SWFs.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2893 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-02 11:13:33 +00:00
orbiter
46a712e195 - more asserts
- simplified indexURLEntry

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2891 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-01 14:00:15 +00:00
low012
8c9bc7e341 *) extracting URLs now works
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2890 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-01 09:22:15 +00:00
low012
493391e42d *) new flash parser, still experimental
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2888 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-01 00:52:42 +00:00
orbiter
215c4e65f1 code cleanup
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2887 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-31 22:10:25 +00:00
orbiter
bd4f43cd66 - fixed a null pointer exception bug
- switched off more write caches
- re-enabled index-abstracts search

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2885 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-31 02:45:41 +00:00
auron_x
194d42b6a7 *) changed PPM-calculation to be more accurate
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2884 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-30 19:04:06 +00:00
orbiter
fe8afaf426 switched off usage of write cache for important databases
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2883 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-30 02:59:22 +00:00
orbiter
d3431433b0 more anonymization in logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2876 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-28 22:28:33 +00:00
orbiter
e6044e5198 bugfix for
http://www.yacy-forum.de/viewtopic.php?p=27207#27207
and
http://www.yacy-forum.de/viewtopic.php?p=27219#27219

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2875 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-28 21:43:12 +00:00
orbiter
78b7f6f7fd bugfix for an index removal bug
that appeared after a search in which snippet loading triggered word removal

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2869 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-28 00:22:10 +00:00
orbiter
147d88cf23 re-design of database caching
this should reduce IO a lot, because write caches are now activated for all databases
- added new caching class that combines a read- and write-cache.
- removed old read and write cache classes
- removed superfluous RAM index (can be replaced by kelondroRowSet)
- adapted all current classes that used the old caching methods
- more asserts, more bugfixes


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2865 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-26 13:50:50 +00:00
orbiter
4e363108e1 - removed bad debug code that caused a large and unnecessary delay during global search
- fixed problem that global search results disappear after a search
- removed some stopwords

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2861 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-25 02:24:41 +00:00
orbiter
2a9d868f6d - removed object cache from kelondroTree
- generalized object caching and added new object caching class
- added object caching wherever kelondroTree was used
- added object caching also to usage of kelondroFlex
- added object buffering (a write cache) to NURLs
- added many assert statements; fixed bugs here and there
- added missing close methods to latest added classes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2858 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-24 13:48:16 +00:00
orbiter
3ffc5b8793 fixed problem with serverCharBuffer.append(char)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2821 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-19 21:44:02 +00:00
orbiter
06854988da - full integration of new LURL database in INDEX
- added migration method for urlHash.db into INDEX

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2819 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-19 21:14:37 +00:00
octoate
e4a3574b77 StringBuffer now resets every time the parser is called
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2817 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-19 16:58:45 +00:00
karlchenofhell
ce237aefad - assortment-sizes table from PerformanceQueues_p.html is not shown if not used
- also escape the query and fragment parts of a URL
- new resolveBackpath for urls: http://www.yacy-forum.de/viewtopic.php?t=2679#24867

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2815 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-19 15:27:24 +00:00
theli
a5b9b514c1 *) retry crawling without content-encoding if the content-encoding header was not correct
See: http://www.yacy-forum.de/viewtopic.php?p=26917#26917

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2811 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-19 08:45:52 +00:00
theli
92f774edd1 *) Better charset encoding detection
*) New testclass for charset encoding detection tests

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2808 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-19 07:02:18 +00:00
orbiter
b79e06615d - added new LURL.Entry class for next database migration
- refactoring of affected classes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2802 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-18 22:25:07 +00:00
octoate
cc24dde5e0 First version of a MS Excel parser based on Apache POI
(event based parsing)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2801 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-18 19:13:37 +00:00
karlchenofhell
4c63129136 - stupid mistake...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2798 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-18 15:14:38 +00:00
karlchenofhell
ebf0da2a45 - now the fix http://www.yacy-forum.de/viewtopic.php?t=2974 works
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2796 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-18 12:07:17 +00:00
theli
3d152bfe43 *) Logging message added
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2794 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-18 04:23:00 +00:00
karlchenofhell
b5e40e2fa2 - fix for http://www.yacy-forum.de/viewtopic.php?t=2974 (no cache-sizes for new db)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2792 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-17 21:01:35 +00:00
orbiter
77a59a115d refactoring of indexing methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2787 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-16 15:04:16 +00:00
theli
cbb1e710b9 *) removing old class
- was replaced by plasma/urlPattern/defaultURLPattern   

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2765 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-13 13:03:32 +00:00
orbiter
c6d46f7ebd null pointer bugfix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2761 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-13 08:03:11 +00:00
theli
decb09df6d *) Trying to be more tolerant against wrong charset names
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2760 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-13 05:30:20 +00:00
theli
e9afe39cbb *) Trying to be more tolerant against wrong charset names
See: http://www.yacy-forum.de/viewtopic.php?p=26662

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2759 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-13 05:08:56 +00:00
theli
7526c831a8 *) Suppressing stacktrace
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2758 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-13 04:34:49 +00:00
orbiter
50f2578c55 - some bugfixing and code cleanup
- assortments can now be left out completely if they do not exist
  before startup and the collection index is selected.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2757 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-13 01:19:26 +00:00
orbiter
bdf4c7c51e added missing files for last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2756 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-12 23:17:16 +00:00
orbiter
a5dd0d41af - refactoring of plasmaCrawlLURL.Entry to prepare new Entry format
- added test migration method to migrate the old LURL to a new LURL
the new LURL will be split into different tables for each month
this solves several problems:
- the biggest table in YaCy is split into different parts and can
  also be managed in filesystems that are limited to 2GB
- the oldest entries can easily be identified, used for re-crawl and
  deleted
- The complete database can be limited to a specific size (as requested many times)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2755 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-12 23:14:41 +00:00
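Commit a5dd0d41af describes splitting the LURL into one table per month so that the oldest entries are easy to identify, re-crawl or delete. The sketch below shows one way to derive a per-month table name from an entry's date; the naming scheme `urls.YYYY-MM`, the use of UTC and the `MonthlyTable` class are assumptions for illustration, not YaCy's actual storage layout.

```java
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;
import java.util.TreeMap;

public class MonthlyTable {
    // Maps an entry's date (epoch millis) to the name of its monthly table, e.g. "urls.2006-10".
    static String tableFor(long dateMillis) {
        YearMonth ym = YearMonth.from(Instant.ofEpochMilli(dateMillis).atZone(ZoneOffset.UTC));
        return "urls." + ym; // YearMonth.toString() yields "yyyy-MM", so names sort chronologically
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> tables = new TreeMap<>(); // table name -> number of entries
        long[] dates = {
            Instant.parse("2006-09-15T12:00:00Z").toEpochMilli(),
            Instant.parse("2006-10-01T00:00:00Z").toEpochMilli(),
            Instant.parse("2006-10-12T23:14:41Z").toEpochMilli()
        };
        for (long d : dates) tables.merge(tableFor(d), 1, Integer::sum);
        System.out.println(tables);            // {urls.2006-09=1, urls.2006-10=2}
        // The oldest partition is the first key; it could be re-crawled or dropped as a whole.
        System.out.println(tables.firstKey()); // urls.2006-09
    }
}
```

Because each month lives in its own table, limiting the total size reduces to dropping the first key(s) rather than scanning one large table.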
octoate
1c4076da8a First version of the MS Powerpoint parser based on Apache POI
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2753 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-12 17:28:53 +00:00
theli
5b75d64d7d *) bugfix for last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2750 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-12 09:39:25 +00:00
theli
71ed104bc7 *) adding additional rpm mimetype (used by packman)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2749 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-12 09:32:24 +00:00
orbiter
6396f5971e bugfixes and migration attempt toward new kelondroFlex db
- more synchronization
- bugfix for remove in collections
- bugfix in kelondroFlex (wrong exception condition!)
- options to use RAM, FLEX and TREE tables for Crawl URL stacker
- default for Crawl URL stacker is now FLEX (!)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2746 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-11 00:46:45 +00:00
hermens
48f81acc0e reverse SVN 2744, it is not needed
(this resulted from a small misunderstanding of the newest cache layout)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2745 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-10 22:02:23 +00:00
hermens
1da9aece12 Repair DNS prefetch during cacheScan
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2744 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-10 21:34:27 +00:00
theli
22649408ad *) Better errorhandling for charset encoding problem during content parsing
See: http://www.yacy-forum.de/viewtopic.php?t=2952

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2737 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-10 10:14:03 +00:00
theli
a9c7e3f061 *) Bugfix for NoSuchElementException
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2735 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-10 08:39:27 +00:00
orbiter
c8f3a7d363 added snippet-url re-indexing
- snippets will generate an entry in responseHeader.db
- there is now another default profile for snippet loading
- pages from snippet-loading will be indexed, indexing depth = 0
- better organization of default profiles

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2733 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-09 23:07:10 +00:00