Commit Graph

838 Commits

Author SHA1 Message Date
orbiter
513179f404 changed interface to colletctionIndex and adopted all implementing classes:
do not return a result of a double-check when adding entries with addUnique

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5363 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-23 23:55:08 +00:00
orbiter
9d64693cfb reverting again the changes to new concurrent chunkIterator
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5362 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-23 22:22:44 +00:00
orbiter
45ad1c3dd5 - re-activated concurrent iterator for EcoFiles
- added javadoc for new concurrent intialization in kelondroBytesLongMap
- switched default value for commons storage to false
- version step

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5361 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-23 18:25:40 +00:00
orbiter
2e2120046f speed enhancement for BLOBHeap opening process
using concurrency of FileIO and content processing

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5360 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-23 17:38:01 +00:00
orbiter
10f5ec1040 reverted last commit (more testing needed)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5356 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-22 00:12:50 +00:00
orbiter
b0f2003792 fast database initialization and fast start.up of yacy:
- applied knowledge about concurrent files stream reading and index processing from the wikimedia reader
   to the EcoTable initialization process: the file reader is now concurrent to the index generation
- changed also some initialization processes to avoid some pauses during initialization

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5354 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-21 23:21:33 +00:00
orbiter
ef66438662 - more space in error db to store larger error messages
- added hash to HTCACHE storage files which will make it possible to join separate caches by just copying files

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5329 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-11 21:42:12 +00:00
orbiter
d014b2728a Design-check, Extension and Refactoring of DHT target position computation:
- two different computations (but mathematical equivalent) of the DHT distance had been consolidated
- moved from 0.0 .. 1.0 double-range position computation to 0 .. Long.Max range for DHT targets
- added fast Long - to - hash computation
- high-precision target computation of gaps for new peers
- added new target computation for horizontal and vertical DHT targets (not yet in use)
- old horizontal-only DHT targets will be upwards compatible to new horizontal and vertical DHT positions

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5318 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-03 00:27:23 +00:00
orbiter
dd27ce7216 added control logic to ECO tables that deletes ram copies of the tables if they get too large
table copies in ram are now abandoned if less than 20 MB ram is left

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5317 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-02 23:53:09 +00:00
orbiter
38e6ba5d00 forgot to re-rename commonsPath
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5316 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-02 23:39:02 +00:00
orbiter
22989d0d8a added property index.storeCommons to switch commons storage on or off
with index.storeCommons=false all currently stored commons are deleted!
Default is now 'true', but in future full releases it will be switched to 'false'

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5315 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-02 23:30:09 +00:00
danielr
103ad2a437 some javadoc
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5299 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-24 13:58:26 +00:00
orbiter
6941bf42b1 performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5288 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-20 14:07:09 +00:00
orbiter
9b0c4b1063 redesign of parts of the new BLOB buffer
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5287 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-19 22:30:44 +00:00
orbiter
1778fb420d - added some performance tweaks to the new BLOB buffer
- removed the now superfluous HT storage thread
- reduced number of file decompression by shifting the compression moment to the future


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5286 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-19 18:10:42 +00:00
orbiter
9663e61449 added another class to handle BLOB writings to the new HTCACHE data storage:
- entries are buffered and written as stream with many entries at once (saves many IO accesses)
- entries are compressed with gzip: increases capacity of cache
- concurrency for stream-writing and compression: all writings to the cache are non-blocking

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5284 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-18 08:57:48 +00:00
orbiter
826ca79735 refactoring and new architecture to store the files of the web cache:
- files are not stored any more as individual files
- a new database structure using BLOBHeap files stores many cache entries in common files
- all file-writing procedures had been migrated to generate byte[] objects which are written with the new database methods

this is only an intermediate step to the final architecture, where cached files are written together with their metadata in one single database structure.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5276 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-16 21:24:09 +00:00
orbiter
998861acfd - some refactoring in BLOBHeap to enable more gap processing functions
- better gap merging in BLOBHeap
- shrinking of heap file if gap is at end of file when file is closed

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5268 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-13 21:15:54 +00:00
orbiter
766cad6e93 enhancement in memory management of BLOB Heap files / merging of deleted entries
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5266 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-12 22:15:01 +00:00
orbiter
7860d5d632 fix for bug in seed list management (cause was bad class overloading, only visual effects!)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5265 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-12 19:51:53 +00:00
orbiter
ffed5fc415 fixed problem with lost peers in database
migrated seedDB from BLOBTree to BLOBHeap

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5263 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-10 14:40:02 +00:00
orbiter
6fb865fbdc - fix of bug in iterator in kelondroBLOBHeap which caused bug in crawl profile listing
- some refactoring of classes that use kelondroMap (Map instead of HashMap)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5262 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-10 08:39:11 +00:00
orbiter
9ac16f565b - fixed several bugs in database management functions
- fixed a display bug for the performance graph
- fixed deadlock when initialization of awt happens simultanously
- removed some debugging output

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5245 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-10-03 18:57:02 +00:00
orbiter
e1f67262f7 - added and removed some debugging output
- fixed a bug with merge method
- patched wrong output of language identification (not fixed, only patched!)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5181 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-18 14:12:15 +00:00
orbiter
25a62cdc3f small fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5161 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-15 15:11:59 +00:00
orbiter
1eb813bd43 shifted index deletion-on-exit rule to the class where the errors are produced
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5141 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-12 11:51:48 +00:00
lotus
0bb4fbc403 delete corrupted collecion.index on exit for rebuild on next start
see http://forum.yacy-websuche.de/viewtopic.php?p=9725#p9725

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5135 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-10 12:55:14 +00:00
orbiter
77ee0765a4 - added domain statistic generation to IndexControlURLs_p.html servlet
- added 'delete all' button to all results of such a domain statistic output which causes that all urls to this domain are deleted
- extended stack cleaner to clean also the statistics: they are not completely destroyed, only the smallest counting domains are removed


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5117 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-04 19:41:57 +00:00
lotus
e645bae29f display table in log
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5106 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-03 13:14:01 +00:00
orbiter
ead39064c5 fixed problem with wrong result number calculation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5105 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-03 10:04:46 +00:00
hermens
2437beb96c fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1360&p=9321#p9321
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5104 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-03 07:39:03 +00:00
orbiter
7b12e77a63 fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1393&hilit=&p=9655#p9655
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5103 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-03 00:50:42 +00:00
orbiter
05dbba4bab added logging conditions to all fine and finest log line calls
this will prevent an overhead for the generation of the log lines in case that they then are not printed

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5102 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-03 00:30:21 +00:00
orbiter
d3d41e2ee4 - fixed problem with searching with quotes (still not complete, but not as bad as before)
- fixed parsing of crawl-delay statements when seconds were given with float numbers
- enhanced performance of profiling (not too many loggings; not more than one per second)
- removed some debug output
- fixed wrong return type in logging
- added a logging condition in httpd to prevent that logging statements are generated when they are not written (should be added everywhere!)
- fixed wrong word distance computation in RWI management


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5101 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-09-02 23:49:48 +00:00
danielr
3c68905540 remove redundant null checks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5065 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-20 08:37:39 +00:00
danielr
753a1ae430 - changed default browser from netscape to firefox
- fixed "Inefficient use of keySet iterator instead of entrySet iterator" [WMI_WRONG_MAP_ITERATOR, FindBugs]
- fixed some possible null pointer accesses


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5063 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-20 07:54:56 +00:00
orbiter
7989335ed6 Preparations to replace the HTCache with a new storage data structure:
- refactoring of the HTCache (separation of cache entry)
- added new storage class for BLOBs. (not used yet, this is half-way to a new structure)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5062 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-19 14:10:40 +00:00
orbiter
bdae051d9a - extended new performance graph (better timing)
- added paths for new libraries in classpath for eclipse
- refactoring to remove compiler warnings (static access to finals variables)
- removed some unused import

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5055 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-13 10:37:53 +00:00
danielr
621b473b18 * removed some warnings of findbugs (http://findbugs.sf.net)
- removed unnecessary code (unused variables, String.toString)
- corrected some calculations (cast int to double or long ;)
- improved little performance (using Integer.valueOf() instead of new Integer)
- log if some File-actions fail (mkdir(), delete(), ...) and some ignored exceptions
- finalized some (more) fields
- finally close some streams
- made inner classes static if not using environment
- generalized some equals (from specificClass to Object)
- fixed some potential nullpointer accesses


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5039 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-06 19:43:12 +00:00
danielr
17b7845eb5 * refactoring
- moved constants from plasmaSwitchboard to own class (all 232 ;)
- moved remoteProxy-Methods to httpRemoteProxyConfig, better names
- removed some unnecessary code (else-statements)
* formatting (correct indentation)
* minor bugfixes (due to findbugs.sf.net)
* hopefully fixed "missing quote" (announcing StringParts as UTF-8)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5031 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-02 13:57:00 +00:00
danielr
3bb870bfcd added final where possible
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5030 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-08-02 12:12:04 +00:00
danielr
7913bdb75b Flextable: filename in errormessage if inconsistent
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5015 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-24 08:22:36 +00:00
orbiter
c3d461d191 - removed superfluous copyright statement
- updated my email address

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5011 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-20 17:14:51 +00:00
orbiter
606b323a2d fixed bug that appeared when a new crawl ist started
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4989 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-11 10:59:06 +00:00
orbiter
38eb5bd1ee fixed a bug in kelondroBLOBHeap. The following files are probably inconsistent and should be deleted:
DATA/HTCACHE/responseHeader.heap
DATA/PLASMADB/crawlRobotsTxt.heap

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4988 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-11 10:38:20 +00:00
orbiter
28d5703f8a - fixed a bug in Robots.txt loader which could have caused that robots.txt files had been loaded from the same domain more than once
- patch in BLOBHeap to prevent OOM during startup

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4987 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-11 09:12:54 +00:00
orbiter
7b1c9e6aee discovered and removed a (possibly large) memory leak:
many classes used the kelondroMapDataMining (was: kelondroMapObjects) which adds statistical
functions to the kelondroMap (was: kelondroObjects), but these functions were not used by these
classes. Especially the HTCACHE and robots.txt database allocate a very large number of objects
for statistical use, but never used them. By replacing the kelondroMapDataMining with the
kelondroMap object for these classes now less memory is allocated.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4986 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-11 07:34:48 +00:00
orbiter
0f5fe8cc53 refactoring of method calling for objects from kelondroMapDataMining
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4985 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-11 07:15:46 +00:00
orbiter
4acf0a61cd refactoring of kelondroObjects (mainly renaming to kelondroMap)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4982 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-10 22:08:16 +00:00
orbiter
1e6d12f146 Major update to BLOB data structures:
- introduced a new BLOB file format: kelondroBLOBHeap. This is a flat file with an index in RAM.
  very similar to the eco-tables, but with flexible value sizes. It will replace the kelondroBLOBTree,
  which is based on a kelondroTree, a file-AVL-based index data structure.
- the HTCACHE header file was replaced by the new blob heap file structure
- the robots.txt file was replaced by the new blob heap file structure
- the robots parser was enhanced (bugfixing for double-loading of the same robots.txt)
- other BLOB-dependent data structures were prepared to use also the new BLOB heap
- fixed a bug in the snippet fetch process: the file header was not written to the header index
There should now be less IO during snippet fetch and during crawling


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4978 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-10 00:47:37 +00:00
orbiter
81f75f5056 - removed unnecessary classes (these objects are much easier to handle using generics)
- generalized BLOB referencing. This is the preparation to use another BLOB class, the kelondroHeap


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4977 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-07 23:52:53 +00:00
orbiter
b38f467e3c better SRU compliance
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4976 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-07 21:50:24 +00:00
orbiter
a6719dfd2b - refactoring of robots parser
- no more keep-order parameter in remove (it was not possible to make this strict, and not useful)
- some small enhancements in balancer
- robots parser without references in switchboard
- changes synchronization in robots

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4969 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-05 00:35:20 +00:00
orbiter
474e29ce4a added options to configure the 'corporate identity'-icons, the home page link and the greeting line from
the skin menue. Additionally an example is given there how to integrate a search page with an iframe.
Please see the skin menu.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4967 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-03 23:37:04 +00:00
orbiter
474659a71f - modified and enhanced the crawl balancer: better list export, fixing of damaged crawl queue at start-up, re-sorting at start-up to enhance domain order
- added option to set minimum crawl delta for domains in balancer
- added default values to crawl deltas in yacy.init
- added configuration for these deltas in performance queues
- enhanced performance setting computation (more time for indexing queue for a faster flush
- remote crawling is now enabled during local crawling if indexer has space and time for more links
- added database stub for new distributed file system
- refactoring of time computation to get an abstraction level that will be used by a TTL rule in new distributed file system

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4966 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-07-03 13:08:37 +00:00
danielr
dba7ba079e fixed NPE seen with queues_p.xml (serverClassLoader finds already loaded classes)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4952 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-23 16:55:46 +00:00
orbiter
6bdd99e065 - more asserts to solve the ooB-problem
- better caching (?), lets see how it behaves

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4937 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-18 21:08:56 +00:00
orbiter
b928ae492a some code-cleanup and possible speed enhancements in different core methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4935 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-17 23:56:39 +00:00
orbiter
c998dc6556 - added security functions to flush url and search caches in case that memory is full
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4933 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-16 21:39:58 +00:00
orbiter
f4ae8082c3 - better error analysis for ooRange Exception in kelondroBase64Ordering
- quadcore support for kelondroRowSet array ordering

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4932 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-15 23:25:57 +00:00
orbiter
84cbe75005 more asserts
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4930 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-15 00:04:59 +00:00
orbiter
e269c12710 small changes in partition routine
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4929 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-14 23:17:56 +00:00
orbiter
31efb8fbee - fix for LOG path generation when the DATA/LOG does not exists (fix for bug introduced in SVN 4923)
- some more/better asserts
- slight performance enhancements in remove method in index management. Works for all who do not run using asserts (the majority)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4928 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-14 22:51:47 +00:00
orbiter
21c87c36e3 added a log line
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4925 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-14 11:57:46 +00:00
danielr
68c38c2d34 - WatchCrawler shows status without JavaScript
- Performance can be scaled + DHT-profile
- names for pool-threads
- some small refactorings


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4923 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-14 10:24:58 +00:00
orbiter
3330181aa0 refactoring:
find a better way to store BLOBs; generalize current BLOG data structure (kelondroDyn)
and prepare it to replace it with something better. The best candidate is the kelondroHeap,
which will become the kelondroBLOBHeap;
removed also some never-used classes

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4902 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-07 23:12:24 +00:00
orbiter
9a9737a54e fix for "no more elements available" exception
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4901 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-07 22:07:25 +00:00
danielr
7feae906aa - organize imports
- removed potential null pointer accesses
- removed unnecessary casts


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4893 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-06-06 16:01:27 +00:00
orbiter
e91bf4c8cc - fix for bad reset of index / bad index location after deletion
- some modification of rssTerminal window location and size

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4850 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-24 21:30:22 +00:00
orbiter
25192e0d36 added a deletion button to indexControlRWIs that deletes the complete web index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4847 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-24 12:30:50 +00:00
orbiter
cfe6790498 - added option to switch between yacy networks, especially between the two default networks (freeworld and intranet),
from the ConfigNetwork online interface
- to make this possible, a large refactoring and reorganisation of data structures was necessary

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4803 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-14 21:36:02 +00:00
orbiter
5fde679acb - fixed problem in performance configuration
- extended rss fetch size for rssTerminal


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4798 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-13 15:28:55 +00:00
orbiter
db032fb6de - added RWI transmissions to the event terminal
- fixed bug in Collage
- added 'embedded mode' to collage
- integrated Collage to terminal_p as iframe in embedded mode (Pictures now visible in terminal_p)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4797 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-13 11:46:20 +00:00
danielr
0d3808bd9e minor refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4775 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-08 16:51:01 +00:00
danielr
d4bce6affd refactoring (initialized static fields, removed empty if/else, serialized some fields in serializable classes)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4755 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-03 09:06:00 +00:00
orbiter
d0678f7ab9 refactoring as result of
http://forum.yacy-websuche.de/viewtopic.php?f=6&t=959&p=7560#p7560

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4752 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-05-01 22:40:42 +00:00
orbiter
32b5b057b9 - modified, simplified old kelondroHTCache object; I believe it should be replaced by something completely new
- removed tree data type in kelondroHTCache
- added new class kelondroHeap; may be the core for a storage object that will once replace the many-files strategy of kelondroHTCache
- removed compatibility mode in indexRAMRI


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4747 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-29 22:31:05 +00:00
orbiter
b9a2a2d287 more search performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4735 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-24 15:09:06 +00:00
orbiter
ff755fb858 small corrections and enhancements after search timing profiling
search should be a little bit faster now

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4734 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-24 13:31:55 +00:00
danielr
48ffd61e6a changed "patched wrong" to warning, so it goes to the logfile
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4716 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-19 07:54:44 +00:00
orbiter
2f629d20a7 - tried to fix the '4217666-problem'
- removed more unused code

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4715 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-19 04:24:29 +00:00
orbiter
45ae3da7e7 another patch to prevent NPE in EcoTable
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4698 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-14 05:33:32 +00:00
orbiter
93376acdca fixed a bad chunkcache limit check which could have caused ArrayIndexOutOfBoundsExceptions
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4695 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-14 03:49:02 +00:00
orbiter
1cab240198 patch for possible NPE in EcoTable iterator
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4694 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-14 03:20:37 +00:00
orbiter
8fe39ebd74 -fixed file transmission with POST. The only usage was in ranking transmission, therefore:
-fixed ranking transmission

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4681 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-12 08:12:51 +00:00
orbiter
444dce7e81 more performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4676 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-10 15:28:58 +00:00
orbiter
2c2dcd12a2 - enhanced performance of Eco-Tables: less time-consuming size() - operations
- will increase speed of indexing and collection.index creation


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4675 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-10 13:24:55 +00:00
orbiter
14404d31a8 - enhanced performance graph (more info)
- added conditions for rarely used logging lines to prevent unnecessary CPU usage for non-printed info

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4667 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-08 14:44:39 +00:00
orbiter
696b8ee3f5 fix for http://forum.yacy-websuche.de/viewtopic.php?p=6806#p6806
- removed all InputStream.available() because this does not work for files > 2GB
- iterator terminate when a IOException occurs
- added handling of non-executing index.add methods to enhance assert usage
- added index for file indexes > 2GB, to be used in new indexHeap

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4666 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-08 11:55:59 +00:00
orbiter
225f9fd429 various fixes
- shutdown behavior (killing of client sessions)
- EcoFS reading better
- another synchronization in balancer.size()


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4662 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-07 13:12:58 +00:00
orbiter
6e36c156e8 added more logging to EcoFS
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4661 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-07 09:52:25 +00:00
orbiter
319144f4b2 fix for outofbounds-excception in EcoFS chunk iterator
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4657 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-06 22:28:17 +00:00
orbiter
a9cf6cf2f4 generalization of index container-heap class.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4654 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-06 20:31:16 +00:00
orbiter
5e4fddc1e6 more logging for new EcoFS.ChunkIterator to find bug for
http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1024&hilit=&p=6806#p6806

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4652 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-06 18:47:49 +00:00
orbiter
117ae78001 speed enhancement for reading of eco-table indexes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4647 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-06 11:50:15 +00:00
danielr
5c3c1fdf41 replaced httpc with Apache Jakarta Commons HttpClient (includes some refactoring ;)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4640 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-05 13:17:16 +00:00
orbiter
783a4c9edb strong speed enhancements for the index cache dump and restore:
storage and loading is 30 times faster! a cache of 100000 RWIs needed 180 seconds
to store and 100 seconds to restore; now the same cache needs only 6 seconds to store and
3 seconds to restore. The cache size has decreased now by 30% (95 MB instead of 150 MB).

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4634 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-04-02 13:18:23 +00:00
orbiter
d2f4926951 - more logging for balancer to get a hint where the problem is
- fix for new concurrency method in kelondroSplitTable

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4631 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-31 18:45:27 +00:00
orbiter
20dadba426 - added a deadlock prevention function in cache flushing
- removed unused methods in collection index

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4630 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-31 17:51:51 +00:00
orbiter
764a40e37d speed enhancements for crawler and url retrieval (affects also search speed)
- concurrency for LURL-fetching: this can be done using a concurrent lookup into the separated url databases. Concurrency is possible because there is no IO during lookup. The more LURL-Tables are present, the better is the speedup. More CPUs will increase speed
- because a large number of LURL-lookups are made during crawling (for double-check), the LURL-Lookup speed enhancements enhances also crawling speed
- search speed also profits from LURL-lookup enhancement
- changed some flushing parameters in word index caching which should make better use of large word index caches and should speed up indexing
- removed flush chunksize parameter, because this was only useful for IO path enhancement feature which was removed some weeks ago to prevent blocking and deadlocks during search requests

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4628 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-31 15:41:19 +00:00
orbiter
3ce3a4a3a1 added stub for new index container heap data structure (purpose: index folding)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4627 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-30 22:58:42 +00:00
orbiter
968c775025 - preparation of parsing/indexing queue for concurrent execution
- remote crawl receipts are now transmitted concurrently in separate threads (makes remove crawls much faster!)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4605 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-26 22:43:38 +00:00
orbiter
d6050b9ffb - separated the LURL data storage and Crawl result stack for process supervision.
this is another step to enable multiple, concurrent fulltext-indexes
- another try to make the yacy-httpc more stable

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4602 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-26 14:13:05 +00:00
orbiter
fba46c51d7 fixed non-termination bug in qsort
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4593 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-22 23:15:28 +00:00
orbiter
541b817502 refactoring of switchboard queueing
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4591 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-22 01:28:37 +00:00
orbiter
fc94fbe224 another improvement to the collection sorting
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4589 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-20 23:11:04 +00:00
orbiter
11270d450e better quicksort-pivot computation: 30% faster (measured with test program)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4588 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-20 22:01:12 +00:00
orbiter
3e44293f07 - fixed a problem with thread pools in row collection
- added a line-viewing feature in threaddump	

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4587 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-20 14:21:58 +00:00
danielr
e43051b125 - fixed Threaddump output (html-escaped ie. <init>)
- in EcoFS converted comments to javadoc


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4586 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-20 10:20:55 +00:00
orbiter
433ff855f7 - fixed another concurrency problem in collection sorting
- fixed a typing problem that was introduced in svn 4579 and caused the crawl monitor to fail

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4585 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-19 23:47:24 +00:00
orbiter
19286fa2d1 tried to fix seed2.old.db-problem
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4584 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-19 22:35:19 +00:00
orbiter
f3996e63b8 tried to fix more deadlocks:
- changed connection modes in ftpc
- replaced sort tread pool in row collections by new one using util.concurrent. the old pool had caused blockings

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4582 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-19 11:23:43 +00:00
orbiter
fa1090113d - next try to fix the networking problem:
set the maximum transfer size to less than MTU=1500-52: buffer size <= 1448
- some refactoring of transfer methods (naming)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4558 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-14 00:16:04 +00:00
orbiter
65785da8f2 new method for best hash computation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4548 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-10 23:28:05 +00:00
orbiter
9eddc1506b - one try to fix the httpd problem
- fix for handling of collection index that appears when removing elements
- added another navigation method (stub, not working yet)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4543 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-09 23:58:22 +00:00
orbiter
7cc4ff05c9 some code enhancements and bugfixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4542 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-09 23:48:24 +00:00
orbiter
275a226cc5 refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4524 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-04 22:45:45 +00:00
danielr
fbe335db73 consistent use of de.anomic.server.serverMemory to get information about memory statistics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4522 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-02 15:42:50 +00:00
orbiter
4fdf695064 - fixed a bug in remote search that prevented that any results had been generated (!)
- added a great number of printStackTrace and new exceptions that shall be used to find the cause
  for a bug in yacy client-server communication which causes the interruption of data transfer
  which then causes the parser bug for the seed strings.
- tried to fix the communication bug on server-side (copy functions)
Be aware that the log may be full of errors and bugs - there should not be more bugs but there is more to see


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4519 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-27 23:12:43 +00:00
orbiter
1dce2f1079 more multithreading support:
- replaced some synchronized classes by classes from util.concurrent
- used a util.concurrent.SynchronousQueue to implement a persistent sorting thread in
  the very basic kelondroRowCollection which supports sorting with a second thread
  in case that a double-core processing CPU is used

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4517 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-27 15:16:47 +00:00
orbiter
677ee2ea04 added remove operation to collection index (re-activation)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4503 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-23 00:14:11 +00:00
orbiter
d477483373 stronger criteria to use RAM copy to use table copy
(should use less RAM)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4502 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-22 23:46:27 +00:00
orbiter
a7abee3578 - fixed some data types in new search stack
- added image domain presentation to image preview
- added new search page to menu
- added automatic re-search when an old search profile is requested and a crawl is ongoing,
  to fetch newly crawled entries

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4501 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-21 23:40:38 +00:00
orbiter
727feb4358 - fixed some bugs in ranking computation
- introduced generalized method to organize ranked results (2 new classes)
- added a post-ranking after snippet-fetch (before: only listed) using the new ranking data structures
- fixed some missing data fields in RWI ranking attributes and correct hand-over between data structures

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4498 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-21 10:06:57 +00:00
orbiter
2327451653 - changed order of database initialisation (index first)
- removed mainly unused init-time for databases (was only used for tree tables, which are not used any more)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4496 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-19 09:14:07 +00:00
orbiter
066c88140f quickfix for OOM, see http://forum.yacy-websuche.de/viewtopic.php?f=5&t=875&hilit=&p=5686#p5686
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4488 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-16 12:16:53 +00:00
orbiter
bd63999801 - faster search: using different data structures that avoid multiplr calculations
- no more table copy for error-eco table
- optional table copy for lurl-entries
- more abstractions (less single constant strings)
- better logging (using host names instead of ips)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4459 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-07 22:16:36 +00:00
orbiter
141db7ba48 there is less RAM needed for eco table (its just a security-plus for RAM check)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4442 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-04 15:51:51 +00:00
orbiter
249d61759a fix for false RAM table activation in EcoTables
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4441 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-04 10:33:45 +00:00
orbiter
a1e9e6e2e6 fix for search result page navigation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4431 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-03 02:23:04 +00:00
orbiter
7404256997 - no more search time-out!
- fixed a bug with last commit

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4430 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-02 23:53:39 +00:00
orbiter
cd3e0d6f03 tried to fix another eco bug
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4429 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-02 21:36:19 +00:00
orbiter
a8a5df4a51 - more dublin core naming of page metadata
- better presentation of result counters in search results

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4420 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-30 21:58:30 +00:00
orbiter
7d875290b2 more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4417 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-29 22:13:30 +00:00
orbiter
0f5c4abaca more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4414 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-29 10:12:48 +00:00
orbiter
1a296af6ff more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4412 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-28 20:08:32 +00:00
orbiter
2485681002 added termination control for RotateIterator
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4399 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-25 11:44:27 +00:00
orbiter
e2e7f065e9 minor fixes, some generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4398 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-24 23:58:18 +00:00
orbiter
15397298dc - refactoring of indexControlRWIs: moved statics to own class; better Dublin Core naming
- fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=759&hilit=&p=4866#p4866
- some bugfixes in EcoTable according remove method
- switched more tables to Eco: crawl Profiles, htcache, seeddb, newsdb

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4397 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-24 22:49:00 +00:00
orbiter
6dc679785f - fixed bad sort behavior of kelondroRowSet, in this case: no sort at all!
see http://forum.yacy-websuche.de/viewtopic.php?p=4841#p4841
- some memory calculation enhancements in kelondroFlex and a little bit more logging

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4378 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-23 20:18:36 +00:00
orbiter
0b4205eb5a - fix double-deletion in eco tables
- changed behaviour of sort moment (not during a get)
- added some asserts in snippet cache for debugging

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4375 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-23 11:13:39 +00:00
orbiter
4ce6fab428 added special handling for doubles in eco tables after initialization
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4370 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-22 21:40:25 +00:00
orbiter
634430c48a - more logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4368 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-22 20:44:12 +00:00
orbiter
d372a78aef some fixes to bring back lulabads peer..
see also: http://forum.yacy-websuche.de/viewtopic.php?p=4772#p4772

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4366 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-22 20:02:20 +00:00
orbiter
f945ee21d2 some security additions, keep maximum byte[] size to 2^27
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4350 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-20 23:46:27 +00:00
orbiter
2f3b2f3481 - extended dbtest for comparisment tests
- added initial space option for eco tables
- used initial space value in initialization of collectionIndex, this should avoid OOM failures" /Volumes/Magneto/dev/workspace/trunk/source/dbtest.java /Volumes/Magneto/dev/workspace/trunk/source/de/anomic/kelondro/kelondroCollectionIndex.java /Volumes/Magneto/dev/workspace/trunk/source/de/anomic/kelondro/kelondroDyn.java /Volumes/Magneto/dev/workspace/trunk/source/de/anomic/kelondro/kelondroEcoTable.java /Volumes/Magneto/dev/workspace/trunk/source/de/anomic/kelondro/kelondroRow.java /Volumes/Magneto/dev/workspace/trunk/source/de/anomic/kelondro/kelondroSplitTable.java /Volumes/Magneto/dev/workspace/trunk/source/de/anomic/plasma/plasmaCrawlBalancer.java /Volumes/Magneto/dev/workspace/trunk/source/de/anomic/plasma/plasmaCrawlStacker.java /Volumes/Magneto/dev/workspace/trunk/source/de/anomic/plasma/plasmaCrawlZURL.java
- added index consistency check (checks for double-occurrences of primary keys in file)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4349 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-20 21:42:35 +00:00
orbiter
9eb746863d interface enhancements for eco records memory statistics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4348 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-20 01:51:02 +00:00
orbiter
9abc927645 to fix inconsistencies in collection index, a double reference reporting mechanism has been implemented
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4347 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-20 01:22:46 +00:00
orbiter
58a1f518f8 fixed some problems with eco tables
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4346 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-19 12:23:56 +00:00
orbiter
d4d07802ac better RAM protection using eco tables
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4345 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-19 01:50:24 +00:00
orbiter
f4e9ff6ce9 more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4343 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-19 00:40:19 +00:00
orbiter
94f21d9403 activated new kelondroEcoTable file structure.
This data structure replaces almost all files in the PLASMA directory
also the collection.index and the LURL-db will be created as Eco-DB, if it does not exist before
existing Flex-databases will be used as they are (the is no data lost)
If you want to force the creation of a Eco-collection.index, simply delete the old index.
The Eco file system will only be used if there is enough memory.
The collection.index RAM limit is 200MB, if you have less, a flex-Table is createt.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4340 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-17 21:48:08 +00:00
orbiter
dc26d6262b - removed write buffer from kelondroCache (was never used because buggy; will now be replaced by new EcoBuffer)
- added new data structure 'eco' for an index file that should use only 50% of write-IO compared to kelondroFlex
The new eco index is not used yet, but already successfully tested with the collectionIndex
The main purpose is to replace the kelondroFlex at every point when enough RAM is available.
Othervise, the kelondroFlex stays as option in case of low memory (which then can even use a file-index)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4337 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-17 12:12:52 +00:00
orbiter
b806a6af8b renamed kelondroEcoRecords to kelondroFullRecords (the "Eco"-name will be used for something else)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4331 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-13 00:41:22 +00:00
borg-0300
3cab85158c update for last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4325 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-12 00:41:45 +00:00
borg-0300
53367d941a more information (BASE64)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4324 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-12 00:24:24 +00:00
orbiter
b3636f5ba8 re-implemented file index in kelondroFlex
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4323 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-11 14:43:28 +00:00
orbiter
a6ca3b51be more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4322 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-11 14:13:08 +00:00
orbiter
a5054c038d - added large number of generics
- redesign of ordering structures in kelondro (old did not work with strict generics)
- 50% IO reduction during read access on kelondroFlex (ommiting of read on index table)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4320 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-11 00:12:01 +00:00
orbiter
71bcf02d3a - removed pro-version (is the same as standard version, use the standard instead)
- changed yacy logo
- removed crawlOrder protocol (unused)
- removed file index in kelondroFlex (will not work, it takes too long to maintain)
- fixed remoted crawl for clusters (now denies remote crawls from peers outside cluster)
- 0.562

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4317 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-09 23:05:52 +00:00
orbiter
ce7257483d fix for bad fix with random access files (no performace enhancement)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4314 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-09 16:00:05 +00:00
orbiter
016fc594af more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4311 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-09 09:58:56 +00:00
orbiter
3e3d2e39a4 - some refactoring and redesign of kelondroBytesIntMap (created new class kelondroRAMIndex)
- more generics
- preparation to extend the balancer for flexible forced delay times
- set different random-access type, should now omit update of metadata in file and could be a bit faster (lets see)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4309 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-07 22:36:48 +00:00
orbiter
03e7782269 more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4305 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-06 19:23:38 +00:00
orbiter
df2a7a8ac8 more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4295 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-28 18:47:45 +00:00
orbiter
9d8b17188a more generics, bugfixes for wrong cast
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4294 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-28 03:39:36 +00:00
orbiter
4dc438f7e7 moved to Java 1.5:
- changed build script to use java 1.5 compiler
- first stept to resolve missing generics definition (about 400 from over 4100 'missing'-warnings)
- added key-iterator to kelondro databases (for rapid from-memory enumerations, will be used for domain name collection, not used yet)

please set your development environment to use java 1.5!


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4292 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-27 17:56:59 +00:00
hermens
4748d5c1ab Some enhancements to time management:
- remove unnecessary generation of Calendar and Date objects
- synchronized SimpleDateFormat objects in blog-, message- and wikiBoard
- correct use of TimeZones and SimpleDateFormats



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4288 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-20 17:11:35 +00:00
orbiter
52dd015218 new release strategy: the standard release is now built the same way as the pro release
a new release type was added: 'embedded' which is the same as the current standard release was
this will not have any effect to the next release 0.56, which will still a pro-release on public download
the transition the the new release strategy must be done now to enable automatic update by the updated in future releases

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4287 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-20 02:46:41 +00:00
orbiter
48138952ff added memory measurement for index recreation to avoid OOM during index RAM space extension
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4267 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-11 15:07:03 +00:00
orbiter
a3bfd668aa opening of array files at startup time, not when first time the web index is accessed
this speeds up the first search after startup

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4263 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-10 02:40:16 +00:00
orbiter
c527969185 - enhanced monitoring of ranking parameters
for details, please try http://localhost:8080/IndexControlRWIs_p.html
- fixed computation of ranking ordering in some cases

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4220 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-16 14:48:09 +00:00
orbiter
ec7ba0d3d0 - fixed problem with too small sort fields (sortbound was not set)
- slightly changed handling of date in indexURLEntry

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4214 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-13 02:24:10 +00:00
orbiter
64b3b79e44 - fix for termination problem with uniq()
- addition to seed dna interpretation

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4208 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-12 14:39:30 +00:00
orbiter
0abf33ed03 - tried to remove deadlock
- enhanced searchtime in kelondroRowSets
- enhanced uniq() - reverse enumeration causes less time in case of mass removal of doubles

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4207 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-12 01:14:51 +00:00
orbiter
2421127612 fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=513&hilit=
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4204 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-11 01:32:54 +00:00
orbiter
d0d2771883 disabled multiprocessoring of rowCollection.sort for testing purpose
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4202 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-11 00:28:22 +00:00
orbiter
edc4da5317 fix for division by zero in test reoutine
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4201 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-10 08:57:00 +00:00
orbiter
df38aaf7bd update to RowCollection sort speed-enhancements:
- better handling of small collections (less overhead)
- usage of pre-sorted limits
- different re-sort limit
- more testing procedures

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4200 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-09 15:34:11 +00:00
orbiter
ecba35de72 enhanced computing speed of kelondro core function: sorting
the enhancement was made by using better organized data structures and
multi-threading during the sort. A sort can be divided into two separate
processes when the first partition of the quicksort algorithm was done.
Generating a separate thread and starting the thread takes only 10 milliseconds,
so using a separate thread makes only sense if the data amount is large.
statistics about the speed-up:
without ehancement: 250 milliseconds for 100000 entries
with data structure enhancement: 170 milliseconds for 100000 entries
with additional second thread (if second processor is present): 130 milliseconds.

For dual-processor systems, this means about 100% speed-up
a test can be made with the following command:
java -classpath classes de.anomic.kelondro.kelondroRowCollection


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4198 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-09 00:51:38 +00:00
orbiter
6eaa5a0e64 enhanced local search speed. The ranking process is now 6 times faster that before.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4197 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-07 22:38:09 +00:00
borg-0300
a5d28785b1 less OOM (works for me)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4194 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-02 14:55:46 +00:00
orbiter
f8318436a1 fix for last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4177 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-22 16:32:39 +00:00
orbiter
7d57b80598 distinct keepOrder strategy, more discrete implementation of enhancement introduced in SVN 4158
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4176 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-22 15:26:47 +00:00
orbiter
9a7b093eed tried to avoid endless loop, see also:
http://forum.yacy-websuche.de/viewtopic.php?f=6&t=467&hilit=

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4175 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-22 14:35:45 +00:00
orbiter
b856e377a9 some additions and a small bugfix to SVN 4158
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4173 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-21 23:26:22 +00:00
fuchsi
f717beecb1 - Changed yFormatter handling to be more flexible and produce more readable code for server pages. There are serverObject.putNum() methods to allow adding of number type values in a formatted form, and put() methods for number types that add them without formatting. This reduces the need to transform them into Strings in server pages and removes the HTML encoding step which is unecessary for numbers.
- some minor code cleanups (mostly unnecessary casts, null checks)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4166 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-19 04:13:46 +00:00
fuchsi
b5f7df8d0a Speed up remove operations in rowCollections.
- Array element shifting during remove is only done when it is necessary to keep the order of a row collection.
- This will speed up the most expensive operation "common word shrinking" by a factor of 500-1000 (in the worst cases we shifted > 60 GB of data during this operation)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4158 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-11 17:17:08 +00:00
orbiter
dbd1eeead5 fix for missing object miss-cache flush value:
the value is alway zero because there is no miss-cache flush
see http://forum.yacy-websuche.de/viewtopic.php?f=6&t=288

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4083 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-09 18:35:05 +00:00
orbiter
daf0f74361 joined anomic.net.URL, plasmaURL and url hash computation:
search profiling showed, that a major amount of time is wasted by computing url hashes. The computation does an intranet-check, which needs a DNS lookup. This caused that each urlhash computation needed 100-200 milliseconds, which caused remote searches to delay at least 1 second more that necessary. The solution to this problem is to attach a URL hash to the URL data structure, because that means that the url hash value can be filled after retrieval of the URL from the database. The redesign of the url/urlhash management caused a major redesign of many parts of the software. Since some parts had been decided to be given up they had been removed during this change to avoid unnecessary maintenance of unused code.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4074 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-05 09:01:35 +00:00
orbiter
4779f314fe first version of next-generation search interface:
- snippets are not fetched by browser using ajax, they are now fetched internally
- YaCy-internat threads control existence of snippets and sort out bad results
- search results are prepared using SSI includes
- the search result page is visible right after the search request, the results drop in when they are detected
- no more time-out strategy during search processes, results are shifted within queues when they arrive from remote peers
- added result page switching! after the first 10 results, the next page can be retrieved
- number of remote results is updated online on the result page as they drop in
- removed old snippet servelet (which had been also a security leak btw)
- media search is broken now, will be redesigned and fixed in another step


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4071 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-03 23:43:55 +00:00
orbiter
72752bb503 because of a new database structure handling, the memory need for accessing
collection objects has been reduced to 50%:
- set new memory calculation functions for indexing process
- adjusted guessed memory amount
-> Testing needed:
   try new recommended value (see performanceQueues) and see if OOMs occur.
-> report maximum recommended value, so we can set new default values.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4053 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-20 17:36:43 +00:00
orbiter
3ca8f71cbb refactoring of dbtest to create separated kelondro sql connector interface
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4042 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-11 22:40:24 +00:00
orbiter
9678d1b282 fixed new EcoRecords-Nodes. Here I omitted object content copying before
to avoid massive System.arraycopy. That did obviously not protect enough the Node objects


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4032 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-06 10:10:33 +00:00
orbiter
9a860cf397 bugfix for wrong record tracker message
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4026 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-03 12:24:52 +00:00
orbiter
9628db6cdc enhanced memory allocation during database access:
- refactoring of kelondroRecords; this class is now divided into
  kelondroAbstractRecords, kelondroRecords, kelondroCachedRecords, kelondroHandle and kelondroNode
- better abstraction of kelondroNodes, such nodes may now be crated by different classes
- a new Node defining class kelondroEcoRecords defines Nodes that do not need so much allocation and System.arraycopy
- there is less memory transfer on the bus, especially for collection index
- now half of memory needed for web index access


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4024 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-03 11:44:58 +00:00
orbiter
57a5b6fa71 some generalization of remote proxy configuration and setting handling in httpc
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4023 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-02 00:42:37 +00:00
orbiter
f5a4efb76e fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=192&hilit=&p=1034#p1034
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3996 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-20 08:06:21 +00:00
orbiter
40b0547611 - documentaton changes (removed old forum links)
- different handling of link quotation
- different handling of link normalization
- enhanced html/unicode en/de-coding

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3993 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-19 15:32:10 +00:00
orbiter
89e1848db6 fixed problem with favicons:
target servers had been able to see search words from the referrer of the favicon fetch.
This has been removed by using the getImage - servlet for favicon fetch.
Since java does not support loading of bmp and ico-Images, such parsers had been added.
The image parser had been coded from their original microsoft documentation.
This influences also the image-search functionality: there can now be a preview
of found bmp-images. Another benefit: favicons for search results are now cached with the HTCACHE.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3965 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-15 01:34:01 +00:00
orbiter
a4e8ad95ab enhancements to news and switchboard queue processing
removed direct access and replaced by iteration

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3961 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-13 13:00:18 +00:00
orbiter
4968556668 - fix for broken news queue during iteration
- enhancement for searching special news (usage of new iterator)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3957 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-10 21:22:06 +00:00
orbiter
5702257a5f fix for elementCount bug when db was reset
possible solution for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=124&hilit=

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3955 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-06 09:50:59 +00:00
orbiter
208b5297f1 enhanced handling of news records:
result is a speedup of Surftips, Supporter, and Network page

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3954 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-07-05 22:56:37 +00:00
orbiter
e03fcf4627 SSI fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=29
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3936 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-29 10:45:13 +00:00
orbiter
1a45ecb356 - fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=14&p=137#p137
- fix for missing restart script in ant built target
- removed some more synchronization for size() operations
- removed blocking statement on search page

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3935 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-28 22:06:33 +00:00
orbiter
1782ef57e5 - added SSI parser and include directive for <!--# include virtual="<file>" -->
- added chunked file transfer for non-yacy clients
- SSIs are streamed using chunked transfer, partly delivered pages can be seen in browser before transmission is finished
- added client-side network unit identification
- cleaned up code

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3926 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-26 14:37:10 +00:00
orbiter
6518bb6c08 changed release strategy:
we will provide two different releases in the future, one standard release and one 'pro'-release.
the 'pro'-release contains all additional parsers AND has different default performance values.
The pro-version differs therefore from the previous 'all'-version by this default values.
The pro-configuration is automatically choosen if the libx-folder exists. If a version is once initialized, its configuration stays independently from an existing libx folder.
The ant targets had been changed. There are now 3 different targets to create standard and pro-releases, and one target to upgrade:
- dist: creates a standard release (only, no libx target any more)
- distPro: creates a pro-release (includes the libx)
- distExt: creates a libx-release which includes the libx-folder only. It may be used to upgrade from standard to pro
Furthermore, the naming of 'dev'-releases had been removed.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3902 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-16 14:11:52 +00:00
orbiter
5dd9acc2a7 removed calls to deprecated methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3865 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-11 21:33:45 +00:00
(no author)
ef24bed406 Sorry...
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3760 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-24 16:25:07 +00:00
(no author)
a29cb2e1af blupp
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3759 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-24 16:14:46 +00:00
orbiter
11ac7688d5 reverted a part of last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3736 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-16 17:52:11 +00:00
orbiter
b3f97b5c38 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3735 6c8d7289-2bf4-0310-a012-ef5d649a1542 2007-05-16 17:45:39 +00:00
orbiter
3c5ff7f735 adopted kelondroBytesIntMap to kelondroIntBytesMap
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3734 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-16 15:20:15 +00:00
orbiter
5551ff5306 enhanced index storage data structure kelondroBytesIntMap
this stores now two index structures, one for data that is aquired during start-up
and one for data that is aquired during run-time. This reduces the grow factor, and should reduce the memory amount in case that a index-reorganisation happens.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3733 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-16 14:36:56 +00:00
orbiter
872eb46cb9 some redesign of the handling of the index for kelondroFlexTable
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3732 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-16 10:48:26 +00:00
orbiter
46367afaaa update of memory-protection values
see http://www.yacy-forum.de/viewtopic.php?p=35539#35539

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3709 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-11 18:02:48 +00:00
orbiter
85035dc319 addition to svn 3699: check send/receive if p2p-mode is activated
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3701 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-10 13:27:38 +00:00
orbiter
139c59ebbd - fixed dht selction problem: the seed tables used a wrong ordering
- cleaned some code

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3693 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-09 17:59:36 +00:00
orbiter
22a0e9f117 more timeout-control
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3692 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-09 14:53:17 +00:00
orbiter
0831034e07 fixed non-termination bug for robinson remote crawl peer selection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3681 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-07 14:37:50 +00:00
orbiter
191ef16499 fixed wrong ordering that caused bad dht selection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3646 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-04 14:11:50 +00:00
orbiter
35c660654d more debugging lines to fix bug for
http://www.yacy-forum.de/viewtopic.php?p=34935#34935

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3629 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-30 23:05:19 +00:00
orbiter
81844e85b2 - fixed more cluster routing problems
- fixed a problem in remote search when balancer caused shift process to wait too long

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3627 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-30 00:39:53 +00:00
orbiter
64a6d6e5e6 added new set iterator (needed for last commit)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3599 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-26 09:52:37 +00:00
orbiter
f8de19fb2f robinson cluster: added client-side protocol implementation
- the network configuration page shows a new option: robinson clusters
- when a global search is made, all robinson peers are excluded, but:
- robinson peers/clusters that provide peer tags and where search words match
  such tags, they are included in global search. Therefore, robinson peers/clusters
  support the global yacy network with their indexes, without doin DHT-exchange


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3598 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-26 09:51:51 +00:00
orbiter
2f3b518169 temporary patch for startup-problem:
http://www.yacy-forum.de/viewtopic.php?t=3854
This is a serious problem that is caused by the database bug between 0.511 - 0.513
which produced a large number of double-entries in the RWI index. The uniq()-method
tries to fix this, and it does not terminate when the index is large and the number
of double-occurrences is also large. This patch does simply implement a time-controlled
termination, which does not heal the inconsistency problem. The uniq-method itself
is correct and does not need a bugfix, the non-termination is simply caused by the large number
of data that is shifted during the process. It was possible to reproduce this behaviour
in a test environment.
A real fix would need to:
- enhance the uniq()-method by using a recursive, binary segmentation of the array to be fixed
- uniq() must report the entries that are double
- the double-entries must be deleted from the collection index (from the index and the collections) to heal the problem


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3583 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-20 07:53:58 +00:00
orbiter
ba525ebf52 - re-enabled path optimization that was disabled during testing
- re-implemented index load/extend optimization that was removed from kelondroFlexTable,
  this is now part of kelondroIntBytesIndex


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3580 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-19 14:55:19 +00:00
orbiter
595ee10468 fixed datatabase inconsistency bugs
inserted many debug lines
added a huge number of asserts
extended database test methods


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3579 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-19 13:37:02 +00:00
orbiter
7a7a1c7c29 fight against problems with remove-methods and synchronization
- some bugs may have been fixed with wrong removal operations
- removed temporary storage of remove-positions and replaced by direct deletions
- changed synchronization
- added many assets
- modified dbtest to also test remove during threaded stresstest

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3576 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-17 15:15:47 +00:00
orbiter
b6a5f53020 removed double synchronization from kelondroRecords.USAGE to prevent thread locking.
The method synchronization should be sufficient

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3574 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-15 21:13:54 +00:00
orbiter
063063aa0c fix for 100% cpu bug during dht selection
see also: http://www.yacy-forum.de/viewtopic.php?p=34068#34068

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3570 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-13 13:40:19 +00:00
orbiter
25070822a5 fix for http://www.yacy-forum.de/viewtopic.php?p=33925#33925
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3551 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-05 19:08:59 +00:00
orbiter
159bd0cab5 diverses; b.o. fix for http://www.yacy-forum.de/viewtopic.php?p=33914#33914
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3549 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-05 14:58:29 +00:00
orbiter
cdc7b77a62 fix for http://www.yacy-forum.de/viewtopic.php?p=33916#33916
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3548 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-05 14:47:45 +00:00
orbiter
40c14a4f0e - better implementation of search query properties
- basic protection against start-up problems when database files are corrupted
- auto-delete of not-critical databases during startup when load error occurs
- on-the-fly reset option for all database tables
- automatic on-the-fly reset for seed tables during enumeration exceptions

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3547 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-05 10:14:48 +00:00
orbiter
fcdf000fbc bugfix for http://www.yacy-forum.de/viewtopic.php?p=33838#33838
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3543 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-03 22:08:40 +00:00
orbiter
ba2c307ab3 optimized memory allocation in kelondroRow.Entry
such an entry cannot be instantiated without allocation of new byte[]; instead
it can re-use memory from other kelondroRow.Entry objects.
during bugfixing also other bugs may have been solved, maybe the INCONSISTENCY problem
could have been solved. One cause can be missing synchronization during bulk storage
when a R/W-path optimization is done. To test this case, the optimization is currently
switched off.
More memory enhancements can be done after this initial change to the allocation scheme.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3536 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-03 12:10:12 +00:00
orbiter
210ede8230 added a class for byte-array management. This was the result of a very large experiment
to replace byte[] objects within kelondro. Frequent System.arraycopy are common when
kelondroRow.Entry objects are handled. This class may be used to prevent this.
However, experimental replacement of byte[] by kelondroByteArray in kelondroRow.Entry
resulted in complete re-write of large parts of kelondro. This experiment did not
completely lead to a result, because then the interface to kelondro had to be changed
also from byte[] to kelondroByteArray, which may have caused a rewrite of large parts
of YaCy. The experiment is therefore abanonded, but this class remains here without
any function but possibly for future use.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3531 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-30 08:44:43 +00:00
orbiter
847349358b less memory usage during collectionIndex-rebuild
should also speed up that process a little bit

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3524 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-27 08:21:03 +00:00
orbiter
602ac42010 fix for OOM case when a kelondroTree Node cache grows
See also: http://www.yacy-forum.de/viewtopic.php?p=33275#33275

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3499 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-21 13:26:18 +00:00
orbiter
7af188ff9a fix for http://www.yacy-forum.de/viewtopic.php?p=33089#33089
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3491 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-19 11:59:29 +00:00
orbiter
5bbf010107 removed synchronization of size() method from numerous classes to avoid thread locking
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3490 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-18 19:45:23 +00:00
orbiter
861f41e67e redesigned NURL-handling:
- the general NURL-index for all crawl stack types was splitted into separate indexes for these stacks
- the new NURL-index is managed by the crawl balancer
- the crawl balancer does not need an internal index any more, it is replaced by the NURL-index
- the NURL.Entry was generalized and is now a new class plasmaCrawlEntry
- the new class plasmaCrawlEntry replaces also the preNURL.Entry class, and will also replace the switchboardEntry class in the future
- the new class plasmaCrawlEntry is more accurate for date entries (holds milliseconds) and can contain larger 'name' entries (anchor tag names)
- the EURL object was replaced by a new ZURL object, which is a container for the plasmaCrawlEntry and some tracking information
- the EURL index is now filled with ZURL objects
- a new index delegatedURL holds ZURL objects about plasmaCrawlEntry obects to track which url is handed over to other peers
- redesigned handling of plasmaCrawlEntry - handover, because there is no need any more to convert one entry object into another
- found and fixed numerous bugs in the context of crawl state handling
- fixed a serious bug in kelondroCache which caused that entries could not be removed
- fixed some bugs in online interface and adopted monitor output to new entry objects
- adopted yacy protocol to handle new delegatedURL entries
all old crawl queues will disappear after this update!

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3483 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-16 13:25:56 +00:00
orbiter
581db87237 more debug code for
http://www.yacy-forum.de/viewtopic.php?p=33009#33009

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3479 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-14 15:04:06 +00:00
orbiter
dd06d4cada more logging to better trace bug
http://www.yacy-forum.de/viewtopic.php?p=33001#33001

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3477 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-14 09:36:54 +00:00
orbiter
96b79bf86d redesigned remove method in kelondroRowSet
This should fix also numerous bugs like
http://www.yacy-forum.de/viewtopic.php?p=31077#31077
(java.lang.ArrayIndexOutOfBoundsException in kelondroRowCollection.removeShift)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3476 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-14 08:55:05 +00:00
orbiter
9f929b5438 better snippet handling in case of snippet load fail
see also http://www.yacy-forum.de/viewtopic.php?p=31096#31096

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3475 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-13 22:18:36 +00:00
orbiter
909d7a8ae9 fixed wrong implemented row iterator in kelomdroFlexSplitTables
this has no effect, until now this iterator was only used on
the Index Administration page.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3464 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-09 13:55:26 +00:00
orbiter
3ef77d2030 fix for http://www.yacy-forum.de/viewtopic.php?p=29878#29878
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3461 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-09 12:14:25 +00:00