Commit Graph

105 Commits

Author SHA1 Message Date
orbiter
20dadba426 - added a deadlock prevention function in cache flushing
- removed unused methods in collection index

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4630 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-31 17:51:51 +00:00
orbiter
9eddc1506b - one try to fix the httpd problem
- fix for handling of collection index that appears when removing elements
- added another navigation method (stub, not working yet)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4543 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-09 23:58:22 +00:00
orbiter
275a226cc5 refactoring
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4524 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-03-04 22:45:45 +00:00
orbiter
677ee2ea04 added remove operation to collection index (re-activation)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4503 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-23 00:14:11 +00:00
orbiter
2327451653 - changed order of database initialisation (index first)
- removed mainly unused init-time for databases (was only used for tree tables, which are not used any more)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4496 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-19 09:14:07 +00:00
orbiter
141db7ba48 there is less RAM needed for eco table (its just a security-plus for RAM check)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4442 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-02-04 15:51:51 +00:00
orbiter
2485681002 added termination control for RotateIterator
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4399 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-25 11:44:27 +00:00
orbiter
d372a78aef some fixes to bring back lulabads peer..
see also: http://forum.yacy-websuche.de/viewtopic.php?p=4772#p4772

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4366 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-22 20:02:20 +00:00
orbiter
2f3b2f3481 - extended dbtest for comparisment tests
- added initial space option for eco tables
- used initial space value in initialization of collectionIndex, this should avoid OOM failures" /Volumes/Magneto/dev/workspace/trunk/source/dbtest.java /Volumes/Magneto/dev/workspace/trunk/source/de/anomic/kelondro/kelondroCollectionIndex.java /Volumes/Magneto/dev/workspace/trunk/source/de/anomic/kelondro/kelondroDyn.java /Volumes/Magneto/dev/workspace/trunk/source/de/anomic/kelondro/kelondroEcoTable.java /Volumes/Magneto/dev/workspace/trunk/source/de/anomic/kelondro/kelondroRow.java /Volumes/Magneto/dev/workspace/trunk/source/de/anomic/kelondro/kelondroSplitTable.java /Volumes/Magneto/dev/workspace/trunk/source/de/anomic/plasma/plasmaCrawlBalancer.java /Volumes/Magneto/dev/workspace/trunk/source/de/anomic/plasma/plasmaCrawlStacker.java /Volumes/Magneto/dev/workspace/trunk/source/de/anomic/plasma/plasmaCrawlZURL.java
- added index consistency check (checks for double-occurrences of primary keys in file)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4349 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-20 21:42:35 +00:00
orbiter
9abc927645 to fix inconsistencies in collection index, a double reference reporting mechanism has been implemented
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4347 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-20 01:22:46 +00:00
orbiter
58a1f518f8 fixed some problems with eco tables
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4346 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-19 12:23:56 +00:00
orbiter
d4d07802ac better RAM protection using eco tables
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4345 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-19 01:50:24 +00:00
orbiter
94f21d9403 activated new kelondroEcoTable file structure.
This data structure replaces almost all files in the PLASMA directory
also the collection.index and the LURL-db will be created as Eco-DB, if it does not exist before
existing Flex-databases will be used as they are (the is no data lost)
If you want to force the creation of a Eco-collection.index, simply delete the old index.
The Eco file system will only be used if there is enough memory.
The collection.index RAM limit is 200MB, if you have less, a flex-Table is createt.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4340 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-17 21:48:08 +00:00
orbiter
dc26d6262b - removed write buffer from kelondroCache (was never used because buggy; will now be replaced by new EcoBuffer)
- added new data structure 'eco' for an index file that should use only 50% of write-IO compared to kelondroFlex
The new eco index is not used yet, but already successfully tested with the collectionIndex
The main purpose is to replace the kelondroFlex at every point when enough RAM is available.
Othervise, the kelondroFlex stays as option in case of low memory (which then can even use a file-index)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4337 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-17 12:12:52 +00:00
borg-0300
53367d941a more information (BASE64)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4324 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-12 00:24:24 +00:00
orbiter
a5054c038d - added large number of generics
- redesign of ordering structures in kelondro (old did not work with strict generics)
- 50% IO reduction during read access on kelondroFlex (ommiting of read on index table)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4320 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-11 00:12:01 +00:00
orbiter
03e7782269 more generics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4305 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-01-06 19:23:38 +00:00
orbiter
9d8b17188a more generics, bugfixes for wrong cast
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4294 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-28 03:39:36 +00:00
orbiter
4dc438f7e7 moved to Java 1.5:
- changed build script to use java 1.5 compiler
- first stept to resolve missing generics definition (about 400 from over 4100 'missing'-warnings)
- added key-iterator to kelondro databases (for rapid from-memory enumerations, will be used for domain name collection, not used yet)

please set your development environment to use java 1.5!


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4292 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-27 17:56:59 +00:00
hermens
4748d5c1ab Some enhancements to time management:
- remove unnecessary generation of Calendar and Date objects
- synchronized SimpleDateFormat objects in blog-, message- and wikiBoard
- correct use of TimeZones and SimpleDateFormats



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4288 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-20 17:11:35 +00:00
orbiter
48138952ff added memory measurement for index recreation to avoid OOM during index RAM space extension
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4267 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-11 15:07:03 +00:00
orbiter
a3bfd668aa opening of array files at startup time, not when first time the web index is accessed
this speeds up the first search after startup

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4263 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-12-10 02:40:16 +00:00
orbiter
ecba35de72 enhanced computing speed of kelondro core function: sorting
the enhancement was made by using better organized data structures and
multi-threading during the sort. A sort can be divided into two separate
processes when the first partition of the quicksort algorithm was done.
Generating a separate thread and starting the thread takes only 10 milliseconds,
so using a separate thread makes only sense if the data amount is large.
statistics about the speed-up:
without ehancement: 250 milliseconds for 100000 entries
with data structure enhancement: 170 milliseconds for 100000 entries
with additional second thread (if second processor is present): 130 milliseconds.

For dual-processor systems, this means about 100% speed-up
a test can be made with the following command:
java -classpath classes de.anomic.kelondro.kelondroRowCollection


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4198 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-09 00:51:38 +00:00
orbiter
6eaa5a0e64 enhanced local search speed. The ranking process is now 6 times faster that before.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4197 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-07 22:38:09 +00:00
borg-0300
a5d28785b1 less OOM (works for me)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4194 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-11-02 14:55:46 +00:00
orbiter
7d57b80598 distinct keepOrder strategy, more discrete implementation of enhancement introduced in SVN 4158
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4176 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-22 15:26:47 +00:00
orbiter
9a7b093eed tried to avoid endless loop, see also:
http://forum.yacy-websuche.de/viewtopic.php?f=6&t=467&hilit=

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4175 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-22 14:35:45 +00:00
orbiter
b856e377a9 some additions and a small bugfix to SVN 4158
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4173 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-21 23:26:22 +00:00
fuchsi
b5f7df8d0a Speed up remove operations in rowCollections.
- Array element shifting during remove is only done when it is necessary to keep the order of a row collection.
- This will speed up the most expensive operation "common word shrinking" by a factor of 500-1000 (in the worst cases we shifted > 60 GB of data during this operation)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4158 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-10-11 17:17:08 +00:00
orbiter
daf0f74361 joined anomic.net.URL, plasmaURL and url hash computation:
search profiling showed, that a major amount of time is wasted by computing url hashes. The computation does an intranet-check, which needs a DNS lookup. This caused that each urlhash computation needed 100-200 milliseconds, which caused remote searches to delay at least 1 second more that necessary. The solution to this problem is to attach a URL hash to the URL data structure, because that means that the url hash value can be filled after retrieval of the URL from the database. The redesign of the url/urlhash management caused a major redesign of many parts of the software. Since some parts had been decided to be given up they had been removed during this change to avoid unnecessary maintenance of unused code.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4074 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-09-05 09:01:35 +00:00
orbiter
72752bb503 because of a new database structure handling, the memory need for accessing
collection objects has been reduced to 50%:
- set new memory calculation functions for indexing process
- adjusted guessed memory amount
-> Testing needed:
   try new recommended value (see performanceQueues) and see if OOMs occur.
-> report maximum recommended value, so we can set new default values.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@4053 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-08-20 17:36:43 +00:00
orbiter
1782ef57e5 - added SSI parser and include directive for <!--# include virtual="<file>" -->
- added chunked file transfer for non-yacy clients
- SSIs are streamed using chunked transfer, partly delivered pages can be seen in browser before transmission is finished
- added client-side network unit identification
- cleaned up code

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3926 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-06-26 14:37:10 +00:00
orbiter
872eb46cb9 some redesign of the handling of the index for kelondroFlexTable
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3732 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-05-16 10:48:26 +00:00
orbiter
2f3b518169 temporary patch for startup-problem:
http://www.yacy-forum.de/viewtopic.php?t=3854
This is a serious problem that is caused by the database bug between 0.511 - 0.513
which produced a large number of double-entries in the RWI index. The uniq()-method
tries to fix this, and it does not terminate when the index is large and the number
of double-occurrences is also large. This patch does simply implement a time-controlled
termination, which does not heal the inconsistency problem. The uniq-method itself
is correct and does not need a bugfix, the non-termination is simply caused by the large number
of data that is shifted during the process. It was possible to reproduce this behaviour
in a test environment.
A real fix would need to:
- enhance the uniq()-method by using a recursive, binary segmentation of the array to be fixed
- uniq() must report the entries that are double
- the double-entries must be deleted from the collection index (from the index and the collections) to heal the problem


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3583 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-20 07:53:58 +00:00
orbiter
7a7a1c7c29 fight against problems with remove-methods and synchronization
- some bugs may have been fixed with wrong removal operations
- removed temporary storage of remove-positions and replaced by direct deletions
- changed synchronization
- added many assets
- modified dbtest to also test remove during threaded stresstest

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3576 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-17 15:15:47 +00:00
orbiter
159bd0cab5 diverses; b.o. fix for http://www.yacy-forum.de/viewtopic.php?p=33914#33914
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3549 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-05 14:58:29 +00:00
orbiter
40c14a4f0e - better implementation of search query properties
- basic protection against start-up problems when database files are corrupted
- auto-delete of not-critical databases during startup when load error occurs
- on-the-fly reset option for all database tables
- automatic on-the-fly reset for seed tables during enumeration exceptions

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3547 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-05 10:14:48 +00:00
orbiter
ba2c307ab3 optimized memory allocation in kelondroRow.Entry
such an entry cannot be instantiated without allocation of new byte[]; instead
it can re-use memory from other kelondroRow.Entry objects.
during bugfixing also other bugs may have been solved, maybe the INCONSISTENCY problem
could have been solved. One cause can be missing synchronization during bulk storage
when a R/W-path optimization is done. To test this case, the optimization is currently
switched off.
More memory enhancements can be done after this initial change to the allocation scheme.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3536 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-04-03 12:10:12 +00:00
orbiter
847349358b less memory usage during collectionIndex-rebuild
should also speed up that process a little bit

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3524 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-27 08:21:03 +00:00
orbiter
5bbf010107 removed synchronization of size() method from numerous classes to avoid thread locking
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3490 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-18 19:45:23 +00:00
orbiter
96b79bf86d redesigned remove method in kelondroRowSet
This should fix also numerous bugs like
http://www.yacy-forum.de/viewtopic.php?p=31077#31077
(java.lang.ArrayIndexOutOfBoundsException in kelondroRowCollection.removeShift)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3476 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-14 08:55:05 +00:00
orbiter
38b93f8cb8 bugfix for my last commit:
iterator did not consider secondary start point in case of rotation

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3456 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-08 22:07:17 +00:00
orbiter
d755a8026d - better OOM protection
- better memory allocation for FlexTable indexes
- splitting between static index and dynamic index (only the dynamic part must grow)
- to enable a merge-iteration of new splittet index, a huge number of classes needed to be adopted for new iterator classes
- added new iterator classes that support cloneable iterators
- adopted all iterator classes to implement cloneable itarators

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3453 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-08 16:15:40 +00:00
orbiter
3499a364ef a little bit better memory protection
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3439 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-07 09:38:14 +00:00
orbiter
1cba31de43 redesigned ram organization for database caches
- each cache can now allocate as much memory as is available
- no more fixed limits
- replaced old performance memory monitor by new one
- added supervision methods as static functions into the classes that provide cache functionality
- steering of ram allocation is done with two simple limits that are ram availability-relative


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3434 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-06 22:43:32 +00:00
orbiter
db235f2d61 added some memory protection in collection index multiple merge
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3429 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-04 22:54:04 +00:00
orbiter
b466baa574 added some memory protection
too large collection arrays are now avoided. By default, the biggest
collection index is 7. larger collections are dumped into a commons
directory, but cannot yet be used. Bevore doing a dump, the collection
is splittet into a part which has only root-references, and stored back
to the collection; the remaining part goes to commons

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3426 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-03-03 00:55:51 +00:00
orbiter
51e12049fa third generation of R/W head path optimization
- data from collection arrays are read in order
- merged data is written in order

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3419 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-28 11:13:23 +00:00
orbiter
10a3c20b8d some more enhancements to R/W Head path optimization
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3415 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-27 15:54:02 +00:00
orbiter
f4cfd19835 second Generation of collection R/W head path optimization:
- permanent cache flush is switched off. The optimized cache flush
  works better if it is a large number of collections that is flushed
  together
- the flush size can be configured instead the flush divisor. There is
  only one size for all flushes
- collection records that shall be removed during collection transition
  (jump from one collection file to another) are now not really removed
  but only marked in RAM. add-operations to the collection use these
  marked collection spaces
- index bulk write operations are now separated for each file of a kelondroFlex


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3414 6c8d7289-2bf4-0310-a012-ef5d649a1542
2007-02-27 13:01:22 +00:00