Commit Graph

687 Commits

Author SHA1 Message Date
orbiter
a9ad863686 second part of 'doubles' fix - better handling of doubles in RAMIndex. More logging.
still missing: deletion of double entries in collections

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5613 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-16 16:13:48 +00:00
orbiter
59427064fb first part of 'doubles' fix (not fully ready yet)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5612 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-16 00:47:48 +00:00
orbiter
26978b2a25 - better memory protection in kelondro caches: computation of needed memory for cache grow
- removed excessive gc calls
- step to 16 vertical DHT partitions

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5611 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-15 23:35:59 +00:00
hermens
2173865f92 Prevent race condition when switching timezones.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5605 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-13 11:59:50 +00:00
orbiter
30a1de41b3 disabled the BufferedIOChunks, because I consider it as broken.
I will try to fix that, but it is better to not use a buffer than using a broken buffer.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5600 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-11 15:21:48 +00:00
orbiter
411f2212f2 more memory leak fixing hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5599 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-11 13:31:10 +00:00
orbiter
333489420b - fix for NPE when loading the cytag image
- some hacks for less memory usage:
-- less usage of buffer and cache memory in EcoFS
-- buffer allocation on-demand in BufferedIOChunks
-- removed largest ybr idx

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5595 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-11 10:52:56 +00:00
orbiter
c25c334b75 replaced old DHT transmission method with new method. Many things have changed! some of them:
- after a index selection is made, the index is splitted into its vertical components
- from differrent index selctions the splitted components can be accumulated before they are placed into the transmission queue
- each splitted chunk gets its own transmission thread
- multiple transmission threads are started concurrently
- the process can be monitored with the blocking queue servlet
To implement that, a new package de.anomic.yacy.dht was created. Some old files have been removed.
The new index distribution model using a vertical DHT was implemented. An abstraction of this model
is implemented in the new dht package as interface. The freeworld network has now a configuration
of two vertial partitions; sixteen partitions are planned and will be configured if the process is bug-free.
This modification has three main targets:
- enhance the DHT transmission speed
- with a vertical DHT, a search will speed up. With two partitions, two times. With sixteen, sixteen times.
- the vertical DHT will apply a semi-dht for URLs, and peers will receive a fraction of the overall URLs they received before.
  with two partitions, the fractions will be halve. With sixteen partitions, a 1/16 of the previous number of URLs.
BE CAREFULL, THIS IS A MAJOR CODE CHANGE, POSSIBLY FULL OF BUGS AND HARMFUL THINGS.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5586 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-10 00:06:59 +00:00
orbiter
01b97ef3f8 added new cybertag-tracking feature that was inspired by itgrl
from the forum discussion in
http://forum.yacy-websuche.de/viewtopic.php?p=12612#p12612

The feature will provide two basic entities:
- you can integrate image links which point to your yacy installation anywhere in the web.
  the image can be loaded with
  <img src="http://<yourpeer>:<yourport>/cytag.png?icon=invisible&nick=<yournickname_or_community_id>&tag=<anything>">
  This will place a invisible 1-pixel image. If you change the icon=invisible to icon=redpill, you will see a red pill
  Use this, to track your activity in the web.
- you can view your tracks at
  http://localhost:8080/Tracks.html
- There is a public api to your tracks at
  http://localhost:8080/api/tracks_p.json
  which needs authentication


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5581 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-06 15:06:19 +00:00
borg-0300
b19bc611b0 gc: better logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5578 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-05 19:42:32 +00:00
orbiter
b1f9c00118 fix for bug in merge operator initialization
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5577 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-05 15:26:16 +00:00
orbiter
b57c9da1f8 - fixes to doc, ppt, xls parser: better title
- fixes to httpd server response header generation
- fixes to a server date computation bug
- new Button in indexControl to view content of url in ViewFile


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5576 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-05 15:15:13 +00:00
f1ori
7936e58fe7 * sorry,previous version didn't compile
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5575 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-05 12:15:21 +00:00
f1ori
76cdc59789 * added some convertions to and from UTF-8
* this might fix problems on windows systems
  (like http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1824)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5574 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-02-05 12:12:07 +00:00
orbiter
94110df85a moved logging partially to kelondro
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5545 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-31 01:06:56 +00:00
orbiter
024da2916b refactoring of logging
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5544 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-30 23:33:47 +00:00
orbiter
83ce65707a (almost) completed partition of classes in kelondro
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5543 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-30 22:44:20 +00:00
orbiter
7ee494fde5 more refactoring of kelondro:
- seperated BLOB from table classes
- renamed 'coding' package to 'order'

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5542 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-30 22:08:08 +00:00
orbiter
bf93767ec6 refactoring of kelondro database classes
(to be continued)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5540 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-30 15:33:00 +00:00
orbiter
fc27bf8c4c refactoring of kelondro classes:
kelondro shall become independent from other packages.
moved bytebuffer, date and memory to kelondro

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5539 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-30 14:48:11 +00:00
orbiter
6cbca1e508 extended last fix, preventing more sorts
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5533 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-29 16:42:01 +00:00
orbiter
f9672d3f97 applied fix for inefficient put method as recommended by celle, see
http://forum.yacy-websuche.de/viewtopic.php?p=12424#p12424

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5532 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-29 16:08:24 +00:00
orbiter
3154926311 some better memory protection and OOM prevention in EcoFS
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5523 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-26 20:29:20 +00:00
orbiter
dedfc7df7f removed distinction between DHT-in and DHT-out. This is necessary to make room for the new cell data structure, which cannot use this this distinction in the first place, but will enable the same meaning with different mechanisms (segments, later)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5511 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-22 00:03:54 +00:00
orbiter
b74159feb8 preparations to integrate the new 'cell' index data structure
(this commit is just to move development files to my other computer, no functionality change so far)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5509 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-21 18:23:37 +00:00
orbiter
cb76d9e0e4 more synchronized in BLOBHeap (will not fix problem with Runtime-Error as reported in forum)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5487 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-13 13:22:29 +00:00
orbiter
f675d47f86 better protection against database failures
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5463 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-10 09:19:46 +00:00
orbiter
4d5b401f00 try to fix some performance problems with the internal index management:
- ensuring that ordered indexes stay ordered during remove
- no unnecessary ordering checks
- better test logic in crawl stacker

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5457 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-09 00:06:36 +00:00
orbiter
c6880ce28b removed the permanent cache flush and replaced it with a periodic cache flush
The cache is now flushed only for one second every ten seconds. During a crawl the cache
fills up completely, and is only flushed if space is needed for more documents.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5446 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-06 13:51:59 +00:00
orbiter
ef7fe537c5 fixed a cache-bug in cachedFileRA
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5445 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-06 10:51:56 +00:00
orbiter
6c7e83909b - refactoring of data access methods to be prepared for new cell data structure
- removed a memory overhead in collections which prevent OOM Exception in low memory configurations

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5443 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-06 09:38:08 +00:00
orbiter
07fc115e90 removed active profiling in kelondroRowSet
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5433 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-02 12:33:06 +00:00
orbiter
be4c458951 refactoring (implemented Iterable in kelondroRowCollection)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5432 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-02 11:38:20 +00:00
orbiter
b6bba18c37 replaced the storing procedure for the index ram cache with a method that generates BLOBHeap-compatible dumps
this is a migration step to support a new method to store the web index, which will also based on the same data structure. made also a lot of refactoring for a better structuring of the BLOBHeap class.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5430 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-01-01 22:31:16 +00:00
orbiter
3567c58b18 added another filed information for BLOBHeap dumps: the gaps
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5420 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-30 10:49:43 +00:00
orbiter
abdd4aa414 added a index dump for blob heaps:
this will increase the shutdown time for at most some seconds, but will speed up the start-up

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5419 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-29 21:36:27 +00:00
orbiter
8c3205b62e fix for OOB Exception
see http://forum.yacy-websuche.de/viewtopic.php?p=11598#p11598

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5417 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-29 17:36:53 +00:00
orbiter
e004da48d3 - added fast fingerprint computation for files (any). Will be used in new index dump method
- refactoring

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5415 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-29 12:22:13 +00:00
orbiter
fc8189f3fb better self-healing of corrupted databases
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5406 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-22 16:43:49 +00:00
orbiter
f29b48d9ff patch for IndexOutOfBoundsException
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5399 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-18 22:05:26 +00:00
orbiter
8cb7170b75 - set status of kelondroTree, kelondroBLOBTree and kelondroFlexTable to deprecated
- removed initialization and/or usage of kelondroFlexTable (should meanwhile not be used any more)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5396 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-18 00:08:17 +00:00
orbiter
7cd08bd5fb fix for NPE in BLOBCompressor
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5388 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-11 13:33:24 +00:00
orbiter
5b94498643 fine-tuning of cache usage from SVN 5386 and a bug fix for overflow in available() method
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5387 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-10 14:35:01 +00:00
orbiter
1779c3c507 - added a read cache to the RAFile interface to RandomAccessFile
- added a write buffer to BLOBHeap
- modified the BLOBBuffer (is now only to buffer non-compressed content)
- added content compression to the HTCache
The new read cache will decrease the start/initialization time of BLOB files,
like the HTCache, RobotsTxt and other BLOBHeap structures.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5386 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-10 11:15:19 +00:00
orbiter
e1acdb952c fix for problem with userDB and bookmarksDB which was caused by changes in kelondroRA in SVN 5376
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5385 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-08 00:17:45 +00:00
orbiter
4a2dac659e more speed hacks:
- modified and activated write buffer
- increased cache flush factor
- fixed a problem with deadlocking of indexing process

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5382 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-05 13:55:48 +00:00
orbiter
47292e696a more performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5379 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-04 12:54:16 +00:00
orbiter
759cef23dd fix for bug in kelondroAbstractRA.readFully
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5378 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-03 23:32:07 +00:00
orbiter
d39d420b39 performance hacks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5376 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-12-03 15:38:29 +00:00
orbiter
513179f404 changed interface to colletctionIndex and adopted all implementing classes:
do not return a result of a double-check when adding entries with addUnique

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5363 6c8d7289-2bf4-0310-a012-ef5d649a1542
2008-11-23 23:55:08 +00:00