Commit Graph

91 Commits

Author SHA1 Message Date
orbiter
c079b18ee7 - refactoring of IntegerHandleIndex and LongHandleIndex: both classes have been merged into the new HandleMap class, which handles (key<byte[]>, n-byte-long) pairs with arbitrary key and value length. This will be useful for memory-minimized database table indexing.
- added an analysis method that counts the bytes that could be saved if the new HandleMap were applied in the most efficient way. Look for the log messages beginning with "HeapReader saturation": in most cases we could save about 30% RAM!
- removed the old FlexTable database structure. It was no longer used.
- removed memory statistics in PerformanceMemory about flex tables and node caches (node caches were used by Tree Tables, which are also no longer used)
- added a stub for steering navigation functions. That should help to switch off navigation computation in cases where it is not demanded by a client

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6034 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-07 21:48:01 +00:00
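The (key<byte[]>, n-byte-long) idea from commit c079b18ee7 can be illustrated with a minimal sketch. This is not YaCy's HandleMap: the class name CompactHandleMap, its methods, and the sorted flat-array layout are assumptions chosen only to show how a value stored in n bytes instead of a full 8-byte long saves memory per entry.

```java
import java.util.Arrays;

/** Hypothetical sketch (not YaCy's HandleMap): fixed-width rows of key bytes + n value bytes in one byte[]. */
final class CompactHandleMap {
    private final int keyLen, valLen, rowLen;
    private byte[] rows;   // rows are kept sorted by key (unsigned byte order)
    private int size;

    CompactHandleMap(int keyLen, int valLen, int initialCapacity) {
        if (keyLen < 1 || valLen < 1 || valLen > 8) throw new IllegalArgumentException("bad lengths");
        this.keyLen = keyLen;
        this.valLen = valLen;
        this.rowLen = keyLen + valLen;
        this.rows = new byte[Math.max(1, initialCapacity) * rowLen];
    }

    int size() { return size; }

    /** Stores a non-negative value that must fit into valLen bytes. */
    void put(byte[] key, long value) {
        if (key.length != keyLen) throw new IllegalArgumentException("bad key length");
        if (value < 0 || (valLen < 8 && (value >>> (8 * valLen)) != 0))
            throw new IllegalArgumentException("value does not fit into " + valLen + " bytes");
        int pos = find(key);
        if (pos < 0) {                       // new key: open a gap at the insertion point
            int ins = -pos - 1;
            if ((size + 1) * rowLen > rows.length) rows = Arrays.copyOf(rows, rows.length * 2);
            System.arraycopy(rows, ins * rowLen, rows, (ins + 1) * rowLen, (size - ins) * rowLen);
            System.arraycopy(key, 0, rows, ins * rowLen, keyLen);
            size++;
            pos = ins;
        }
        int off = pos * rowLen + keyLen;     // write the value big-endian into valLen bytes
        for (int i = valLen - 1; i >= 0; i--) { rows[off + i] = (byte) value; value >>>= 8; }
    }

    /** Returns the stored value, or -1 if the key is absent. */
    long get(byte[] key) {
        if (key.length != keyLen) throw new IllegalArgumentException("bad key length");
        int pos = find(key);
        if (pos < 0) return -1;
        long v = 0;
        int off = pos * rowLen + keyLen;
        for (int i = 0; i < valLen; i++) v = (v << 8) | (rows[off + i] & 0xFF);
        return v;
    }

    /** Binary search; returns the row index, or -(insertionPoint + 1) if the key is absent. */
    private int find(byte[] key) {
        int lo = 0, hi = size - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int c = compareRow(mid, key);
            if (c < 0) lo = mid + 1; else if (c > 0) hi = mid - 1; else return mid;
        }
        return -(lo + 1);
    }

    private int compareRow(int row, byte[] key) {
        int off = row * rowLen;
        for (int i = 0; i < keyLen; i++) {
            int d = (rows[off + i] & 0xFF) - (key[i] & 0xFF);
            if (d != 0) return d;
        }
        return 0;
    }
}
```

With a 12-byte key and a 3-byte value, each row in this sketch takes 15 bytes instead of the 20 bytes needed with a full long; the actual saving in YaCy depends on its own data layout.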
orbiter
bead0006da replaced tmp file extensions by prt
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6033 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-06 18:09:58 +00:00
orbiter
9bfd22f65d fix for http://forum.yacy-websuche.de/viewtopic.php?p=15523#p15523
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6020 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-04 19:57:25 +00:00
orbiter
cc49aedf12 - fixed problem with remote search NPE
- more abstraction for search requests

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6015 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-03 08:49:54 +00:00
orbiter
c38c852090 modified access method to get index entries out of an array of BLOBs:
iterate over them and merge while iterating, instead of collecting them all and merging afterwards.
This should use less memory and may behave better in an environment with many queries.
To ensure that too many queries will not cause total blocking,
a time-out of one second was also added. After the time-out
the index data that was collected so far is returned.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6013 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-06-02 16:53:45 +00:00
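The "iterate, then merge" access with a time-out described in commit c38c852090 can be sketched roughly as follows. The class TimedMerge and its method are hypothetical names, and plain Java collections stand in for YaCy's BLOB array and reference containers; the point is only that each part is merged as soon as it is read, and that the partial result is returned once the deadline has passed.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical sketch: merge parts one at a time and stop when the time budget is used up. */
final class TimedMerge {
    static <T> Set<T> mergeWithTimeout(Iterable<? extends Collection<T>> sources, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        Set<T> merged = new LinkedHashSet<>();
        for (Collection<T> part : sources) {
            merged.addAll(part);                          // merge immediately, do not collect all parts first
            if (System.nanoTime() - deadline >= 0) break; // time-out: return what was collected so far
        }
        return merged;
    }
}
```

Because only one part is held besides the running result, peak memory stays lower than when all parts are collected before merging. In this sketch the deadline is checked after each part, so at least the first part is always returned.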
orbiter
1c69d9b8b6 more refactoring of the index classes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5995 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-29 14:16:41 +00:00
orbiter
88426912ad more refactoring to make the segment object easier to use and to be prepared to integrate author navigation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5992 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-29 10:03:35 +00:00
lotus
d813fd26ed reset sent/received counters on index delete
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5991 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-28 15:49:42 +00:00
orbiter
99bf0b8e41 refactoring of plasmaWordIndex:
divided that class into three parts:
- the peers object is now hosted by the plasmaSwitchboard
- the crawler elements are now in a new class, crawler.CrawlerSwitchboard
- the index elements are core of the new segment data structure, which is a bundle of different indexes for the full text and (in the future) navigation indexes and the metadata store. The new class is now in kelondro.text.Segment

The refactoring is inspired by the roadmap to create index segments, the option to host different indexes on one peer.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5990 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-28 14:26:05 +00:00
orbiter
fec6f9054f some refactoring of search methods
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5988 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-27 23:51:34 +00:00
orbiter
26a46b5521 increased default maximum file size for database files to 2GB
Other file sizes can now be configured with the attributes
filesize.max.win and filesize.max.other
the default maximum file size for non-windows OS is now 32GB

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5974 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-25 06:59:21 +00:00
orbiter
e005cfea37 fix for bug in -incell option of URLAnalysis
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5967 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-21 06:57:03 +00:00
orbiter
a7e392f31b The collection index will not be supported any more.
Existing indexes based on the old index collections must be migrated with YaCy 0.8
- removed index collection classes and all migration tools
- added an 'incell' reference collection feature in URL analysis


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5966 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-20 14:51:26 +00:00
orbiter
a2f48863fc - added prototype for navigation index
- refactoring of word index prototype
(no functional changes so far)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5965 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-20 09:00:24 +00:00
orbiter
f133d6065c fix for http://forum.yacy-websuche.de/viewtopic.php?p=14955#p14955
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5958 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-17 18:28:33 +00:00
orbiter
c1e5fad9a7 fix for http://forum.yacy-websuche.de/viewtopic.php?p=14767#p14767
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5944 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-10 20:50:46 +00:00
orbiter
8ee3a94e82 fix for non-caching of sitehash, see http://forum.yacy-websuche.de/viewtopic.php?p=14440#p14440
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5942 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-10 11:44:17 +00:00
low012
d164b42604 *) cosmetics
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5934 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-08 19:26:36 +00:00
orbiter
5fb77116c6 added a submenu to index administration to import a wikimedia dump (i.e. a dump from wikipedia) into the YaCy index: see
http://localhost:8080/IndexImportWikimedia_p.html

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5930 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-08 07:54:10 +00:00
hermens
df733af4fa Try not to lose content from ram during IndexCell.delete by moving ram.delete after the dangerous operations on the array (array.get and array.delete)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5929 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-06 12:53:17 +00:00
hermens
ac72005f2f Let IndexCell.remove remove entries from the ram portion of the DB as well.
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5928 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-06 12:32:34 +00:00
orbiter
8ba7ff5353 a fix and another speed enhancement for the RWI cache
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5927 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-05 22:40:40 +00:00
orbiter
e6773cbb33 better handling of RWI cache for concurrency and less overhead when writing new entries -> even more indexing speed
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5924 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-05 20:08:23 +00:00
orbiter
c097531e3d added a catch Exception to all threads to check if any of them silently dies without any other notification
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5922 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-05 06:31:35 +00:00
orbiter
083533e5ec fix for bugs in IODispatcher
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5921 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-04 21:37:59 +00:00
orbiter
21fbca0410 better scaling of HEAP dump writer for small memory configurations;
should prevent OOMs during cache dumps

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5920 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-04 08:29:44 +00:00
orbiter
6e0b57284d better care for states of the IODispatcher
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5919 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-03 22:54:47 +00:00
orbiter
1db9cdd4e4 fixed bug in writing of robots.txt entries in case that host names exceeded 64 characters and some other problems
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5918 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-03 19:35:10 +00:00
orbiter
d2ac0aa682 - fixed possible bugs in Stack (may affect Crawler reset) and RandomAccess handling
- increased default memory size to 180MB
- fixed possible bug in http client reset (there was a deadlock)
- bug in BOBHeap marked, but not solved, cause is still unknown.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5912 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-05-02 01:40:03 +00:00
orbiter
8d6212233b fix for IODispatcher
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5896 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-28 07:24:28 +00:00
orbiter
07f09742bb set of small fixes and comments
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5893 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-27 15:29:50 +00:00
orbiter
9e4db75aac reduced internal logging and reduced memory that internal logging can use
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5867 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-24 12:09:04 +00:00
orbiter
c10c257255 attempt to fix a deadlock situation where the IODispatcher did not work.
I suspect the dispatcher thread has crashed and queues filled so no indexing process was able to write data.
This fix tries to heal the problem, but I am unsure if it helps. To get a better view of the problem, some more log output has been inserted.
Also added a new attribute indexer.threads to control the default number of threads for the indexer (default is 1)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5866 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-24 11:55:39 +00:00
orbiter
fe51f4d668 less synchronization may help to prevent deadlocks
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5863 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-23 20:54:13 +00:00
orbiter
138422990a - removed useCell option: the indexCell data structure is now the default index structure; old collection data is still migrated
- added some debugging output to balancer to find a bug
- removed unused classes for index collection handling
- changed some default values for the process handling: more memory needed to prevent OOM

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5856 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-22 22:39:12 +00:00
orbiter
5195c94838 two patches for performance enhancements of the index handover process from documents to the index cache:
- one word prototype is generated for each document and re-used whenever a specific word is stored.
- the index cache now uses ByteArray objects instead of byte[] to reference the RWI. This speeds up access to the map that stores the cache. To dump the cache to the FS, the content must be sorted, but sorting takes less time than maintaining a sorted map during caching.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5849 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-21 14:23:04 +00:00
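The second patch in commit 5195c94838 trades a permanently sorted map for a hash map that is sorted once at dump time. A minimal sketch of that trade-off, assuming a hypothetical wrapper class ByteKey and cache class UnsortedCache (YaCy's actual ByteArray class and cache are not reproduced here):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical wrapper: gives a byte[] value-based equals/hashCode so it can be a HashMap key. */
final class ByteKey implements Comparable<ByteKey> {
    private final byte[] bytes;
    private final int hash;

    ByteKey(byte[] b) { this.bytes = b.clone(); this.hash = Arrays.hashCode(this.bytes); }

    byte[] bytes() { return bytes.clone(); }

    @Override public boolean equals(Object o) {
        return o instanceof ByteKey && Arrays.equals(bytes, ((ByteKey) o).bytes);
    }

    @Override public int hashCode() { return hash; }

    /** Unsigned lexicographic order; the order YaCy actually uses is not assumed here. */
    @Override public int compareTo(ByteKey o) {
        int n = Math.min(bytes.length, o.bytes.length);
        for (int i = 0; i < n; i++) {
            int d = (bytes[i] & 0xFF) - (o.bytes[i] & 0xFF);
            if (d != 0) return d;
        }
        return bytes.length - o.bytes.length;
    }
}

/** Hypothetical cache: unsorted while filling, sorted only when dumped. */
final class UnsortedCache {
    private final Map<ByteKey, List<Integer>> map = new HashMap<>();

    void add(byte[] termHash, int reference) {
        map.computeIfAbsent(new ByteKey(termHash), k -> new ArrayList<>()).add(reference);
    }

    /** Returns the keys in sorted order, as a dump to disk would need them. */
    List<ByteKey> sortedKeysForDump() {
        List<ByteKey> keys = new ArrayList<>(map.keySet());
        Collections.sort(keys);
        return keys;
    }
}
```

Insertion into the hash map takes expected constant time per entry, and the single sort at dump time costs O(n log n) once, whereas a sorted map pays a logarithmic cost on every insertion.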
orbiter
dfb96ecb72 more fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5844 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-20 22:08:38 +00:00
orbiter
1b8d346b4c fixes in connection with transition to byte[] hashes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5843 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-20 21:54:00 +00:00
orbiter
996572de95 quickfix
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5841 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-20 16:11:35 +00:00
orbiter
380ed2dac0 performance and debugging additions
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5840 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-20 15:01:43 +00:00
f1ori
76af84d732 * add custom comparator to ScoreCluster for byte[]
* fixes http://forum.yacy-websuche.de/viewtopic.php?f=6&t=2010


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5836 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-19 20:01:46 +00:00
f1ori
2f860a2564 * convert byte[] hashes to string for log output
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5830 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-18 14:35:18 +00:00
orbiter
63cd152969 fixes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5818 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-16 22:18:35 +00:00
orbiter
c8624903c6 full redesign of index access data model:
terms (words) are no longer retrieved by their word hash string, but by a byte[] containing the word hash.
This has strong advantages when RWIs are sorted in the ReferenceContainer Cache and compared with the sun.java TreeMap method, which needed getBytes() and new String() transformations before.
Many thousands of such conversions are now omitted every second, which increases the indexing speed by a factor of two.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5812 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-16 15:29:00 +00:00
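Keying a TreeMap directly by byte[], as commit c8624903c6 describes, requires an explicit comparator because Java arrays have no natural ordering. A minimal sketch, assuming a hypothetical helper class ByteArrayOrder and plain unsigned byte order (the ordering YaCy applies to its word hashes may differ):

```java
import java.util.Comparator;
import java.util.TreeMap;

/** Hypothetical helper: compares byte[] keys directly, with no String conversion. */
final class ByteArrayOrder {
    static final Comparator<byte[]> UNSIGNED = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return a.length - b.length;
    };

    /** Creates a sorted index keyed by raw hash bytes. */
    static <V> TreeMap<byte[], V> newIndex() {
        return new TreeMap<>(UNSIGNED);
    }
}
```

Lookups and insertions then compare the raw bytes in place, so no getBytes() or new String() call is needed per comparison.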
orbiter
8a24350036 - fix for join method with new generalized RWI data structure (caused by latest commit)
- added more functions to mediawiki parser


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5806 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-15 10:26:24 +00:00
orbiter
89ec3acb3e - full abstraction of index content type: the kelondro full text index may now also contain indexes about other content than text, i.e. navigation indexes or reverse linking indexes.
- during index joins all word positions are maintained: better ranking for word distance possible; exact phrase match can be implemented soundly


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5804 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-15 06:34:27 +00:00
orbiter
de68948bc5 better handling of free memory computation and emergency cache flush for index cell
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5798 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-12 09:24:32 +00:00
orbiter
b81c7467d8 protection against too many files in RICELL in case of massive emergency dumps caused by low memory
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5791 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-09 23:55:47 +00:00
orbiter
44e01afa5b - refactoring
- a little bit more abstraction
- new interfaces for index abstraction

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5783 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-07 09:34:41 +00:00
orbiter
82fb60a720 increased memory limit for emergency cache flush
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5782 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-06 15:54:19 +00:00