68 Commits

Author SHA1 Message Date
orbiter
df1629b05a - code cleanup
- version 0.471
- moved surftipps to own web page


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2676 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-29 22:27:20 +00:00
orbiter
d374ef2bbe bugfix for tryRemoveURLs
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2605 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-16 00:34:34 +00:00
orbiter
0ee7e45413 bugfix for merge method (caused by bad refactoring)
see http://www.yacy-forum.de/viewtopic.php?p=25529#25529

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2581 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-14 10:30:25 +00:00
orbiter
c89d8142bb replaced old 'kCache' by a full-controlled cache
there are now two full-controlled caches for incoming indexes:
- dhtIn
- dhtOut
during indexing, all indexes that shall not be transported to remote peers
because they belong to this peer are stored in dhtIn. It is furthermore
ensured that received indexes are not immediately transmitted again to other
peers. They may, however, be transmitted later if the network grows.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2574 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-14 00:51:02 +00:00
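
Note on c89d8142bb: the routing rule described in this message can be sketched roughly as follows. This is an illustrative Python sketch, not the project's Java code. The class name `IndexCaches`, the predicate `belongs_to_own_peer` and the method names are hypothetical; the assumption that everything not kept in dhtIn goes to dhtOut is inferred from the cache names, and how a peer decides ownership of a word hash is not stated in the message.

```python
class IndexCaches:
    """Two caches for index entries: dht_in (kept locally) and dht_out (may be sent away)."""

    def __init__(self, belongs_to_own_peer):
        # belongs_to_own_peer: hypothetical predicate, word_hash -> bool
        self._own = belongs_to_own_peer
        self.dht_in = {}   # word_hash -> set of url hashes that stay on this peer
        self.dht_out = {}  # word_hash -> set of url hashes eligible for transfer

    def add_local(self, word_hash, url_hash):
        # Entries produced by local indexing: keep them if they belong to this peer.
        target = self.dht_in if self._own(word_hash) else self.dht_out
        target.setdefault(word_hash, set()).add(url_hash)

    def add_received(self, word_hash, url_hash):
        # Entries received from other peers are never queued for direct re-transmission.
        self.dht_in.setdefault(word_hash, set()).add(url_hash)


caches = IndexCaches(lambda word_hash: word_hash.startswith("A"))
caches.add_local("Aword", "u1")      # owned by this peer, stays in dht_in
caches.add_local("Zword", "u2")      # not owned, goes to dht_out
caches.add_received("Zother", "u3")  # received, stays in dht_in
```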
orbiter
6e2907135a bugfixes for remote search server part
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2573 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-13 22:19:34 +00:00
orbiter
75b198bc02 - updated references to indexContainer
- more bugfixes and debugging for indexAbstract processing

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2555 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-12 11:13:27 +00:00
orbiter
0bed3b9ac3 removed superfluous interface
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2554 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-12 11:09:51 +00:00
orbiter
4f9e42d5ed more changes towards better join-search
- fixed problems with index-abstract generation
- added analysis output for index abstract receive

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2551 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-12 00:42:42 +00:00
orbiter
a7281a9b4d fix for last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2545 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-11 11:12:42 +00:00
orbiter
82a6054275 - fixed bug with new indexAbstract generation
- added partial evaluation of indexAbstracts during remote searches

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2544 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-11 10:39:25 +00:00
orbiter
74d1dea30b changes towards better join-search
- added generation of a compressed index within remote peers during global search
- added selection of specific urls within remote peers during secondary global search


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2539 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-10 22:36:47 +00:00
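
Note on 74d1dea30b: one way to read the two steps in this message is that each remote peer first returns a compact summary of which URLs it holds per query word, and the searching peer then asks only for the URLs that match every word. The Python sketch below illustrates that reading only. The function names, the shape of the summary (`{peer: {word: set of url hashes}}`) and the use of a plain set intersection are assumptions, not the project's actual protocol.

```python
def join_abstracts(words, abstracts):
    """Return the url hashes that appear for every query word, across all peers."""
    per_word = {word: set() for word in words}
    for peer_abstract in abstracts.values():
        for word in words:
            per_word[word].update(peer_abstract.get(word, set()))
    if not per_word:
        return set()
    return set.intersection(*per_word.values())


def secondary_requests(abstracts, joined):
    """For each peer, list the specific url hashes to request in a second search."""
    requests = {}
    for peer, peer_abstract in abstracts.items():
        for word, urls in peer_abstract.items():
            wanted = urls & joined
            if wanted:
                requests.setdefault(peer, {})[word] = wanted
    return requests


abstracts = {
    "peerA": {"foo": {"u1", "u2"}},
    "peerB": {"bar": {"u2", "u3"}},
}
joined = join_abstracts(["foo", "bar"], abstracts)
requests = secondary_requests(abstracts, joined)
```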
orbiter
64bed59ee8 enhancements to ranking
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2535 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-09 23:44:54 +00:00
orbiter
75b03a4580 fix for new ArrayIndexOutOfBoundsException
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2524 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-08 19:15:55 +00:00
orbiter
a8bc768206 enhancements to ranking evaluation
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2523 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-08 16:04:50 +00:00
orbiter
96c6e4e322 - enhancements to detailed search page
- enhancements to search ranking computation process
- removed bugs in postranking

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2516 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-08 01:26:06 +00:00
orbiter
a2525072f2 bugfix for kelondroRow - property generation
this bug affected ranking parameters :-(

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2506 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 10:55:34 +00:00
orbiter
4866868c0e added write cache for LURLs
This was necessary to speed up the index receive process during global search


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2498 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-07 01:13:03 +00:00
orbiter
db1eae0227 * simplified initialization of database objects
* replaced kelondroTree for NURLs by kelondroFlex
* replaced kelondroTree for EURLs by kelondroFlex
take care, may be very buggy
please finish crawls before updating. crawls will be lost.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2452 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-24 02:19:25 +00:00
orbiter
23dd972608 fixed memory calculation in performanceMemory web page
fixed also maximum cache size computation

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2429 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-20 01:20:34 +00:00
orbiter
6ad471ef96 * applied many compiler warning recommendations
* cleaned up code
* added unit test code
* migrated ranking RCI computation to kelondroFlex and kelondroCollectionIndex


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2414 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-16 19:49:31 +00:00
orbiter
cd5f7e137c fixed problem with NURL-generation upon first startup
(a new kelondroFlexTable was generated, which should not have happened)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2402 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-12 23:24:50 +00:00
orbiter
8418af141a added several consistency checks and small changes
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2400 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-12 15:59:14 +00:00
orbiter
130e6d4719 generalized index object for eurl, nurl and lurl to prepare move
of these tables to new kelondroFlexTable Object

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2382 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-11 17:37:54 +00:00
orbiter
95160d7f2c fixed size computation of index elements from the collection index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2380 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-11 16:01:18 +00:00
orbiter
abf22f6e60 removed url normalform computation from htmlFilterContentScraper.
This method was implemented in de.anomic.net.URL


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2377 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-11 15:09:22 +00:00
orbiter
740d49751d * strict type and size check in kelondroRow handling
* adapted all code to use the declaration form of kelondroRow
* fixed a bug in kelondroRow which caused wrong parsing of the encoding type
* the bug caused bad database behaviour in the new indexCollection data structure.
  Because of this bug, all existing test databases are void; a new database is created
* the kelondroFlexTable and indexCollection data structures now store a declaration of the row definition
  in a properties file alongside the database files.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2375 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-11 03:20:44 +00:00
orbiter
314021453f * more logging
* option in yacy.init to set useCollectionIndex usage

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2374 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-10 21:21:50 +00:00
orbiter
718fbc2dae enhancements in kelondroCollectionIndex:
* synchronized array and index objects
* auto-fix function for slightly corrupted index entries
* generalized internal access methods

also extended kelondroIndex interface to support ordering access
which is used in kelondroCollectionIndex for string comparisons

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2366 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 23:29:26 +00:00
orbiter
d799622da1 better flush limit for index collections
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2354 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 00:44:43 +00:00
orbiter
d468d665c9 some changes that may help to prevent deadlocks that cause an OutOfMemoryError
as described in
http://www.yacy-forum.de/viewtopic.php?p=24359

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2353 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-07 00:19:01 +00:00
orbiter
279b1d969d Integrated new indexing data structure 'collections' into the main class
for indexing, the plasmaWordIndex.

The new data structure is ready-to-use, but currently disabled.
It can be activated by setting the static
plasmaWordIndex.useCollectionIndex
to true. This shall be done for testing purposes only.

The new index is stored to
DATA/INDEX/PUBLIC/TEXT
The directory PLASMA shall be used only for crawler in the future.

Attention: during testing the data structure in INDEX may change,
and indexes created with the new data structure may become useless.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2348 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-05 22:22:14 +00:00
orbiter
4ff742e42d implemented indexCollectionRI
this is the new database structure that is supposed to replace the
plasmaAssortmentCluster AND the plasmaWordIndexFileCluster
The new structure is not yet active and needs to be integrated into
plasmaWordIndex. This has some migration constraints that are not yet
completely solved.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2347 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-05 19:18:33 +00:00
orbiter
01f95eccd3 re-write of kelondroCollectionIndex. This is the data structure that
shall replace the current assortment files.
* used the kelondroFlexTable to hold the index of collections
* used kelondroRow definitions to declare all data structures
* fixed several bugs that appeared in kelondroRowSet and kelondroRowCollection during testing


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2344 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-04 23:04:03 +00:00
orbiter
ebc2233092 * implemented (finished) class indexRowSetContainer
* replaced indexTreeMapContainer by indexRowSetContainer
* deleted indexTreeMapContainer and abstract class
This is another step to the new database structure

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2343 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-02 23:20:03 +00:00
orbiter
9183d21f25 renamed new index class to old name
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2342 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-02 20:01:59 +00:00
orbiter
c4e922885a replaced indexURLEntry by new class that uses a kelondroRow.Entry object
to store the index entry. This is another step to move to the new database structure.
A side effect of this change is that index storage uses much less RAM,
which affects the index RAM cache.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2341 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-02 19:59:28 +00:00
orbiter
0b7112f8b2 fix for missing topLevelClone in indexRAMCacheRI.wordContainerIterator
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2340 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-02 00:43:03 +00:00
orbiter
e357599f92 * fixed problem with indexContainer iteration from RAM:
  indexContainers from RAM must be cloned explicitly to prevent
  side-effects on stored indexContainer objects in Cache
* changed behaviour of urlReference deletion from indexContainers:
  deletion does not use retrieval of all Elements from the assortments
* added textual configuration of kelondroRow and kelondroColumn definition
* update of kelondroRow usage in yacyNews
* modified kelondroAttrSeq to use modified kelondroColumn parser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2339 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-01 10:30:55 +00:00
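
Note on e357599f92: the cloning fix in the first bullet addresses a general aliasing problem, which the following Python sketch illustrates. It is not the project's Java code; `RamCache` and its methods are hypothetical names, and a container is modelled as a plain set of URL hashes.

```python
class RamCache:
    """Holds containers in memory and hands out copies when iterating."""

    def __init__(self):
        self._containers = {}  # word_hash -> set of url hashes

    def add(self, word_hash, url_hashes):
        self._containers.setdefault(word_hash, set()).update(url_hashes)

    def containers(self):
        # Yield a copy of each container so callers cannot alter the cached object.
        for word_hash, container in list(self._containers.items()):
            yield word_hash, set(container)


cache = RamCache()
cache.add("w1", {"u1", "u2"})

for word_hash, container in cache.containers():
    container.discard("u1")  # caller modifies its own copy only

stored = cache._containers["w1"]  # peek at the cached object for the demonstration
```

Without the copy in `containers()`, the `discard` call would silently remove `u1` from the cached container as well.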
orbiter
8b77afd72c some fixes to new container merger
and some code cleanup

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2336 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-26 22:40:11 +00:00
orbiter
417ed5102e redesign of database iterators:
an iteration of key elements in kelondroTree databases is no longer supported.
this is now replaced by an iteration of kelondroRow.Entry objects from the database.
Iteration of keys from the database was mostly followed by retrieval of the row
from the database, which caused unnecessary database load.
The index selection was also redesigned to use the new row iteration methods.
This affects many functions; most important is the DHT selection routine, which is now much faster.



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2327 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-26 11:21:51 +00:00
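
Note on 417ed5102e: the saving described in this message can be illustrated with a toy table that counts its own accesses. This is a Python sketch with hypothetical names (`CountingTable`, `keys`, `get`, `rows`), not the kelondro API; it assumes one database access per key visited and one per explicit lookup.

```python
class CountingTable:
    """Toy table that counts how often its storage is touched."""

    def __init__(self, rows):
        self._rows = dict(rows)
        self.reads = 0

    def keys(self):
        # Old style: iterate keys only; the caller fetches each row separately.
        for key in sorted(self._rows):
            self.reads += 1
            yield key

    def get(self, key):
        self.reads += 1
        return self._rows[key]

    def rows(self):
        # New style: iterate complete rows, one access per row.
        for key in sorted(self._rows):
            self.reads += 1
            yield key, self._rows[key]


data = {"k1": "row1", "k2": "row2", "k3": "row3"}

old = CountingTable(data)
old_result = [(key, old.get(key)) for key in old.keys()]

new = CountingTable(data)
new_result = list(new.rows())
```

Both loops produce the same rows, but in this model the key-then-lookup style touches the table twice per row while the row iteration touches it once.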
orbiter
ad692fc6c7 implemented option to extract nurls from the database
(plus some iteration enhancements for nurls)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2325 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-24 16:40:59 +00:00
orbiter
7fd90ca7c8 * strict handling of NURL entry element generation, storage and stacking
* more space for EURL reason strings (you must delete the EURL db to use this)


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2324 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-24 16:04:14 +00:00
orbiter
7e0a130fb5 new indexURLEntry class 'indexURLEntryNew', to replace old class
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2321 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-23 22:43:36 +00:00
orbiter
58df8b7bbf a large collection of different changes
* mainly for the transition to the new indexing database structure
* a bugfix for an endless loop inside kelondroTree iteration
* a bugfix for bulk read inside a kelondroTree iteration; the bug caused some elements to be iterated twice
* very strong speed enhancement for url/domain extraction

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2320 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-23 22:39:41 +00:00
orbiter
3879a0ecd0 replaced java.net.URL usage by use of new class de.anomic.net.URL
This shall be seen as an experiment to exclude all cases where
there could be a DNS lookup during URL comparison.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2290 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-13 01:21:53 +00:00
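
Note on 3879a0ecd0: java.net.URL's equals() and hashCode() may resolve host names, which is the DNS lookup this commit tries to avoid. A purely textual comparison avoids it, as the Python sketch below shows. This is only an illustration of the idea and does not reflect how de.anomic.net.URL is implemented; `url_key`, `same_url` and the default-port table are assumptions made for the example.

```python
from urllib.parse import urlsplit

# Assumed default ports, used so that an explicit ":80" equals an omitted port.
DEFAULT_PORTS = {"http": 80, "https": 443}


def url_key(url):
    """Build a comparison key from the URL text alone; no network access happens."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    path = parts.path or "/"
    return (scheme, host, port, path, parts.query)


def same_url(a, b):
    return url_key(a) == url_key(b)


match = same_url("HTTP://Example.org:80/index.html", "http://example.org/index.html")
differ = same_url("http://example.org/", "http://example.net/")
```

Two different host names that resolve to the same IP address are treated as different URLs here, which is the trade-off of skipping the lookup.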
orbiter
671fd9a5c9 work towards new indexing database structure
(no effect on current functionality yet)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2277 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-04 14:47:27 +00:00
orbiter
92f4cb4d73 added option to configure the start-up delay time for kelondro database files.
the start-up delay is used to pre-load the database node cache

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2276 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-03 23:57:33 +00:00
orbiter
ce9dd3e76d some work in the index construction zone (no effect yet)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2275 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-03 15:14:54 +00:00
orbiter
e1a52bea22 added a class stub for the new database structure:
a reverse word index based on a collection index,
which is an index for a set of array files containing
row collections.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2271 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-02 22:24:13 +00:00
orbiter
85d575e928 enhancements to kelondroRow and kelondroColumn
these are changes towards a better indexURLEntry implementation
which are needed for the new database structures

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2268 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-02 01:26:06 +00:00