Commit Graph

19 Commits

Author SHA1 Message Date
orbiter
109ed0a0bb - cleaned up code; removed methods to write the old data structures
- added an assortment importer. the old database structures can
  be imported with
  java -classpath classes yacy -migrateassortments
- modified wordmigration. The indexes from WORDS are now imported
  to the collection database. The call is
  java -classpath classes yacy -migratewords
  (as it was)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@3044 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-12-05 02:47:51 +00:00
orbiter
e3d75f42bd final version of collection entry type definition
- the test phase of the new collection data structure is finished
- test data that had been generated is void. There will be no migration
- the new collection files are located in DATA/INDEX/PUBLIC/TEXT/RICOLLECTION
- the index dump is void. There will be no migration
- the new index dump is in DATA/INDEX/PUBLIC/TEXT/RICACHE

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2983 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-19 20:05:25 +00:00
orbiter
bb7d4b5d5e refactoring to prepare new RWI entry object
- moved all url and index(RWI) entries to index package
- better naming to distinguish RWI entries and URL entries


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2937 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-08 16:17:47 +00:00
orbiter
8fdefd5c68 generalization of payload definition of index storage
this is one step forward to the migration to a new collection data format

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2912 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-11-05 02:10:40 +00:00
orbiter
78b7f6f7fd bugfix for index remove bug,
appeared after search where snippet-loading triggered word removal

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2869 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-10-28 00:22:10 +00:00
orbiter
75b198bc02 - updated references to indexContainer
- more bugfixes and debugging for indexAbstract processing

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2555 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-12 11:13:27 +00:00
orbiter
a7281a9b4d fix for last commit
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2545 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-11 11:12:42 +00:00
orbiter
74d1dea30b changes towards better join-search
- added generation of a compressed index within remote peers during global search
- added selection of specific urls within remote peers during secondary global search


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2539 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-09-10 22:36:47 +00:00
orbiter
95160d7f2c fixed size computation of index elements from the collection index
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2380 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-11 16:01:18 +00:00
orbiter
279b1d969d Integrated new indexing data structure 'collections' into the main class
for indexing, the plasmaWordIndex.

The new data structure is ready-to-use, but currently disabled.
It can be activated by setting the static
plasmaWordIndex.useCollectionIndex
to true. This shall be done for testing purpose.

The new index is stored to
DATA/INDEX/PUBLIC/TEXT
The directory PLASMA shall be used only for crawler in the future.

Attention: during testing the data structure in INDEX may change,
and created indexes with the new data structure may get useless.


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2348 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-05 22:22:14 +00:00
orbiter
4ff742e42d implemented indexCollectionRI
this is the new database structure that is supposed to replace the
plasmaAssortmentCluster AND the plasmaWordIndexFileCluster
The new structure is not yet active and needs to be integrated into
plasmaWordIndex. This has some migration constraints that are not yet
completely solved.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2347 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-05 19:18:33 +00:00
orbiter
01f95eccd3 re-write of kelondroCollectionIndex. This is the data structure that
shall replace the current assortment files.
* used the kelondroFlexTable to hold the index of collections
* used kelondroRow definitions to declare all data structures
* fixed several bugs that appeared in kelondroRowSet and kelondroRowCollection during testing


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2344 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-04 23:04:03 +00:00
orbiter
e357599f92 * fixed problem with indexContainer iteration from RAM:
indexContainers from RAM must be cloned explicitely to prevent
  side-effects on stored indexContainer objects in Cache
* changed behaviour of urlReference deletion from indexContainers:
  deletion does not user retrieval of all Elements from the assortments
* added textual configuration of kelondroRow and kelondroColumn definition
* update of kelondroRow usage in yacyNews
* modified kelondroAttrSeq to use modified kelondroColumn parser

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2339 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-08-01 10:30:55 +00:00
orbiter
8b77afd72c some fixes to new container merger
and some code cleanup

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2336 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-26 22:40:11 +00:00
orbiter
417ed5102e redesign of database iterators:
an iteration of key elements in kelondroTree databases is no longer supported.
this is now replaced by an iteration of kelondroRow.Entry objects from the database
Iteration of keys from the database was mostly followed by retrieval of the row
from the database, whcih caused unnecessary database load.
The index selection was also redesigned to use the new row iteration methods.
This affects many funktions, most important is the DHT selection routine which is now much faster.



git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2327 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-26 11:21:51 +00:00
orbiter
671fd9a5c9 work towards new indexing database structure
(no effect on current functionality yet)

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2277 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-04 14:47:27 +00:00
orbiter
92f4cb4d73 added option to configure the start-up delay time for kelondro database files.
the start-up delay is used to pre-load the database node cache

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2276 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-03 23:57:33 +00:00
orbiter
ce9dd3e76d some work in the index construction zone (no effect yet)
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2275 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-03 15:14:54 +00:00
orbiter
e1a52bea22 added a class stub for the new database structure:
a reverse word index based on a a collection index,
which is an index for a set of array files containing
row collections.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@2271 6c8d7289-2bf4-0310-a012-ef5d649a1542
2006-07-02 22:24:13 +00:00