yacy_search_server/source/de/anomic/kelondro/index
orbiter 16baa7ad24 To translate a mediawiki dump into the YaCy surrogate format do the following:
- download a Wikipedia dump, e.g. dewiki-20090311-pages-articles.xml.bz2
from http://download.wikimedia.org/dewiki/20090311/
- move dewiki-20090311-pages-articles.xml.bz2 to DATA/HTCACHE/
- start the conversion: open a command shell, change to the YaCy home directory and execute
java -Xmx2000m -cp classes:lib/bzip2.jar de.anomic.tools.mediawikiIndex -convert DATA/HTCACHE/dewiki-20090311-pages-articles.xml.bz2 DATA/SURROGATES/in/ http://de.wikipedia.org/wiki/

This writes a series of surrogate files to DATA/SURROGATES/in/.

If YaCy is running (it may run concurrently with the conversion), it fetches all new dumps from the surrogate-in directory. The export process is transaction-safe: YaCy will not start reading a dump until that dump has been written completely.
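
One common way to get this kind of hand-over is to write each file under a temporary name and rename it once it is complete. The commit message does not say how the converter actually marks an unfinished dump, so the following is only a minimal sketch under assumptions: the class name SurrogateHandoff, the methods writeDump and readyDumps, and the ".tmp" suffix are all hypothetical and not taken from the YaCy code base.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical illustration of a write-then-rename hand-over; not YaCy's actual code.
public class SurrogateHandoff {

    // Assumed marker for unfinished files; the real converter may use something else.
    static final String TMP_SUFFIX = ".tmp";

    // Writer side: write under a temporary name, then rename to the final name.
    static Path writeDump(Path inDir, String name, String content) throws IOException {
        Files.createDirectories(inDir);
        Path tmp = inDir.resolve(name + TMP_SUFFIX);
        Path fin = inDir.resolve(name);
        Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
        // The rename makes the finished file visible in a single step.
        Files.move(tmp, fin, StandardCopyOption.ATOMIC_MOVE);
        return fin;
    }

    // Reader side: only consider files that no longer carry the temporary suffix.
    static List<Path> readyDumps(Path inDir) throws IOException {
        List<Path> ready = new ArrayList<>();
        try (Stream<Path> files = Files.list(inDir)) {
            files.filter(Files::isRegularFile)
                 .filter(p -> !p.getFileName().toString().endsWith(TMP_SUFFIX))
                 .sorted()
                 .forEach(ready::add);
        }
        return ready;
    }

    public static void main(String[] args) throws IOException {
        Path inDir = Files.createTempDirectory("surrogates-in");
        // Simulate a dump that is still being written.
        Files.write(inDir.resolve("part-0002.xml" + TMP_SUFFIX), new byte[0]);
        // A finished dump is handed over via rename.
        writeDump(inDir, "part-0001.xml", "<surrogates/>");
        // Only the finished file is listed.
        System.out.println(readyDumps(inDir));
    }
}
```

With this scheme a reader that polls the directory never sees a half-written file under its final name, which is why the writer and the reader can run at the same time.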


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5851 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-21 22:12:19 +00:00
BinSearch.java
Column.java
HandleSet.java
IndexTest.java two patches for performance enhancements of the index handover process from documents to the index cache: 2009-04-21 14:23:04 +00:00
IntegerHandleIndex.java full redesign of index access data model: 2009-04-16 15:29:00 +00:00
LongHandleIndex.java - added writing of temporary file names and renaming to final file name when index dump/merge are done. Interrupted merges can be cleaned up. 2009-04-01 12:39:11 +00:00
ObjectArray.java
ObjectArrayCache.java
ObjectIndex.java a different naming scheme for BLOBArray files. This may be necessary if blobs are written more often than once in a second. 2009-04-02 15:08:56 +00:00
ObjectIndexCache.java fix for http://forum.yacy-websuche.de/viewtopic.php?p=13378#p13378 2009-03-18 10:29:13 +00:00
Row.java another performance hack 2009-03-18 22:33:36 +00:00
RowCollection.java To translate a mediawiki dump into the YaCy surrogate format do the following: 2009-04-21 22:12:19 +00:00
RowSet.java To translate a mediawiki dump into the YaCy surrogate format do the following: 2009-04-21 22:12:19 +00:00
RowSetArray.java
SimpleARC.java bugfixes and performance hacks 2009-04-17 13:04:56 +00:00