yacy_search_server/source/de/anomic/tools
orbiter 16baa7ad24 To translate a mediawiki dump into the YaCy surrogate format do the following:
- download a Wikipedia dump, e.g. dewiki-20090311-pages-articles.xml.bz2
from http://download.wikimedia.org/dewiki/20090311/
- move dewiki-20090311-pages-articles.xml.bz2 to DATA/HTCACHE/
- start the conversion: open a command shell, change to the YaCy home directory and execute
java -Xmx2000m -cp classes:lib/bzip2.jar de.anomic.tools.mediawikiIndex -convert DATA/HTCACHE/dewiki-20090311-pages-articles.xml.bz2 DATA/SURROGATES/in/ http://de.wikipedia.org/wiki/

This generates a series of files in DATA/SURROGATES/in

If YaCy is running (it may run concurrently with the conversion), it fetches all new dumps from the surrogate-in directory. The export process is transaction-safe: YaCy will not start reading a dump until that dump has been written completely.
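
The message does not say how the export avoids exposing half-written dumps. One common way to get this property is to write each file under a temporary name and rename it only once the write has finished. The following is a minimal sketch of that general technique, not YaCy's actual code: the class name SurrogateWriteSketch, the ".tmp" suffix and both method names are hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration of write-then-rename; not taken from the YaCy sources.
public class SurrogateWriteSketch {

    // Assumed convention: a file carrying this suffix is still being written.
    public static final String TMP_SUFFIX = ".tmp";

    // Writes the content under a temporary name, then renames it to the final name.
    public static Path writeDump(Path dir, String name, String content) throws IOException {
        Files.createDirectories(dir);
        Path tmp = dir.resolve(name + TMP_SUFFIX);
        Path fin = dir.resolve(name);
        Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
        // The rename happens only after the write has completed,
        // so the final name never refers to a partial file.
        Files.move(tmp, fin, StandardCopyOption.ATOMIC_MOVE);
        return fin;
    }

    // A reader following the same convention skips files that still carry the suffix.
    public static boolean isReadyForImport(Path file) {
        return Files.isRegularFile(file)
                && !file.getFileName().toString().endsWith(TMP_SUFFIX);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("surrogates-in");
        Path dump = writeDump(dir, "example.xml", "<surrogates/>");
        System.out.println(dump.getFileName() + " ready: " + isReadyForImport(dump));
    }
}
```

With this convention, a process polling the directory can run at the same time as the writer, because it only picks up files whose final name already exists.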


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5851 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-21 22:12:19 +00:00
bbCode.java * removed some warnings of findbugs (http://findbugs.sf.net) 2008-08-06 19:43:12 +00:00
bitfield.java more performance hacks 2008-12-04 12:54:16 +00:00
consoleInterface.java * some refactoring/moves to consoleInterface 2009-02-07 11:53:48 +00:00
crypt.java more refactoring of kelondro: 2009-01-30 22:08:08 +00:00
cryptbig.java more refactoring of kelondro: 2009-01-30 22:08:08 +00:00
CryptoLib.java * ignore whitespaces so you can copy&paste signatures better 2009-04-17 14:52:42 +00:00
diskUsage.java more bugfixes as recommended by findbugs 2009-02-17 09:12:47 +00:00
disorderHeap.java added final where possible 2008-08-02 12:12:04 +00:00
disorderSet.java added final where possible 2008-08-02 12:12:04 +00:00
enumerateFiles.java - added migration class to go from index collections to the index cell data structure. 2009-03-30 15:31:25 +00:00
Formatter.java moved logging partially to kelondro 2009-01-31 01:06:56 +00:00
gzip.java refactoring of logging 2009-01-30 23:33:47 +00:00
iso639.java use accept-language header instead of user agent for language detection 2008-10-01 17:47:11 +00:00
ListDirs.java added final where possible 2008-08-02 12:12:04 +00:00
loaderCore.java - removed superfluous copyright statement 2008-07-20 17:14:51 +00:00
loaderProcess.java - removed superfluous copyright statement 2008-07-20 17:14:51 +00:00
loaderThreads.java - refactoring of the http client 2009-02-19 16:24:46 +00:00
mediawikiIndex.java To translate a mediawiki dump into the YaCy surrogate format do the following: 2009-04-21 22:12:19 +00:00
nxTools.java moved logging partially to kelondro 2009-01-31 01:06:56 +00:00
PKCS12Tool.java added final where possible 2008-08-02 12:12:04 +00:00
Punycode.java more performance hacks 2008-12-04 12:54:16 +00:00
SignatureOutputStream.java * introduce signatures to autoupdate 2009-04-17 09:58:06 +00:00
tarTools.java refactoring of logging 2009-01-30 23:33:47 +00:00