yacy_search_server/source/de/anomic/http
orbiter 16baa7ad24 To translate a mediawiki dump into the YaCy surrogate format do the following:
- download a Wikipedia dump, e.g. dewiki-20090311-pages-articles.xml.bz2
from http://download.wikimedia.org/dewiki/20090311/
- move dewiki-20090311-pages-articles.xml.bz2 to DATA/HTCACHE/
- start the conversion: open a command shell, change to the YaCy home directory and execute
java -Xmx2000m -cp classes:lib/bzip2.jar de.anomic.tools.mediawikiIndex -convert DATA/HTCACHE/dewiki-20090311-pages-articles.xml.bz2 DATA/SURROGATES/in/ http://de.wikipedia.org/wiki/

This writes a series of files to DATA/SURROGATES/in

If YaCy is running (it may run concurrently with the conversion), it picks up all new dumps from the surrogate-in directory. The export process is transaction-safe: YaCy will not start reading a dump until that dump has been written completely.
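
One common way to get the transaction-safe hand-off described above is to write each file under a temporary name and rename it only once it is complete. The commit message does not say how mediawikiIndex actually does this, so the following is only a minimal sketch under that assumption. The class name `SurrogateWriter`, the method `writeAtomically`, and the `.tmp` suffix are hypothetical and not taken from the YaCy sources.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of YaCy: illustrates a write-then-rename hand-off.
public class SurrogateWriter {

    // Writes the content to "<name>.tmp" inside dir and then renames it to "<name>".
    // A reader that skips *.tmp files never sees a partially written dump.
    public static Path writeAtomically(Path dir, String name, String content) throws IOException {
        Files.createDirectories(dir);
        Path tmp = dir.resolve(name + ".tmp");
        Path target = dir.resolve(name);
        Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
        // ATOMIC_MOVE requests an all-or-nothing rename; it throws
        // AtomicMoveNotSupportedException if the file system cannot provide one.
        return Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        // Uses a throwaway directory instead of DATA/SURROGATES/in for the demonstration.
        Path dir = Files.createTempDirectory("surrogates-in");
        Path out = writeAtomically(dir, "example.xml", "<surrogates/>");
        System.out.println("finished dump: " + out.getFileName());
    }
}
```

With this pattern the consumer side only has to ignore files that still carry the temporary suffix; everything else in the directory is complete by construction.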


git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5851 6c8d7289-2bf4-0310-a012-ef5d649a1542
2009-04-21 22:12:19 +00:00
AcceptEverythingSSLProtcolSocketFactory.java Accept all SSL-certificates (not only valid and self-signed), but put a warning into log file 2008-06-05 15:21:43 +00:00
AcceptEverythingTrustManager.java refactoring of logging 2009-01-30 23:33:47 +00:00
DefaultCharsetFilePart.java * refactoring 2008-08-02 13:57:00 +00:00
DefaultCharsetStringPart.java * refactoring 2008-08-02 13:57:00 +00:00
EasySSLProtocolSocketFactory.java * removed some warnings of findbugs (http://findbugs.sf.net) 2008-08-06 19:43:12 +00:00
EasyX509TrustManager.java added final where possible 2008-08-02 12:12:04 +00:00
httpChunkedInputStream.java (almost) completed partition of classes in kelondro 2009-01-30 22:44:20 +00:00
httpChunkedOutputStream.java moved logging partially to kelondro 2009-01-31 01:06:56 +00:00
httpClient.java some fixes and performance hacks 2009-04-20 23:01:44 +00:00
HttpConnectionInfo.java refactoring of logging 2009-01-30 23:33:47 +00:00
httpd.java static TMPDIR 2009-03-12 16:23:12 +00:00
httpdAlternativeDomainNames.java major step forward to network switching (target is easy switch to intranet or other networks .. and back) 2008-05-05 23:13:47 +00:00
httpdBoundedSizeOutputStream.java added final where possible 2008-08-02 12:12:04 +00:00
httpdByteCountInputStream.java * removed some warnings of findbugs (http://findbugs.sf.net) 2008-08-06 19:43:12 +00:00
httpdByteCountOutputStream.java * removed some warnings of findbugs (http://findbugs.sf.net) 2008-08-06 19:43:12 +00:00
httpdFileHandler.java To translate a mediawiki dump into the YaCy surrogate format do the following: 2009-04-21 22:12:19 +00:00
httpdLimitExceededException.java added final where possible 2008-08-02 12:12:04 +00:00
httpdProxyCacheEntry.java refactoring: better abstraction of reference and metadata prototypes. 2009-04-03 13:23:45 +00:00
httpdProxyHandler.java refactoring: better abstraction of reference and metadata prototypes. 2009-04-03 13:23:45 +00:00
httpdRobotsTxtConfig.java more performance hacks 2008-12-04 12:54:16 +00:00
httpHeader.java simplification of (internal) query process / refactoring 2009-03-06 15:53:20 +00:00
httpRemoteProxyConfig.java - fixed "yacy2yacy no proxy"-problem 2008-08-17 10:16:32 +00:00
httpRequestHeader.java simplification of (internal) query process / refactoring 2009-03-06 15:53:20 +00:00
httpResponse.java - refactoring of the http client 2009-02-19 16:24:46 +00:00
httpResponseHeader.java - fixes to doc, ppt, xls parser: better title 2009-02-05 15:15:13 +00:00
httpSSI.java (almost) completed partition of classes in kelondro 2009-01-30 22:44:20 +00:00
httpTemplate.java moved logging partially to kelondro 2009-01-31 01:06:56 +00:00
MultiOutputStream.java partial fix (images,audio,video) for proxy and content-type problem http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1374 2008-08-26 16:34:24 +00:00
ProxyLogFormatter.java refactoring of logging 2009-01-30 23:33:47 +00:00