12328 Commits

Author SHA1 Message Date
reger
567467869e upd to httpclient-4.5.2 2016-03-12 02:00:23 +01:00
reger
a57226caa6 put settings_p servlet (back) as start page for System Administration
to display the options available (as tables_p only used for indepth edit)
see http://mantis.tokeek.de/view.php?id=460
2016-03-12 01:15:05 +01:00
Michael Peter Christen
f4591b1b51 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2016-03-11 18:12:38 +01:00
Michael Peter Christen
b89465d952 0N - basic dump upload servlet infrastructure, to share index dumps
within an experimental new sharing model
2016-03-11 18:12:13 +01:00
Michael Peter Christen
f12a900f3e harmonization of http post of files for one and several files - this had
been done differently - and wrong for several files. also: base64-encoding
for gzipped push files because our data structures currently only
support ASCII POST pushes.
2016-03-11 08:59:33 +01:00
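
The base64 step mentioned in f12a900f3e can be illustrated with a minimal sketch. It assumes only the standard JDK classes `GZIPOutputStream` and `Base64`; the class and method names (`AsciiPushSketch`, `gzipToBase64`, `base64ToRaw`) are hypothetical and are not taken from the YaCy sources.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not YaCy code: makes gzipped bytes safe for an ASCII-only POST field.
final class AsciiPushSketch {

    /** Gzips the raw bytes, then base64-encodes the result so it contains only ASCII. */
    static String gzipToBase64(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(buffer.toByteArray());
    }

    /** Reverses gzipToBase64: base64-decodes, then gunzips. */
    static byte[] base64ToRaw(String ascii) {
        byte[] gzipped = Base64.getDecoder().decode(ascii);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String ascii = gzipToBase64("index dump payload".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(base64ToRaw(ascii), StandardCharsets.UTF_8));
    }
}
```

Base64 enlarges the payload by roughly a third, which is the price of keeping the POST body ASCII-only.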
Michael Peter Christen
1ce38fdaed 0n - added experimental zeronet network which supports intranet peers
(still needs work)
2016-03-11 08:55:51 +01:00
Michael Peter Christen
849ab671a9 0n: modified the p2p bootstrapping process - rules had been too tight and
did not support the re-start of a network with just one principal peer.
2016-03-11 08:54:42 +01:00
Michael Peter Christen
d05ffa1c51 update to seed list 2016-03-11 07:20:38 +01:00
reger
16724c1283 remove unused proxyCookieWhiteList from yacy.init 2016-03-11 01:14:54 +01:00
reger
51e7151591 more fixes for sevenZip parser exception on specific archives
Found 2 more cases where modified code throws exceptions while original J7Zip
unpacks it.
Reimplemented error-prone CoderMixer2ST from unmodified J7Zip.jar.
2016-03-09 01:01:58 +01:00
reger
826e6e5894 remove transitive dependency guava from pom
(let Maven do its job)
2016-03-09 00:56:34 +01:00
reger
5b07f3473e fix sevenZip parser exception with LZMA BCJ2 format archives,
see http://mantis.tokeek.de/view.php?id=641
+ sanitize some type casts in modified sevenZip lib
2016-03-08 00:53:38 +01:00
reger
c2a88d53ab fix use of not initialized variable (m_LiteralDecoder) 2016-03-06 03:24:09 +01:00
reger
96b8d9b09e moving the J7Zip-modified source and Maven build to libbuild
from main pom. 
Using source included in j7zip-modified.jar.
This combines all external lib preparation in the libbuild main pom.
2016-03-06 03:19:52 +01:00
reger
764f5100f0 fix delete of temp file after odt & ooxml parser
Close zipfile after parsing
2016-03-04 23:05:55 +01:00
reger
379e9b330d use supplied url port to get robots.txt in crawlers hostqueue 2016-03-02 00:12:34 +01:00
reger
ed765de29b adjust start/stop classpath in build script
(with servlet classloader no need for htroot in system classpath)
2016-02-29 00:04:36 +01:00
reger
9a7efa7814 harmonize classpath with startYaCy.bat
(with servlet classloader no need for htroot in system classpath)
2016-02-28 22:53:41 +01:00
reger
0dcda3809e harmonize classpath with startYaCy.bat 2016-02-28 22:10:43 +01:00
reger
58a959403d fix mixed logfactory in UrlProxyServlet,
Class doesn't use functions of declared ancestor, change to extend on httpservlet
2016-02-27 03:44:43 +01:00
reger
dc112d0e32 upd to slf4j-1.7.16 2016-02-26 00:50:26 +01:00
Michael Peter Christen
2494a820c7 0N - added recording of dump exports if given time frame is not negative 2016-02-24 15:13:20 +01:00
Michael Peter Christen
ef2cc4f690 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2016-02-24 11:19:32 +01:00
reger
7b02cacb12 upd to Jetty 9.2.15.v20160210 2016-02-24 02:32:12 +01:00
Michael Peter Christen
a6bf0b1649 0N - added option to generate index export files for a specific number
of minutes in the past and reverted latest change. The export file dump
will now contain four data elements: f - first date of index entry write
date, l - last date of index write date, n - now-date of index dump
time, c - count of numbers inside the dump. '0N' denotes a series of
changes which will lead to the opportunity to exchange index data dumps
in a way that is needed to integrate ZeroNet index data. This will be
based on index dump sharing; that causes this commit.
2016-02-23 18:56:20 +01:00
reger
9312fbe563 making WebStructurePicture_p less vulnerable to faulty host input parameter (like host1,,host3)
by continue host loop on exception

inspired by http://mantis.tokeek.de/view.php?id=637
2016-02-21 21:38:11 +01:00
reger
6d56beaed8 fix assertion exception in toString of MultiProtocolURL
toString of AnchorURL and MultiProtocolURL are identical code
(no need to override or to protect call to parent)

as reported in https://github.com/yacy/yacy_search_server/issues/43
2016-02-21 00:23:00 +01:00
reger
b12b8fb1c2 include initial Japanese translation in language selection 2016-02-20 23:17:59 +01:00
Burkhard
6a3d27ca5b Merge pull request #44 from ImpactCrater/master
Created a translation file ja.lng
2016-02-20 22:43:41 +01:00
reger
42a7bdb2af fix SolrSelectServlet authentication to default to true 2016-02-20 22:30:15 +01:00
ImpactCrater
567c292302 Created a translation file ja.lng
I wrote a bit of translation to Japanese.
2016-02-21 03:55:33 +09:00
Michael Peter Christen
5b9030180c added peer hash to export dump name. 2016-02-19 19:26:02 +01:00
Michael Peter Christen
287b918bd7 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2016-02-19 07:52:59 +01:00
reger
20e3c25ae3 upd to weupnp-0.1.4.jar 2016-02-18 01:09:29 +01:00
reger
dbb28bb4f3 del unused statistic parameter (from status servlet) 2016-02-17 22:47:03 +01:00
Michael Peter Christen
b851308ee6 enhanced robustness of image computation 2016-02-16 17:36:49 +01:00
reger
06d0e2aeb9 result heuristic (also used in greedy learning mode) to use outbound links if result is full index doc. Otherwise use default loader method.
- Above brought up that parser start url parameter, declared as AnchorURL, uses only methods of parent object DigestURL (changed parameter declaration accordingly).
2016-02-16 02:05:58 +01:00
reger
caf9e98f09 put metadata dc_publisher in corresponding schema field 2016-02-14 21:13:25 +01:00
reger
38e2b054d4 remove servlet classloader internal cache map (to save the resources, cache hits marginal)
- DefaultServlet already includes a class cache "templateMethodCache" which is emptied
  on low mem status
- avoids a classloader cache that gets no hits but over time holds all (used) servlet classes
2016-02-12 01:20:03 +01:00
reger
6f0b073bf3 override detected language (statistic langdetect) only with TLD determined
language if langdetect probability is not high.
+ additionally truncate zh-cn / zh-tw returned by langdetect to 2 char ISO639-1 zh
used by YaCy
2016-02-07 21:16:22 +01:00
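
The override rule described in 6f0b073bf3 might look roughly like the sketch below. Everything named here is an assumption for illustration: the class `LanguageChoiceSketch`, its methods, and in particular the `HIGH_PROBABILITY` cut-off, since the commit message does not say which probability counts as "high".

```java
import java.util.Locale;

// Hypothetical illustration of the rule in the commit message, not the YaCy implementation.
final class LanguageChoiceSketch {

    // Assumed threshold; the commit does not state the actual value.
    static final double HIGH_PROBABILITY = 0.95;

    /** Truncates region-qualified codes such as "zh-cn" or "zh-tw" to two characters ("zh"). */
    static String toIso639_1(String detected) {
        if (detected == null) {
            return null;
        }
        String code = detected.toLowerCase(Locale.ROOT);
        return code.length() > 2 ? code.substring(0, 2) : code;
    }

    /**
     * Keeps the statistically detected language when its probability is high,
     * otherwise lets the language derived from the TLD override it.
     */
    static String chooseLanguage(String detected, double probability, String tldLanguage) {
        String normalized = toIso639_1(detected);
        if (normalized == null) {
            return tldLanguage;
        }
        if (probability >= HIGH_PROBABILITY || tldLanguage == null) {
            return normalized;
        }
        return tldLanguage;
    }

    public static void main(String[] args) {
        System.out.println(chooseLanguage("zh-cn", 0.99, "en")); // prints "zh"
        System.out.println(chooseLanguage("de", 0.40, "fr"));    // prints "fr"
    }
}
```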
reger
b65e2b527d include use of condenser's content text for language detection.
Language identification may show poor performance on documents with short or no
title but clear lang indication in text content. Using content text too
improves lang detection.
+ remove double caching of text in Identificator
2016-02-07 01:52:32 +01:00
reger
756c55e6d1 upd to Solr 5.4.1 2016-02-06 21:32:54 +01:00
reger
937fbb0b9f correct isHidden() for smb from last commit 2016-02-04 19:20:27 +01:00
reger
535d4bf75f respect hidden attribute for file and smb directory listing
(hidden directories are not listed, affects crawling of local file system)
2016-02-04 19:16:00 +01:00
reger
cc79ad8de6 compare search page, remove diminished search target
(romso.de, dbpedia.neofonie.de )
2016-02-04 00:47:42 +01:00
reger
375d49d536 upd classpath in batches (remove not necessary htroot)
see prev commit
2016-02-03 21:50:50 +01:00
reger
c28142095a add findClass() to servlet class loader (used in YaCyDefaultServlet)
In the 2 cases where servlet calls servlet the jvm classloader chain is
invoked and servlet class loaded by jvm loader (successful while requiring
htroot in system classpath). This patch uses the standard override design
for loaders to handle these cases (making it no longer crucial to have htroot
in system classpath, as this classLoader is mainly used for servlets and
looks in this case for the class in the configured path).
+ As the default classloader is parallel capable we should register this too.
2016-02-02 03:44:01 +01:00
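
The "standard override design" named in c28142095a is the usual JDK pattern: `loadClass()` asks the parent chain first and calls `findClass()` only when the parent cannot resolve the name. A minimal sketch follows, assuming a loader that reads `.class` files from one configured directory. The class name `ServletDirClassLoaderSketch` and its `classRoot` field are hypothetical and do not reflect the actual YaCy loader.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical loader for illustration only; not the YaCy servlet class loader.
class ServletDirClassLoaderSketch extends ClassLoader {

    static {
        // The default loader is parallel capable, so this subclass registers as well.
        ClassLoader.registerAsParallelCapable();
    }

    private final Path classRoot; // assumed: the configured directory holding servlet classes

    ServletDirClassLoaderSketch(Path classRoot, ClassLoader parent) {
        super(parent);
        this.classRoot = classRoot;
    }

    /** Called by loadClass() only after the parent chain failed to find the class. */
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        Path file = classRoot.resolve(name.replace('.', '/') + ".class");
        try {
            byte[] bytes = Files.readAllBytes(file);
            return defineClass(name, bytes, 0, bytes.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }

    public static void main(String[] args) throws Exception {
        ClassLoader loader = new ServletDirClassLoaderSketch(
                Paths.get("htroot"), ClassLoader.getSystemClassLoader());
        // JDK classes are still resolved through the parent chain.
        System.out.println(loader.loadClass("java.lang.String").getName());
    }
}
```

With this design the configured directory no longer has to be on the system classpath, because classes the parent cannot find are looked up there by `findClass()`.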
reger
8e60788c8f fix json date facet displayname 2016-01-31 02:38:39 +01:00
reger
46772e08d0 upd to pdfbox 1.8.11 2016-01-31 00:30:39 +01:00
reger
a6617ad887 expand initRemoteCrawler() to terminate worker threads if called to deactivate
remote crawl.
On startup we save the resources for remote crawler if disabled. Once started,
threads keep running idle after remote crawl is disabled. Now threads are terminated
to save the resources also when disabling during runtime.
+ remove empty class Channels
2016-01-28 23:14:09 +01:00