Commit Graph

2641 Commits

Author SHA1 Message Date
Michael Peter Christen
907db8b7a6 fix for bad query shortcut hack 2014-02-25 15:19:04 +01:00
Michael Peter Christen
a2b66fe2eb Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-02-25 14:37:39 +01:00
Michael Peter Christen
9f6be762a6 - better logging for postprocessing
- fixed collection bug in postprocessing
2014-02-25 14:37:30 +01:00
orbiter
da5d4128bf prevent npe 2014-02-25 03:26:20 +01:00
orbiter
a878c7982c prevent npe 2014-02-25 03:19:41 +01:00
orbiter
e4eb87d924 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-02-25 02:16:37 +01:00
orbiter
ced1a96f9c fixed error cache 2014-02-25 02:16:22 +01:00
reger
3ba81bd08a Merge origin/master 2014-02-25 00:24:10 +01:00
reger
4d896383db fix: use timeout = proxy.ClientTimeout in ProxyHandler
(was 10sec fix) see http://bugs.yacy.net/view.php?id=236
2014-02-25 00:23:06 +01:00
orbiter
cfb647db6e - introduced a miss cache in ConcurrentUpdateSolrConnector
- better usage of cache
- bugfix for postprocessing
2014-02-24 23:42:50 +01:00
orbiter
a87d8e4a8e changed caching of ConcurrentUpdateSolrConnector: it caches now also the
url along with the load date. While this takes much more memory, it
eliminates database lookups for getURL() requests, which happen equally
often. This speeds up remote solr configurations.
2014-02-24 22:59:58 +01:00
orbiter
f6e441dd77 refactoring 2014-02-24 21:01:56 +01:00
orbiter
76c53faeb2 removed unused code (HostStat) 2014-02-24 20:51:43 +01:00
orbiter
d3a88eaecb introducing ConcurrentUpdateSolrServer for remote solr servers.
Scaling of write buffers and update queue size is made according to
assigned memory.
2014-02-24 20:26:02 +01:00
reger
809e976578 remove unused java imports form yacy.java 2014-02-24 05:19:40 +01:00
reger
a9b06f8719 add a -config command line parameter e.g. -config "port=9090" "port.ssl=8043"
- useful for remote installation to set any config file property
- multipe parameter can be set at once, on Windows enclose parameter in doublequotes
- special handling   "adminAccount=adminuser:adminpwd"  sets adminusername and md5 encoded admin-pwd

- adjusted windows startbatch to allow command line parameter handling
- remove not needed classpath calculation from startYACY_debug.bat
2014-02-24 05:16:31 +01:00
reger
0923b09216 fix: allow 4 character admin user name
(was min 5 char)
2014-02-24 00:01:11 +01:00
Michael Peter Christen
254a7ac66c fixed cleaning of index 2014-02-22 01:35:01 +01:00
Michael Peter Christen
28a7b42e6b removed warning "sun.misc.BASE64Encoder is internal proprietary API and
may be removed in a future release"
2014-02-22 00:52:49 +01:00
Michael Peter Christen
046f5a03cb one more SolrIndexSearcher bugfix 2014-02-21 23:48:56 +01:00
sixcooler
78c01b3eff fix for 'AlreadyClosedException: this IndexReader is closed' 2014-02-21 17:28:32 +01:00
Michael Peter Christen
1b5e3d523a better control over close-state of remote solr connections 2014-02-20 00:39:19 +01:00
Michael Peter Christen
1a364572a5 fix for
"org.apache.solr.core.SolrCore Too many close [count:-1] on
org.apache.solr.core.SolrCore@51af7c57"
-error
2014-02-20 00:03:35 +01:00
Michael Peter Christen
69391e5d9e changed strategy to test existence of documents in Solr: using the
update time. The reason for that is a better caching for the crawler
double-check, which needs the update time for crawler steering.
2014-02-19 04:03:45 +01:00
Michael Peter Christen
790f103f32 delete fail-docs during postprocessing to prevent that they will appear
again and stay in postprocessing forever.
2014-02-18 01:38:56 +01:00
Michael Peter Christen
ff656ce860 explicit call to optimize to add a expungeDeleted flag 2014-02-12 01:01:23 +01:00
Michael Peter Christen
9eb668e951 enhanced the resource observer
The resource observer is now able to recognize free disk space AND
available space for YaCy. The amount of space which is assigned for YaCy
are defined in new settings in the configuration file.
Furthermore, there is now a cleanup process which deletes files in case
that an autodelete is activated. The autodelete is now BY DEFAULT ON if
the disk space is low, which means that YaCy starts to delete documents
when the disk is full!
2014-02-12 01:00:44 +01:00
Michael Peter Christen
fbee98c06f fixed shortcut self-reference bug 2014-02-11 22:14:46 +01:00
Michael Peter Christen
e7a29a2851 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-02-11 22:03:46 +01:00
Michael Peter Christen
bf97e38b83 removed clearURLIndex, which is a stub remaining from the old metadata
database and not needed any more
2014-02-11 22:01:25 +01:00
orbiter
14764632b5 clear solr caches in case that an exception occurrs. The reason behind
this hack is the occurrence of Exceptions like:
W 2014/02/11 18:51:33 ConcurrentLog GC overhead limit exceeded
java.io.IOException: GC overhead limit exceeded
        at
net.yacy.cora.federate.solr.connector.AbstractSolrConnector.getDocumentById(AbstractSolrConnector.java:334)
        at
net.yacy.cora.federate.solr.connector.MirrorSolrConnector.getDocumentById(MirrorSolrConnector.java:173)
        at
net.yacy.cora.federate.solr.connector.ConcurrentUpdateSolrConnector.getDocumentById(ConcurrentUpdateSolrConnector.java:415)
        at net.yacy.search.index.Fulltext.getMetadata(Fulltext.java:331)
        at net.yacy.search.index.Fulltext.getMetadata(Fulltext.java:317)
        at
net.yacy.search.query.SearchEvent.pullOneRWI(SearchEvent.java:1024)
        at
net.yacy.search.query.SearchEvent.pullOneFilteredFromRWI(SearchEvent.java:1047)
        at
net.yacy.search.query.SearchEvent$3.run(SearchEvent.java:1263)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.Arrays.copyOfRange(Arrays.java:3077)
        at java.lang.StringCoding.decode(StringCoding.java:196)
        at java.lang.String.<init>(String.java:491)
        at java.lang.String.<init>(String.java:547)
        at
org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.readField(CompressingStoredFieldsReader.java:187)
        at
org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(CompressingStoredFieldsReader.java:351)
        at
org.apache.lucene.index.SegmentReader.document(SegmentReader.java:276)
        at
org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:110)
        at
org.apache.lucene.index.IndexReader.document(IndexReader.java:436)
        at
org.apache.solr.search.SolrIndexSearcher.doc(SolrIndexSearcher.java:657)
        at
net.yacy.cora.federate.solr.connector.EmbeddedSolrConnector.SolrQueryResponse2SolrDocumentList(EmbeddedSolrConnector.java:230)
        at
net.yacy.cora.federate.solr.connector.EmbeddedSolrConnector.getDocumentListByParams(EmbeddedSolrConnector.java:320)
        at
net.yacy.cora.federate.solr.connector.AbstractSolrConnector.getDocumentById(AbstractSolrConnector.java:330)
        ... 7 more
        
This problem was analysed with the Eclipse Memory Analyser after a heap
dump, where the following problem was reported as the main Problem
Suspect:

One instance of "org.apache.solr.util.ConcurrentLRUCache" loaded by
"sun.misc.Launcher$AppClassLoader @ 0x42e940a0" occupies 902.898.256
(61,80%) bytes. The memory is accumulated in one instance of
"java.util.concurrent.ConcurrentHashMap$Segment[]" loaded by "<system
class loader>".

This memory is part of the result cache of Solr. Flushing this cache
appears the most appropriate solution to that problem.
2014-02-11 20:56:40 +01:00
Michael Peter Christen
bc28247089 Added methods in resource observer to calculate the available and the
occupied disc space. These values are also shown on the status page.
The disc space calculation shall be used for a disk-limitation of the
search index.
2014-02-11 03:20:03 +01:00
Michael Peter Christen
0dda979801 adopted network image drawing to increased number of peers 2014-02-11 00:53:10 +01:00
Michael Peter Christen
ca8b100f96 run the cleanup process even when load is high, do postprocessing even
if load > 1 (but < 2) but only if there is enough memory (now: 0.5 GB
RAM available). The memory amount of the postprocessing is the cause
that systems block because they run into a frequent-GC chain which
almost locks the peer. If running with enough memory, the postprocessing
is fast and not damaging to the system.
Because the required RAM of 0.5 GB is never available in default
setting, the postprocessing will not run if the peer is not reconfigured
to use more memory.
2014-02-10 12:59:30 +01:00
Michael Peter Christen
195e5868d3 catch solr close exceptions 2014-02-09 15:04:46 +01:00
Michael Peter Christen
751c128544 extra sleep for remote searches enhances search results because there is
more time for more remote peers to contribute on the first result page
2014-02-09 14:57:17 +01:00
Michael Peter Christen
0cabcbbe83 more efficient wordcount 2014-02-09 14:45:12 +01:00
Michael Peter Christen
3d474a843e added memory protection for postprocessing 2014-02-09 12:36:56 +01:00
Michael Peter Christen
412d55523c enhanced memory protection and OOM exception handling in Solr connector 2014-02-09 12:36:14 +01:00
Michael Peter Christen
d9858e1b8a removed warnings and superfluous logging 2014-02-09 12:26:58 +01:00
Michael Peter Christen
acc8d7faa7 fixed setting of shortMemoryStatus in MemoryControl 2014-02-09 12:25:55 +01:00
Michael Peter Christen
94245ce0a8 fixed "Size in KBytes" calculation in PerformanceQueues_p.html,
see http://bugs.yacy.net/view.php?id=362
2014-02-07 17:19:08 +01:00
Michael Peter Christen
726e8c3ad5 removed unused classes and servlets 2014-02-07 01:47:10 +01:00
Michael Peter Christen
6e59ca4ebf removed jena library and all code that depended on jena. When jena was
introduced, it was also used for search facets. The generic search
facets are now deduced from generic solr fields which makes jena as tool
for facet semantics superfluous.
2014-02-07 01:20:06 +01:00
Michael Peter Christen
9228214f9b enrichment of PerformanceMemory display of SolrInfoMBean table 2014-02-07 00:22:31 +01:00
Michael Peter Christen
e8bdf16ea7 added statistic information for solr resources in PerformanceMemory 2014-02-07 00:02:19 +01:00
Michael Peter Christen
931541d198 re-inserted default value re-set button to performance queues and
patched missing values for recent new queues
2014-02-06 22:39:19 +01:00
Michael Peter Christen
456e52e0d5 enhanced strategy to clear solr caches
- redesigned the instance mirror class (which was a mess)
- added final method to close a searcher (which otherwise keeps a cache)
- changed cache clear method which iterates over resources and calls
clear to all caches in the searcher resources
2014-02-06 19:13:29 +01:00
reger
bd1685c94a fix not needed getFileExtension().toLower (double)
add missing .getFileExtension
2014-02-05 03:45:02 +01:00
orbiter
a11f072504 enhanced didyoumean 2014-02-04 00:18:11 +01:00
Michael Peter Christen
c0e6a65ec3 enhanced didyoumean 2014-02-03 18:49:03 +01:00
Michael Peter Christen
6d2dab7b21 fixed 'resource leak' warning 2014-02-03 13:38:26 +01:00
orbiter
22e3524797 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-02-03 12:45:35 +01:00
orbiter
c40ba51ca6 added new suggest method which replaces more-than-one suggestions:
instead of computing suggest permutations of the given words, the
completion of a phrase using the given words is searched in the fulltext
index.
2014-02-03 12:44:52 +01:00
reger
ad4b213145 remove unused static var from HTTPDProxyHandler 2014-02-02 03:47:12 +01:00
reger
b693ce9759 allow combining selection of different search nav's (facets)
- selecting more than one nav combines the 2 selections (with AND)
- unselecting one nav clears all selected

(e.g. select filetype:pdf and /language/fr shows ~ french pdf's only)
2014-01-30 22:57:27 +01:00
reger
cb71413d19 fix page nav, to keeping modifier
(was new issue)
2014-01-30 22:00:32 +01:00
orbiter
416481c33e added a boost on appearance of combined words (in the same order the
user submitted that) when searching for more than one word
2014-01-30 10:51:08 +01:00
reger
c589ee8c6e URLproxy access check too tight
respect config ip pattern (was own ip)
2014-01-28 22:39:45 +01:00
Michael Peter Christen
ebfaf753b7 - faster initialization of index files
- removal of not used space if index files shrink (rare, but possible)
2014-01-28 12:39:58 +01:00
Michael Peter Christen
d2b8f2b477 enhancements for staticIP and ipv6 handling 2014-01-27 13:48:20 +01:00
reger
a71718a459 add config value for ssl/https port (default=8443)
adjust server routines to use config
2014-01-27 01:09:56 +01:00
reger
a3e2cca8e9 improve isOlder check to not overwrite node index with metadata on equal load date 2014-01-26 01:00:52 +01:00
reger
9b24dae2b7 add language navigation filter clause to rwi results 2014-01-25 22:59:23 +01:00
reger
f307d65dcf prepare for a language navigator
works fine to restrict language for local solrSearches.
More work needs to be done to make rwi/remote searches respect the modifier.language restriction.
2014-01-24 03:11:25 +01:00
reger
cf553e5045 added hint to web.xml and for completeness the full set of hardcoded mappings 2014-01-23 23:56:45 +01:00
Michael Peter Christen
c84bcc878a first try to add a generic solr servlet as luke request servlet 2014-01-23 19:01:31 +01:00
Michael Peter Christen
4cb7e2a2ca refactoring: renamed the SolrServlet to SolrSelectServlet for better
naming of more Solr Servlets
2014-01-23 17:20:49 +01:00
Michael Peter Christen
dc06e407ce added two virtual instances of solr for the both cores: collection1 and
webgraph. These cores are now accessible at
/solr/collection1/select instead /solr/select?core=collection1
and
/solr/webgraph/select instead /solr/select?core=webgraph
in addition to the old behavior to support compatibility to the old
peers. These new paths are fully solr standard-conform and will allow
the cross-linking between YaCy peers using their public solr API.
2014-01-23 17:14:13 +01:00
Michael Peter Christen
8b14e92ba4 added button in host browser to re-load 404/failed documents 2014-01-23 15:56:36 +01:00
orbiter
771d8261c1 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-01-22 21:53:27 +01:00
orbiter
c351e47a84 fix for bad-formatted lonlat 2014-01-22 21:33:11 +01:00
reger
4c603b216e optimize parse ServerSideInclude 2014-01-22 21:23:32 +01:00
orbiter
5ec0c969c9 fix for http://bugs.yacy.net/view.php?id=354 2014-01-22 20:59:53 +01:00
orbiter
0002abd583 fix for OOM during remote search and too high load protection 2014-01-22 20:54:03 +01:00
sixcooler
5a917e13c6 use less ram on dht-URL transfer by not using a URIMetadataNode[] 2014-01-22 17:52:07 +01:00
Michael Peter Christen
c87cdfca2e do not set a load prerequisite that prevents the start of one-time-jobs 2014-01-22 17:18:53 +01:00
sixcooler
4d77ca52c9 workaround to let dht-out run on smal Systems like a Pi 2014-01-22 01:26:44 +01:00
Michael Peter Christen
6ada0daae9 making latency_factor and maximum number of same hosts in loader queue
settings available in Crawler_p.html servlet for steering.
2014-01-21 19:28:00 +01:00
Michael Peter Christen
489c3fbc90 code simplifications / removed warnings 2014-01-21 17:53:39 +01:00
Michael Peter Christen
0168f80c28 new crawling factors can now be changed during runtime 2014-01-21 17:52:16 +01:00
Michael Peter Christen
be5e808236 - removed hardcoded load-test which is now handled in BusyQueues
steering, see /PerformanceQueues_p.html
- changed default values for crawler queue load limit (high, because
these jobs are started upon user request)
2014-01-21 17:48:45 +01:00
sixcooler
40a4030b55 configurable max-load values for YaCy-Threads:
try lower values on smal systems like a Pi
2014-01-21 17:04:22 +01:00
sixcooler
6d8c023a5e lower client-connection for single-cpu-systems 2014-01-21 16:56:44 +01:00
Michael Peter Christen
77531850b5 reverted crawling strategy from latest commit. 2014-01-21 16:05:55 +01:00
Michael Peter Christen
c0da966dfa enhanced crawler speed 2014-01-20 21:46:40 +01:00
Michael Peter Christen
79809342fa added synchronization to exists() call bacause the concurrent call to
that method showed in thread dump close to deadlock situations. Its also
better to synchronize IO operations because they become faster then.
2014-01-20 21:09:03 +01:00
Michael Peter Christen
9a6912f2e6 if a http client thread is still running but we do not wait for it any
more, call an interrupt
2014-01-20 18:39:36 +01:00
Michael Peter Christen
0d235a565b cleanup crawl loader jobs 2014-01-20 18:36:00 +01:00
Michael Peter Christen
1ea17bd9f3 - removed old metadata database and all migration code
- refactored all code which uses URIMetadataRow as standard for word
hash length and word hash ordering and moved that to the class 'Word',
becuase the class URIMetadataRow defined the old metadata data structure
and should be superfluous in the future
- removed unused methods from URIMetadataRow as preparation for further
removal of that class
2014-01-20 18:31:46 +01:00
reger
d3de309953 fix IOexception logging issue in DefaultServlet
reason not sure but .logException triggers another exception
2014-01-20 08:12:35 +01:00
reger
97e84439fb adjusted ConfigHeuristic and changed QueryGoal.getOriginalQueryString to .getQueryString
- since specific heuristic Twitter & Blekko is not longer available or redundant with OpenSearchHeuristic,
adjusted ConfigHeuristic to use OpensearchHeuristic settings only.
For this the default OSD search target list is made available (copied) by default and the other configs are removed.

- the return of QueryGoal.getOriginalQueryString includes the queryModifier, which are held separately in a modifier object,
but in most (all) cases just the query term is expected, clarified and renamed it to QueryGoal.getQueryString which returns
just the search term (if needed a .getOrigianlQueryString could be implemented in Queryparameters, adding the modifiers)

- started to adjust internal html href references from absolute to relative (currently it is mixed).
For future development we should prefer relative href targets (less trouble with context aware  servlets)
2014-01-20 00:58:17 +01:00
Michael Peter Christen
022c6d3ce1 do YaCy p2p connections using a timeout-request which covers the http
request into a separate thread and ignores the furthure result of a
request if that does not answer within the requested time-out. This is a
try to solve a problem with the peer-ping, which hangs whenever a peer
appears to be dead or blocked.
2014-01-19 15:21:23 +01:00
Michael Peter Christen
42f3733a05 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-01-19 14:47:24 +01:00
Michael Peter Christen
25a6c05008 experimental removal of synchronization. This should work for all cases
where the size() and isEmpty() method is used only for statistics, which
happens at many locations in YaCy. If these methods are used for
structual reasons (like accessing the last element in an array) then it
may fail or cause other problems. As far as visible, this is not the
case.
2014-01-19 14:47:11 +01:00
Michael Peter Christen
5695280edd removed superfluous synchronization 2014-01-19 14:44:58 +01:00
Michael Peter Christen
a1977b7a75 removed debug code 2014-01-19 14:42:26 +01:00
orbiter
fd4abc0565 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-01-19 01:50:55 +01:00
orbiter
d5b8e473c8 added load limit for DHT transfer: RWI acceptance only if local load is
not too high
2014-01-19 01:50:42 +01:00
reger
2614fa7aeb Skip remote Solr search if last try showed error
As the solr servlet may not be available (e.g. no public search page, old version, individual access setting) a /solr/select error is 
remembered in the seed.dna of the remote peer.
This is not permanent, as flag is not stored and the seed is reloaded on several occasions, it is just a memory of the recent past status.
Might also be set to "not available" on time-out of last try.
2014-01-18 18:48:52 +01:00
orbiter
a07e9b3582 concurrency-solid version of transmission limitation 2014-01-18 12:55:05 +01:00
orbiter
60ead31273 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-01-18 10:50:36 +01:00
orbiter
52bf7d1ac8 reduce load during dht transfer 2014-01-18 10:50:24 +01:00
sixcooler
f0587d4af5 NP-fix, which was found on a Pi under 'havy' load 2014-01-18 00:03:44 +01:00
Michael Peter Christen
0bf3cab8c7 - better 'extra'-peer selection
- logging of health status for 'extra'-peer selection
- concurrency for remote peer IO and interrupting the threads if
time-out occurrs
2014-01-17 14:54:19 +01:00
orbiter
e3c4456c8e Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-01-17 09:43:09 +01:00
orbiter
7f21d21d1d added synchronization to deeply-embedded solr connector
EmbeddedSolrConnector because deadlock situations show that methods in
lucene class seem to block.
2014-01-17 09:42:55 +01:00
reger
9b06774414 fix role name in GSA servlet 2014-01-17 01:00:02 +01:00
reger
0c754dd794 implemented DIGEST authentication, which is for remote login more secure
as BASIC were pwd is transmitted near clear text (B64enc).
This has some implication as RFC 2617 requires and recommends a password hash MD5(user:realm:pwd) for DIGEST.

!!! before activating DIGEST you have to reassign all passwords !!! to allow new calculation of the hash
- default authentication is still BASIC
- configuration at this time only manually in (DATA/settings) or  defaults/web.xml  (<auth-method>
- the realmname is in defaults/yacy.init  adminRealm=YaCy-AdminUI
- fyi: the realmname is shown on login screen
- changing the realm name invalidates all passwords - but for security you are encouraged to do so (as localhostadmin)
- implemented to support both, old hashes for BASIC and new hashes for BASIC and DIGEST
- to differentiate old / new hash the in Jetty used hash-prefix "MD5:" is used for new pwd-hashes (  "MD5:hash" )
2014-01-17 00:02:23 +01:00
Michael Peter Christen
ba44eb1160 when scaling the number of remote peers, also consider the machine load
and the number of cores
2014-01-16 17:34:26 +01:00
Michael Peter Christen
f8ce7040ab remote search peer selection schema change:
- all non-dht targets (previously separated into 'robinson' for dht-like
queries and 'node' for solr queries) are non 'extra' peers, which are
queries using solr
- these extra-peers are now selected using a ranking on last-seen,
peer-tag-matches, node-peer flags, peer age, and link count. The ranking
is done using a weight and a random factor.
- the number of extra peers is 50% of the dht peers
- the dht peers now exclude too young peers to prevent bad results
during strong growth of the network
- the number of dht peers (and therefore extra-peers) is reduced when
the memory of the peer is low and/or some documents still appear in the
indexing-queue. This shall prevent a peer from deadlocks when p2p
queries are made in a fast sequence on weak hardware.
2014-01-16 17:27:14 +01:00
Michael Peter Christen
47a82e471c less blocking in SeedDB which caused deadlocks in peer ping 2014-01-16 13:10:20 +01:00
Michael Peter Christen
ec10ed45bd better logging in logger 2014-01-16 13:08:39 +01:00
Michael Peter Christen
a5d7961812 replaced old caching in SolrConnector with a new one which is better for
concurrency and should prevent from 100% CPU usage after a long run of a
peer with a large number of documents.
2014-01-15 23:13:22 +01:00
reger
6e2fe777af simulate Authorization cookie for yacy servlet header 2014-01-10 19:31:36 +01:00
reger
ea7cef5d05 fix NPE in TemplateEngine
StackTrace For input string: ""
java.lang.NumberFormatException: For input string: ""
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Integer.parseInt(Integer.java:504)
	at java.lang.Integer.parseInt(Integer.java:527)
	at net.yacy.server.http.TemplateEngine.writeTemplate(TemplateEngine.java:241)
	at net.yacy.server.http.TemplateEngine.writeTemplate(TemplateEngine.java:199)
	at net.yacy.http.servlets.YaCyDefaultServlet.handleTemplate(YaCyDefaultServlet.java:896)
2014-01-10 18:11:32 +01:00
reger
cb6d0c2113 implementing YaCy legacy role names
- taking out customized SecurityHandler code as the original/default seems to just work fine
- with this individual sec. constraints can be applied via web.xml (using legacy role names)
2014-01-10 14:07:49 +01:00
reger
f09dbbef96 make SecurityHandler webappcontext ready 2014-01-10 12:36:42 +01:00
reger
37f2a82a5d making root context (htroot) a WebAppContext
- this allows additional features, like servlet configuration via web.xml and many more things.
- currently the standard servlets are still configured in the code (so the supplied defaults/web.xml is not realy needed, yet),
  but could be expanded
- lookup for web.xml - 1. in /DATA/SETTINGS then in /defaults
2014-01-10 10:42:47 +01:00
reger
28eae57e8b spend CrawlQueues a fremem routine
- clears errorStack
- will not get hit often (but better little than nothing on low mem)
2014-01-10 10:24:33 +01:00
reger
b931bf6b48 fix use of url proxy access pattern
pattern of transparent was used.
2014-01-08 08:12:56 +01:00
reger
280c4a3ac1 exclude terms with " for didYouMean suggestion
causes Solr error (and wordindex likely finds suggestion)

org.apache.solr.core.SolrCore org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: Cannot parse 'text_t:""d"': Lexical error at line 1, column 12.  Encountered: <EOF> after : ""
	at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:171)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:187)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at net.yacy.cora.federate.solr.connector.EmbeddedSolrConnector.query(EmbeddedSolrConnector.java:179)
	at net.yacy.cora.federate.solr.connector.EmbeddedSolrConnector$DocListSearcher.<init>(EmbeddedSolrConnector.java:345)
	at net.yacy.cora.federate.solr.connector.EmbeddedSolrConnector.getCountByQuery(EmbeddedSolrConnector.java:364)
	at net.yacy.cora.federate.solr.connector.MirrorSolrConnector.getCountByQuery(MirrorSolrConnector.java:326)
	at net.yacy.cora.federate.solr.connector.ConcurrentUpdateSolrConnector.getCountByQuery(ConcurrentUpdateSolrConnector.java:440)
	at net.yacy.search.index.Segment.getWordCountGuess(Segment.java:464)
	at net.yacy.data.DidYouMean.getSuggestions(DidYouMean.java:181)
	at suggest.respond(suggest.java:73)
2014-01-08 04:46:21 +01:00
reger
fbc1071f6d Merge origin/master 2014-01-07 22:48:45 +01:00
reger
7b800a0c8e fix: NPE on shutdown via script 2014-01-07 22:44:24 +01:00
Michael Peter Christen
ce4d42d77c Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-01-07 21:52:38 +01:00
Michael Peter Christen
644573cfc4 using the adminAccountUserName from yacy.conf within apicall.sh 2014-01-07 21:52:19 +01:00
reger
6932aa4d7a use configured admin-username for api calls
- the admin user name can be configured, in apiExec calls the default "admin" username is used. 

TODO: the bin/apicall.sh script should likely take that into account.
2014-01-07 21:26:50 +01:00
orbiter
2ead4e44d9 introduced a new storage path ARCHIVE inside of DATA which will be used
as path for solr index dumps (instead of the SEGMENTS path). This will
make a maintenance of index backups easier. It will also provide a tool
to migrate from an freeworld index to a webportal index.
2014-01-07 17:53:49 +01:00
sixcooler
add0e42804 fix double-escaped urls from proxy-usage 2014-01-07 01:04:33 +01:00
sixcooler
865ce6f974 check blacklist proxyClient config 2014-01-07 01:01:55 +01:00
sixcooler
345f9aba27 make use of our DNS-cache again - this realy speeds up the lookup 2014-01-07 00:18:01 +01:00
reger
e6d284fe1e better solution for prev. commit with MultiMapSolrParams.getFieldInt not returning default parameter 2014-01-06 18:19:54 +01:00
reger
0bc2fc14ab improve NPE chance on missing parameters
java.lang.NullPointerException
	at net.yacy.http.servlets.SolrServlet.service(SolrServlet.java:145)
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501)
2014-01-06 17:52:21 +01:00
reger
f06cef5d5b reimplement proxy access by configured whitlist pattern
was currently limited to own ip.
2014-01-06 15:00:14 +01:00
reger
05d6cc6ea3 setting of IPv4Stack moved earlier
it seems even better to call system.setproperty before isrunning check
(if nothing helps we have to set it in startup script)
2014-01-06 11:28:05 +01:00
reger
30d925a96e reimplemented server access restriction
via Jetty IPAccessHandler to allow only configured IP's to access.
Handler is only loaded if a restriction is configured.

Since IPAcessHandler (Jetty 8) does not support IPv6 system property java.net.preferIPv4Stack=true
Testing showed system.setProperty seems to be sensitive to point of calling (earliest possible time seems to be best = early in yacy.main).
Moved the "isrunning..." just open browser check also to the new routine to preread the yacy.config only once.
2014-01-06 07:00:16 +01:00
orbiter
3cb6c7861f fixed shutdown authenticaton problem 2014-01-06 01:48:54 +01:00
Michael Peter Christen
ed06b5b94b set a realm message to log-in input window which explains that a
password for the account 'admin' can be (re-)set with the script
bin/passwd.sh
2014-01-05 17:43:34 +01:00
Michael Peter Christen
7005ecdabd cleanup 2014-01-05 15:06:40 +01:00
Michael Peter Christen
2939b47986 removed non-working realm setting in http client (auth for localhost was
added in previous commit)
2014-01-05 15:04:18 +01:00
orbiter
9d52b337f3 added http authentification to YaCy http client for all localhost
acesses to enable self-steering of the peer using the API table. This is
necessary in case that an password for the administration pages is set.
2014-01-05 14:46:11 +01:00
Michael Peter Christen
c951945666 modified log-in detail to enable admin-login from localhost with stored
hash even if localhost access is disabled. This is urgently needed for
the apicall.sh script since that is used for high-availability set-up
(checkalive and indexdump for index mirroring)
2014-01-05 11:50:23 +01:00
Michael Peter Christen
9bd71fdbb4 made the access tracker class static because it shall be used by the
jetty auth module
2014-01-05 05:04:28 +01:00
Michael Peter Christen
1c56befb93 fixed mess with test on localhost (which means local hosts for some
cases)
2014-01-05 04:55:30 +01:00
Michael Peter Christen
7d6fc79eb8 refactoring (usage of constant names for attributes of authentication
check)
2014-01-05 04:23:44 +01:00
Michael Peter Christen
b9d36e45e0 removed the &amp explicit encoding of ampersand character since this is
double-translated within the template replacement process.
2014-01-05 03:40:10 +01:00
reger
e2ccb6ce9d modified DefaultServlet parameter on invoke templates
call response with post=0 (if post empty) simulating previous behavior.

(template servlets typically test for post==null,
found one more Crawler.p.java were empty post caused problem,
= defaults not correctly set)
2014-01-04 20:49:26 +01:00
reger
4c38bceafc handle http connect for proxy
refactor header cleanup (reuse existing code)
2014-01-04 13:09:34 +01:00
reger
cfabe8f67a harmonize access restriction for urlproxy servlet
with proxy handler, what is currently
- use switched on in config
- access from a local IP / hostname

fix shutdown exception for crashprotection handler on interrupted connections.
2014-01-03 12:28:40 +01:00
reger
e6b9643fd6 extended request for local peer check to by hostname resolved ip
the current islocal() check did not detect a domain.com address as request for the local peer.
2014-01-03 01:13:56 +01:00
reger
c797f108a1 add error response on deniedl proxy access
send http 403 response
2014-01-02 09:11:08 +01:00
reger
0583f44306 reimplement proxy access log (to Jetty ProxyHandler)
- using existing HTTPDProxyHandler logger
- allow local loopback ip to access proxy
2014-01-02 03:37:33 +01:00
reger
8cbc1c970a Security Hot-Fix: for transparent proxy. 2014-01-01 20:48:35 +01:00
reger
58ecf5e4dd add to blacklist button in CrawlResults
http://bugs.yacy.net/view.php?id=220
introduced Blacklist.add with sourcefile only parameter
2014-01-01 11:01:22 +01:00
reger
e9081c0f17 moved startup execAPIActions call after Jetty startup
execAPIActions require http to be up. The 10s sleep was sufficient to allow Jetty to start, 
but it's more robust to place the call after http is assigned to switchboard/serverSwitch.
2014-01-01 10:28:49 +01:00
reger
19c1a7a5ca change SolrServlet from Filter to Servlet
(as no multicore required)
this allows to simplify context/servlet initialization in Jetty init.
2014-01-01 10:20:32 +01:00
reger
14c977dd26 fix NPE GSAresponseWriter on query=null
java.lang.NullPointerException
	at net.yacy.cora.federate.solr.responsewriter.GSAResponseWriter.highlight(GSAResponseWriter.java:328)
	at net.yacy.cora.federate.solr.responsewriter.GSAResponseWriter.write(GSAResponseWriter.java:263)
	at net.yacy.http.servlets.SolrServlet.service(SolrServlet.java:235)
2013-12-31 23:01:41 +01:00
orbiter
c3dee2d6bd added security patch 2013-12-31 15:25:44 +01:00
orbiter
dcf46ce8f6 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-12-31 15:20:49 +01:00
orbiter
343d2ef49a new data type for access tracker (unfinished) 2013-12-31 15:20:34 +01:00
reger
dd8ea0cdd6 fix "add to blacklist" button style in IndexControlRWIs_p
- added default filename filter to select field (as only addition to *.black list is permanent)

- modified Blacklist_p header/legend to show all active blacklists 
  (to support understanding that all configured lists are active)
- removed obsolete code in Blacklist_p servlet
2013-12-30 20:03:59 +01:00
reger
abbf487023 fix QueryGoal Image query (missing space)
see query log example .. url_file_ext_s:(jpg OR png OR gif) ORcontent_type:(image/*)) ..
2013-12-29 20:14:10 +01:00
reger
26e9d7e066 fix NPE in IndexControlRWIs_p.html
- metatags my be null
Caused by: java.lang.NullPointerException
	at net.yacy.search.query.QueryParams.getFacets(QueryParams.java:445)
	at net.yacy.search.query.QueryParams.getBasicParams(QueryParams.java:400)
	at net.yacy.search.query.QueryParams.solrTextQuery(QueryParams.java:345)
	at net.yacy.search.query.QueryParams.solrQuery(QueryParams.java:334)
	at net.yacy.search.query.SearchEvent.<init>(SearchEvent.java:290)
	at net.yacy.search.query.SearchEventCache.getEvent(SearchEventCache.java:176)
	at IndexControlRWIs_p.genSearchresult(IndexControlRWIs_p.java:641)
	at IndexControlRWIs_p.respond(IndexControlRWIs_p.java:141)
2013-12-29 08:05:37 +01:00
reger
7f9b9315fe Merge origin/master 2013-12-29 02:05:07 +01:00
reger
8eaabb9600 remove dependency from old serverCore.java
- remaining getPortNr not needed 
  (as current release allows only to set plain integer as port,
   see ConfigBasic)
2013-12-29 02:00:44 +01:00
orbiter
2018e55f8b switched back on index deletion (was accidently off because new jetty
framework delivers never null to post arguments .. there may be more of
that kind of problems)
2013-12-29 01:39:30 +01:00
orbiter
3961b643a3 write solr searches to search log 2013-12-29 01:25:44 +01:00
orbiter
15882beb19 fix for strange NPE
java.lang.NullPointerException
        at
net.yacy.search.Switchboard.updateMySeed(Switchboard.java:3667)
        at net.yacy.peers.Network.peerPing(Network.java:195)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at
net.yacy.kelondro.workflow.InstantBusyThread.job(InstantBusyThread.java:107)
        at
net.yacy.kelondro.workflow.AbstractBusyThread.run(AbstractBusyThread.java:165)
2013-12-29 00:40:31 +01:00
orbiter
f3ac923a7e ftp client shall be able to open non-anonymous ftp servers if login
details are given
2013-12-28 22:42:02 +01:00
reger
3d913558ab display configured adminUserName in ConfigAccounts_p
- fix read default username in  in loginservice
2013-12-27 21:04:14 +01:00
reger
fbdd89e198 Merge origin/master 2013-12-27 06:53:14 +01:00
reger
65a2f3d5e7 tweak Jetty credentials to work with YaCy UserDB
- user entry in UserDB with admin right can login to access protected pages
- dto. admin user, choosen username is stored in conf (adminAccountUserName=)
2013-12-27 06:45:22 +01:00
Michael Peter Christen
ffdfe5fb9b Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-12-27 03:06:38 +01:00
reger
7d6b34a89f Merge origin/master 2013-12-27 03:04:14 +01:00
reger
45e8750ba5 nasty quick fix for admin login with other username as admin
- userDB is not sync'ed with Jetty credentials as of now only the std. admin account can login

switched initial browser open with ssl active back to std. http port
2013-12-27 02:59:19 +01:00
Michael Peter Christen
ee17bd0b69 added option to attach remote solr servers in read-only mode 2013-12-27 02:55:21 +01:00
Michael Peter Christen
25f9c35033 add patch which shall prevent that naive search mistakes like usage of
regular expressions cause no results. Usage of '*' followed by a dot or
any expression will now cause that this expression is used as a filetype
search.
2013-12-27 00:34:55 +01:00
Michael Peter Christen
667a6adddb - use default files from yacy.init property "defaultFiles" if no
jetty-configuration is given for default files.
- fix a problem with default paths if no path is given (i.e.
http://localhost:8090 instead of http://localhost:8090/). Without this
patch the path was resolved automatically to http://localhost:8090//
2013-12-26 23:59:04 +01:00
Michael Peter Christen
77aeb288a2 suppress deprecation warning (for now); TODO: find alternatives 2013-12-26 23:26:21 +01:00
reger
fca7f1d043 run SSL/HTTPS port (8443) ping test in migration only if SSL/HTTPS is on
- see last commit
2013-12-25 05:33:00 +01:00
reger
71cac1a278 added SSL/HTTPS connector to support SSL/https connection on port 8443
!!! attention !!! to make sure YaCy can start, https will be disabled if port 8443 is used
   - added ping test for above to migration 

- as of now port for https is hardcoded to default 8443
- if not urgend required I'd leave it this way (it's standard) to use different ports for http and https 

- post https port on ConfigBasic.html (if active)
2013-12-25 05:20:13 +01:00
Michael Peter Christen
82c0525e71 wrong logger fix 2013-12-23 10:52:02 +01:00
Michael Peter Christen
e17624b6dd added html retrieval from alternative DATA/HTDOCS path 2013-12-23 02:06:33 +01:00
Michael Peter Christen
07cee6b99c removed more unused code 2013-12-23 01:51:48 +01:00
Michael Peter Christen
20b48f894f refactoring: moving all servlets to the same package (the solr servlet
is currently actually a filter which should be changed somehow)
2013-12-23 01:32:29 +01:00
Michael Peter Christen
84167adb49 removed unused anomichttpd code after migration to jetty 2013-12-23 01:23:40 +01:00
Michael Peter Christen
b461a27abb fixed the SolrServlet 2013-12-20 01:51:51 +01:00
Michael Peter Christen
7603e879dc Merge branch 'master' into HEAD
Conflicts:
	.classpath
	source/net/yacy/cora/federate/solr/SolrServlet.java
2013-12-20 01:19:06 +01:00
Michael Peter Christen
25250405f1 solr servlet preparation for join with jetty branch 2013-12-20 00:45:58 +01:00
Michael Peter Christen
2f16770681 migrated to solr 4.6.0 2013-12-19 21:51:05 +01:00
Michael Peter Christen
57f0f71ac6 added patch to allow binary response writer 2013-12-19 10:13:43 +01:00
orbiter
937273d4e3 added parsing of metadata to surrogate reading:
a dublin core record inside of surrogate input files may now contain
tokens within the namespace 'md' (short for: metadata). The token names
must be valid withing the namespace of the solr field names. All
md-tokens inside of surrogate files then overwrite values within solr
documents before they are written to the solr index. This makes it
possible to assign collection names to each surrogate entry and also
ranking information can be added. Please see the example file.
2013-12-17 14:02:27 +01:00
reger
18497f6475 remove unused init parameter from DefaultServlet
- remove "RelativeResourceBase" parameter
2013-12-15 23:39:19 +01:00
orbiter
4de3fefdb5 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2013-12-15 19:13:00 +01:00
orbiter
7e346e1d79 using stringbuilder in query construction 2013-12-15 19:12:49 +01:00
reger
c84c313fe1 Merge origin/master into jetty 2013-12-14 20:02:24 +01:00
Michael Peter Christen
2702d9e56b - added a SolrQueryResponse2SolrDocumentList method which is able to
work around the unfolding process in Solr's BinaryResponseWriter.
This was a huge performance bottleneck in the embedded solr connector
and the problem is actually on Solr side, but we have now a workaround.
- This made it possible to abstract a high-performance index access
method which is implemented as method getDocumentListByParams. That
method is also implemented in the SolrServerConnector and provides a
very efficient access to a solr index if the index is embedded.
- a popular use of the document list retrieval is a result count which
can now also make use of the new method, via getDocumentCountByParams.
- enhanced the Error cache which now does not store error documents
within the ram cache if the document is also written to solr. When
documents are retrieved from the cache, they are partly read from the
ram cache and if not existent there, from the Solr index.
2013-12-13 15:56:29 +01:00
Michael Peter Christen
74466d731a use pre-compiled patterns in ymark 2013-12-12 11:50:48 +01:00
Michael Peter Christen
34633044b4 made pattern computation static 2013-12-12 10:55:36 +01:00
Michael Peter Christen
ef7ddbc933 added date parser caches to prevent re-calculation of costly date
parsing
2013-12-12 10:55:12 +01:00