Commit Graph

817 Commits

Author SHA1 Message Date
orbiter
133d41386c (again) full redesign of ConcurrentUpdateSolrConnector to remove
out-of-order transactions regarding add and delete operations. Now all
operations (add and delete) are executed concurrently in-order.
2014-02-28 00:19:30 +01:00
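One way to picture the in-order behaviour this commit describes is a single queue that carries both add and delete operations and is drained by one background worker. This is a minimal sketch under that assumption, not the actual ConcurrentUpdateSolrConnector code; the class `InOrderUpdateSketch`, its methods, and the in-memory map standing in for the Solr index are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: adds and deletes share ONE queue, so a background
// worker applies them in exactly the order in which they were submitted.
public class InOrderUpdateSketch {

    // A queued operation: either an add (delete == false) or a delete.
    private static final class Op {
        final boolean delete;
        final String id;
        final String doc;

        Op(boolean delete, String id, String doc) {
            this.delete = delete;
            this.id = id;
            this.doc = doc;
        }
    }

    // Marker object that tells the worker to stop.
    private static final Op POISON = new Op(false, null, null);

    private final BlockingQueue<Op> queue = new LinkedBlockingQueue<>();
    // Stand-in for the real index (assumption: id -> document text).
    private final Map<String, String> index = new ConcurrentHashMap<>();
    private final Thread worker;

    public InOrderUpdateSketch() {
        worker = new Thread(this::drain, "in-order-updater");
        worker.setDaemon(true);
        worker.start();
    }

    // Worker loop: take operations one by one and apply them in FIFO order.
    private void drain() {
        try {
            while (true) {
                Op op = queue.take();
                if (op == POISON) {
                    break;
                }
                if (op.delete) {
                    index.remove(op.id);
                } else {
                    index.put(op.id, op.doc);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Callers return immediately; the write happens in the background.
    public void add(String id, String doc) {
        queue.add(new Op(false, id, doc));
    }

    public void delete(String id) {
        queue.add(new Op(true, id, null));
    }

    // Flush all pending operations and stop the worker.
    public void close() {
        queue.add(POISON);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public String get(String id) {
        return index.get(id);
    }
}
```

With separate add and delete queues, a delete could overtake an earlier add of the same id; sharing one queue rules that out at the cost of serializing the writes.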
Michael Peter Christen
a632b0d2a4 added a forced commit to index deletion to enable synchronized index
updates
2014-02-27 12:50:40 +01:00
Michael Peter Christen
3cc5c0ffdd a concurrency enhancement which was not used because tests showed worse
indexing speed. I leave the code there since it may be useful in
SolrCloud environments.
2014-02-27 01:27:06 +01:00
Michael Peter Christen
90b47e83e6 fixed shutdown error when closing solr connectors 2014-02-26 22:47:16 +01:00
Michael Peter Christen
7640834b37 removed double concurrency when putting Solr documents into the index.
Writes to the solr index are also buffered in
ConcurrentUpdateSolrConnector
2014-02-26 22:21:00 +01:00
Michael Peter Christen
0f6b72f24b do not use luke requests for remote solr servers if the result is
different from normal requests. This happens if the remote solr is
actually a solrCloud; in such cases the luke request returns only the
result of the single solr peer, not the whole cloud.
also done: some refactoring.
2014-02-26 14:30:48 +01:00
Michael Peter Christen
c57026e242 recover from OOM 2014-02-25 15:23:45 +01:00
Michael Peter Christen
907db8b7a6 fix for bad query shortcut hack 2014-02-25 15:19:04 +01:00
orbiter
cfb647db6e - introduced a miss cache in ConcurrentUpdateSolrConnector
- better usage of cache
- bugfix for postprocessing
2014-02-24 23:42:50 +01:00
orbiter
a87d8e4a8e changed caching of ConcurrentUpdateSolrConnector: it now also caches the
url along with the load date. While this takes much more memory, it
eliminates database lookups for getURL() requests, which happen equally
often. This speeds up remote solr configurations.
2014-02-24 22:59:58 +01:00
orbiter
d3a88eaecb introducing ConcurrentUpdateSolrServer for remote solr servers.
Scaling of write buffers and update queue size is made according to
assigned memory.
2014-02-24 20:26:02 +01:00
Michael Peter Christen
254a7ac66c fixed cleaning of index 2014-02-22 01:35:01 +01:00
Michael Peter Christen
28a7b42e6b removed warning "sun.misc.BASE64Encoder is internal proprietary API and
may be removed in a future release"
2014-02-22 00:52:49 +01:00
Michael Peter Christen
046f5a03cb one more SolrIndexSearcher bugfix 2014-02-21 23:48:56 +01:00
sixcooler
78c01b3eff fix for 'AlreadyClosedException: this IndexReader is closed' 2014-02-21 17:28:32 +01:00
Michael Peter Christen
1b5e3d523a better control over close-state of remote solr connections 2014-02-20 00:39:19 +01:00
Michael Peter Christen
1a364572a5 fix for the error
"org.apache.solr.core.SolrCore Too many close [count:-1] on
org.apache.solr.core.SolrCore@51af7c57"
2014-02-20 00:03:35 +01:00
Michael Peter Christen
69391e5d9e changed strategy to test existence of documents in Solr: using the
update time. The reason for that is a better caching for the crawler
double-check, which needs the update time for crawler steering.
2014-02-19 04:03:45 +01:00
Michael Peter Christen
ff656ce860 explicit call to optimize to add an expungeDeleted flag 2014-02-12 01:01:23 +01:00
orbiter
14764632b5 clear solr caches in case an exception occurs. The reason behind
this hack is the occurrence of Exceptions like:
W 2014/02/11 18:51:33 ConcurrentLog GC overhead limit exceeded
java.io.IOException: GC overhead limit exceeded
        at net.yacy.cora.federate.solr.connector.AbstractSolrConnector.getDocumentById(AbstractSolrConnector.java:334)
        at net.yacy.cora.federate.solr.connector.MirrorSolrConnector.getDocumentById(MirrorSolrConnector.java:173)
        at net.yacy.cora.federate.solr.connector.ConcurrentUpdateSolrConnector.getDocumentById(ConcurrentUpdateSolrConnector.java:415)
        at net.yacy.search.index.Fulltext.getMetadata(Fulltext.java:331)
        at net.yacy.search.index.Fulltext.getMetadata(Fulltext.java:317)
        at net.yacy.search.query.SearchEvent.pullOneRWI(SearchEvent.java:1024)
        at net.yacy.search.query.SearchEvent.pullOneFilteredFromRWI(SearchEvent.java:1047)
        at net.yacy.search.query.SearchEvent$3.run(SearchEvent.java:1263)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.Arrays.copyOfRange(Arrays.java:3077)
        at java.lang.StringCoding.decode(StringCoding.java:196)
        at java.lang.String.<init>(String.java:491)
        at java.lang.String.<init>(String.java:547)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.readField(CompressingStoredFieldsReader.java:187)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(CompressingStoredFieldsReader.java:351)
        at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:276)
        at org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:110)
        at org.apache.lucene.index.IndexReader.document(IndexReader.java:436)
        at org.apache.solr.search.SolrIndexSearcher.doc(SolrIndexSearcher.java:657)
        at net.yacy.cora.federate.solr.connector.EmbeddedSolrConnector.SolrQueryResponse2SolrDocumentList(EmbeddedSolrConnector.java:230)
        at net.yacy.cora.federate.solr.connector.EmbeddedSolrConnector.getDocumentListByParams(EmbeddedSolrConnector.java:320)
        at net.yacy.cora.federate.solr.connector.AbstractSolrConnector.getDocumentById(AbstractSolrConnector.java:330)
        ... 7 more
        
This problem was analysed with the Eclipse Memory Analyser after a heap
dump, where the following problem was reported as the main Problem
Suspect:

One instance of "org.apache.solr.util.ConcurrentLRUCache" loaded by
"sun.misc.Launcher$AppClassLoader @ 0x42e940a0" occupies 902.898.256
(61,80%) bytes. The memory is accumulated in one instance of
"java.util.concurrent.ConcurrentHashMap$Segment[]" loaded by "<system
class loader>".

This memory is part of the result cache of Solr. Flushing this cache
appears to be the most appropriate solution to that problem.
2014-02-11 20:56:40 +01:00
Michael Peter Christen
412d55523c enhanced memory protection and OOM exception handling in Solr connector 2014-02-09 12:36:14 +01:00
Michael Peter Christen
d9858e1b8a removed warnings and superfluous logging 2014-02-09 12:26:58 +01:00
Michael Peter Christen
94245ce0a8 fixed "Size in KBytes" calculation in PerformanceQueues_p.html,
see http://bugs.yacy.net/view.php?id=362
2014-02-07 17:19:08 +01:00
Michael Peter Christen
6e59ca4ebf removed jena library and all code that depended on jena. When jena was
introduced, it was also used for search facets. The generic search
facets are now deduced from generic solr fields which makes jena as a tool
for facet semantics superfluous.
2014-02-07 01:20:06 +01:00
Michael Peter Christen
9228214f9b enrichment of PerformanceMemory display of SolrInfoMBean table 2014-02-07 00:22:31 +01:00
Michael Peter Christen
e8bdf16ea7 added statistic information for solr resources in PerformanceMemory 2014-02-07 00:02:19 +01:00
Michael Peter Christen
456e52e0d5 enhanced strategy to clear solr caches
- redesigned the instance mirror class (which was a mess)
- added final method to close a searcher (which otherwise keeps a cache)
- changed cache clear method which iterates over resources and calls
clear to all caches in the searcher resources
2014-02-06 19:13:29 +01:00
reger
bd1685c94a fix not needed getFileExtension().toLower (double)
add missing .getFileExtension
2014-02-05 03:45:02 +01:00
orbiter
a11f072504 enhanced didyoumean 2014-02-04 00:18:11 +01:00
Michael Peter Christen
d2b8f2b477 enhancements for staticIP and ipv6 handling 2014-01-27 13:48:20 +01:00
sixcooler
6d8c023a5e lower client-connection for single-cpu-systems 2014-01-21 16:56:44 +01:00
Michael Peter Christen
79809342fa added synchronization to exists() call because concurrent calls to
that method showed up in thread dumps in near-deadlock situations. It's also
better to synchronize IO operations because they become faster then.
2014-01-20 21:09:03 +01:00
Michael Peter Christen
9a6912f2e6 if a http client thread is still running but we do not wait for it any
more, call an interrupt
2014-01-20 18:39:36 +01:00
Michael Peter Christen
1ea17bd9f3 - removed old metadata database and all migration code
- refactored all code which uses URIMetadataRow as standard for word
hash length and word hash ordering and moved that to the class 'Word',
because the class URIMetadataRow defined the old metadata data structure
and should be superfluous in the future
- removed unused methods from URIMetadataRow as preparation for further
removal of that class
2014-01-20 18:31:46 +01:00
Michael Peter Christen
022c6d3ce1 do YaCy p2p connections using a timeout-request which wraps the http
request in a separate thread and ignores the further result of a
request if that does not answer within the requested time-out. This is an
attempt to solve a problem with the peer-ping, which hangs whenever a peer
appears to be dead or blocked.
2014-01-19 15:21:23 +01:00
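The timeout-request idea from this commit can be sketched with the standard `java.util.concurrent` classes: the request runs in a separate thread, and the caller stops waiting once the time-out elapses. This is a hedged illustration, not YaCy's implementation; the class `TimeoutRequestSketch`, the method `callWithTimeout`, and the `fallback` parameter are hypothetical names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of a request that is abandoned after a time-out.
public class TimeoutRequestSketch {

    // Daemon threads, so a hanging request cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "timeout-request");
        t.setDaemon(true);
        return t;
    });

    // Runs the request in a separate thread and waits at most timeoutMillis.
    // If no answer arrives in time (or the request fails), the fallback is
    // returned and any later result of the request is ignored.
    public static <T> T callWithTimeout(Callable<T> request, long timeoutMillis, T fallback) {
        Future<T> future = POOL.submit(request);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Stop waiting; cancel(true) also interrupts the worker thread.
            future.cancel(true);
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return fallback;
        } catch (ExecutionException e) {
            // The request itself threw an exception.
            return fallback;
        }
    }
}
```

A peer-ping would pass the HTTP call as the `Callable` and treat the fallback value as "peer did not answer", so a dead or blocked peer delays the caller by at most the time-out.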
orbiter
e3c4456c8e Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-01-17 09:43:09 +01:00
orbiter
7f21d21d1d added synchronization to deeply-embedded solr connector
EmbeddedSolrConnector because deadlock situations show that methods in
lucene class seem to block.
2014-01-17 09:42:55 +01:00
Michael Peter Christen
ba44eb1160 when scaling the number of remote peers, also consider the machine load
and the number of cores
2014-01-16 17:34:26 +01:00
Michael Peter Christen
f8ce7040ab remote search peer selection schema change:
- all non-dht targets (previously separated into 'robinson' for dht-like
queries and 'node' for solr queries) are now 'extra' peers, which are
queried using solr
- these extra-peers are now selected using a ranking on last-seen,
peer-tag-matches, node-peer flags, peer age, and link count. The ranking
is done using a weight and a random factor.
- the number of extra peers is 50% of the dht peers
- the dht peers now exclude too young peers to prevent bad results
during strong growth of the network
- the number of dht peers (and therefore extra-peers) is reduced when
the memory of the peer is low and/or some documents still appear in the
indexing-queue. This shall prevent a peer from deadlocks when p2p
queries are made in a fast sequence on weak hardware.
2014-01-16 17:27:14 +01:00
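The extra-peer ranking described in this commit (a weighted score over last-seen, tag matches, node flag, age, and link count, plus a random factor, with the extra-peer count at 50% of the dht peers) could look roughly like the sketch below. The weights, the `Peer` fields, and all class and method names are assumptions made for illustration; the commit message does not state the actual values.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of ranking 'extra' peers by weighted features plus
// a random factor. All weights below are illustrative assumptions.
public class ExtraPeerSelectionSketch {

    public static final class Peer {
        public final String name;
        public final long lastSeenMinutes; // minutes since the peer was last seen
        public final int tagMatches;       // number of matching peer tags
        public final boolean nodePeer;     // node-peer flag
        public final int ageDays;          // peer age in days
        public final int linkCount;        // number of links the peer reports

        public Peer(String name, long lastSeenMinutes, int tagMatches,
                    boolean nodePeer, int ageDays, int linkCount) {
            this.name = name;
            this.lastSeenMinutes = lastSeenMinutes;
            this.tagMatches = tagMatches;
            this.nodePeer = nodePeer;
            this.ageDays = ageDays;
            this.linkCount = linkCount;
        }
    }

    // Weighted score; higher is better. randomWeight scales the random factor.
    static double score(Peer p, double randomWeight, Random rnd) {
        double s = 0.0;
        s += 1.0 / (1.0 + p.lastSeenMinutes);      // recently seen peers score higher
        s += 0.5 * p.tagMatches;                   // each tag match adds weight
        s += p.nodePeer ? 0.5 : 0.0;               // bonus for node peers
        s += Math.min(p.ageDays, 365) / 365.0;     // older peers score higher, capped
        s += Math.log1p(p.linkCount) / 20.0;       // damped link-count contribution
        return s + randomWeight * rnd.nextDouble();
    }

    // Selects the best-ranked extra peers; their number is 50% of the dht peers.
    public static List<Peer> selectExtraPeers(List<Peer> candidates, int dhtPeerCount,
                                              double randomWeight, Random rnd) {
        int extraCount = dhtPeerCount / 2;
        // Score every peer once so the random factor stays stable while sorting.
        Map<Peer, Double> scores = new HashMap<>();
        for (Peer p : candidates) {
            scores.put(p, score(p, randomWeight, rnd));
        }
        List<Peer> ranked = new ArrayList<>(candidates);
        ranked.sort(Comparator.comparingDouble((Peer p) -> scores.get(p)).reversed());
        return new ArrayList<>(ranked.subList(0, Math.min(extraCount, ranked.size())));
    }
}
```

The random factor keeps the same few top-ranked peers from receiving every query; setting `randomWeight` to 0 makes the selection fully deterministic.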
Michael Peter Christen
ec10ed45bd better logging in logger 2014-01-16 13:08:39 +01:00
Michael Peter Christen
a5d7961812 replaced old caching in SolrConnector with a new one which is better for
concurrency and should prevent 100% CPU usage after a long run of a
peer with a large number of documents.
2014-01-15 23:13:22 +01:00
Michael Peter Christen
ce4d42d77c Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2014-01-07 21:52:38 +01:00
Michael Peter Christen
644573cfc4 using the adminAccountUserName from yacy.conf within apicall.sh 2014-01-07 21:52:19 +01:00
reger
6932aa4d7a use configured admin-username for api calls
- the admin user name can be configured, in apiExec calls the default "admin" username is used. 

TODO: the bin/apicall.sh script should likely take that into account.
2014-01-07 21:26:50 +01:00
sixcooler
add0e42804 fix double-escaped urls from proxy-usage 2014-01-07 01:04:33 +01:00
sixcooler
345f9aba27 make use of our DNS-cache again - this really speeds up the lookup 2014-01-07 00:18:01 +01:00
orbiter
3cb6c7861f fixed shutdown authentication problem 2014-01-06 01:48:54 +01:00
Michael Peter Christen
2939b47986 removed non-working realm setting in http client (auth for localhost was
added in previous commit)
2014-01-05 15:04:18 +01:00
orbiter
9d52b337f3 added http authentication to YaCy http client for all localhost
accesses to enable self-steering of the peer using the API table. This is
necessary in case a password for the administration pages is set.
2014-01-05 14:46:11 +01:00
Michael Peter Christen
1c56befb93 fixed mess with test on localhost (which means local hosts for some
cases)
2014-01-05 04:55:30 +01:00