yacy_search_server

mirror of https://github.com/yacy/yacy_search_server.git synced 2024-09-21 00:00:13 +02:00

Author	SHA1	Message	Date
orbiter	133d41386c	(again) full redesign of ConcurrentUpdateSolrConnector to remove out-of-order transactions regarding add and delete operations. Now all operations (add and delete) are executed concurrently in-order.	2014-02-28 00:19:30 +01:00
Michael Peter Christen	a632b0d2a4	added a forced commit to index deletion to enable synchronized index updates	2014-02-27 12:50:40 +01:00
Michael Peter Christen	3cc5c0ffdd	a concurrency enhancement which was not used because tests showed worse indexing speed. I leave the code there since it may be useful in SolrCloud environments.	2014-02-27 01:27:06 +01:00
Michael Peter Christen	90b47e83e6	fixed shutdown error when closing solr connectors	2014-02-26 22:47:16 +01:00
Michael Peter Christen	7640834b37	removed double concurrency to put Solr documents into the index. The writings to the solr index are also buffered in ConcurrentUpdateSolrConnector	2014-02-26 22:21:00 +01:00
Michael Peter Christen	0f6b72f24b	do not use luke requests for remote solr servers if the result is different from normal requests. This happens if the remote solr is actually a solrCloud; in such cases the luke request returns only the result of the single solr peer, not the whole cloud. also done: some refactoring.	2014-02-26 14:30:48 +01:00
Michael Peter Christen	c57026e242	recover from OOM	2014-02-25 15:23:45 +01:00
Michael Peter Christen	907db8b7a6	fix for bad query shortcut hack	2014-02-25 15:19:04 +01:00
orbiter	cfb647db6e	- introduced a miss cache in ConcurrentUpdateSolrConnector - better usage of cache - bugfix for postprocessing	2014-02-24 23:42:50 +01:00
orbiter	a87d8e4a8e	changed caching of ConcurrentUpdateSolrConnector: it caches now also the url along with the load date. While this takes much more memory, it eliminates database lookups for getURL() requests, which happen equally often. This speeds up remote solr configurations.	2014-02-24 22:59:58 +01:00
orbiter	d3a88eaecb	introducing ConcurrentUpdateSolrServer for remote solr servers. Scaling of write buffers and update queue size is made according to assigned memory.	2014-02-24 20:26:02 +01:00
Michael Peter Christen	254a7ac66c	fixed cleaning of index	2014-02-22 01:35:01 +01:00
Michael Peter Christen	28a7b42e6b	removed warning "sun.misc.BASE64Encoder is internal proprietary API and may be removed in a future release"	2014-02-22 00:52:49 +01:00
Michael Peter Christen	046f5a03cb	one more SolrIndexSearcher bugfix	2014-02-21 23:48:56 +01:00
sixcooler	78c01b3eff	fix for 'AlreadyClosedException: this IndexReader is closed'	2014-02-21 17:28:32 +01:00
Michael Peter Christen	1b5e3d523a	better control over close-state of remote solr connections	2014-02-20 00:39:19 +01:00
Michael Peter Christen	1a364572a5	fix for "org.apache.solr.core.SolrCore Too many close [count:-1] on org.apache.solr.core.SolrCore@51af7c57" -error	2014-02-20 00:03:35 +01:00
Michael Peter Christen	69391e5d9e	changed strategy to test existence of documents in Solr: using the update time. The reason for that is a better caching for the crawler double-check, which needs the update time for crawler steering.	2014-02-19 04:03:45 +01:00
Michael Peter Christen	ff656ce860	explicit call to optimize to add an expungeDeleted flag	2014-02-12 01:01:23 +01:00
orbiter	14764632b5	clear solr caches in case that an exception occurs. The reason behind this hack is the occurrence of Exceptions like: W 2014/02/11 18:51:33 ConcurrentLog GC overhead limit exceeded java.io.IOException: GC overhead limit exceeded at net.yacy.cora.federate.solr.connector.AbstractSolrConnector.getDocumentById(AbstractSolrConnector.java:334) at net.yacy.cora.federate.solr.connector.MirrorSolrConnector.getDocumentById(MirrorSolrConnector.java:173) at net.yacy.cora.federate.solr.connector.ConcurrentUpdateSolrConnector.getDocumentById(ConcurrentUpdateSolrConnector.java:415) at net.yacy.search.index.Fulltext.getMetadata(Fulltext.java:331) at net.yacy.search.index.Fulltext.getMetadata(Fulltext.java:317) at net.yacy.search.query.SearchEvent.pullOneRWI(SearchEvent.java:1024) at net.yacy.search.query.SearchEvent.pullOneFilteredFromRWI(SearchEvent.java:1047) at net.yacy.search.query.SearchEvent$3.run(SearchEvent.java:1263) Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.Arrays.copyOfRange(Arrays.java:3077) at java.lang.StringCoding.decode(StringCoding.java:196) at java.lang.String.<init>(String.java:491) at java.lang.String.<init>(String.java:547) at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.readField(CompressingStoredFieldsReader.java:187) at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(CompressingStoredFieldsReader.java:351) at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:276) at org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:110) at org.apache.lucene.index.IndexReader.document(IndexReader.java:436) at org.apache.solr.search.SolrIndexSearcher.doc(SolrIndexSearcher.java:657) at net.yacy.cora.federate.solr.connector.EmbeddedSolrConnector.SolrQueryResponse2SolrDocumentList(EmbeddedSolrConnector.java:230) at 
net.yacy.cora.federate.solr.connector.EmbeddedSolrConnector.getDocumentListByParams(EmbeddedSolrConnector.java:320) at net.yacy.cora.federate.solr.connector.AbstractSolrConnector.getDocumentById(AbstractSolrConnector.java:330) ... 7 more This problem was analysed with the Eclipse Memory Analyser after a heap dump, where the following problem was reported as the main Problem Suspect: One instance of "org.apache.solr.util.ConcurrentLRUCache" loaded by "sun.misc.Launcher$AppClassLoader @ 0x42e940a0" occupies 902.898.256 (61,80%) bytes. The memory is accumulated in one instance of "java.util.concurrent.ConcurrentHashMap$Segment[]" loaded by "<system class loader>". This memory is part of the result cache of Solr. Flushing this cache appears the most appropriate solution to that problem.	2014-02-11 20:56:40 +01:00
Michael Peter Christen	412d55523c	enhanced memory protection and OOM exception handling in Solr connector	2014-02-09 12:36:14 +01:00
Michael Peter Christen	d9858e1b8a	removed warnings and superfluous logging	2014-02-09 12:26:58 +01:00
Michael Peter Christen	94245ce0a8	fixed "Size in KBytes" calculation in PerformanceQueues_p.html, see http://bugs.yacy.net/view.php?id=362	2014-02-07 17:19:08 +01:00
Michael Peter Christen	6e59ca4ebf	removed jena library and all code that depended on jena. When jena was introduced, it was also used for search facets. The generic search facets are now deduced from generic solr fields which makes jena as tool for facet semantics superfluous.	2014-02-07 01:20:06 +01:00
Michael Peter Christen	9228214f9b	enrichment of PerformanceMemory display of SolrInfoMBean table	2014-02-07 00:22:31 +01:00
Michael Peter Christen	e8bdf16ea7	added statistic information for solr resources in PerformanceMemory	2014-02-07 00:02:19 +01:00
Michael Peter Christen	456e52e0d5	enhanced strategy to clear solr caches - redesigned the instance mirror class (which was a mess) - added final method to close a searcher (which otherwise keeps a cache) - changed cache clear method which iterates over resources and calls clear to all caches in the searcher resources	2014-02-06 19:13:29 +01:00
reger	bd1685c94a	fix not needed getFileExtension().toLower (double) add missing .getFileExtension	2014-02-05 03:45:02 +01:00
orbiter	a11f072504	enhanced didyoumean	2014-02-04 00:18:11 +01:00
Michael Peter Christen	d2b8f2b477	enhancements for staticIP and ipv6 handling	2014-01-27 13:48:20 +01:00
sixcooler	6d8c023a5e	lower client-connection for single-cpu-systems	2014-01-21 16:56:44 +01:00
Michael Peter Christen	79809342fa	added synchronization to exists() call because concurrent calls to that method showed up in thread dumps in near-deadlock situations. It's also better to synchronize IO operations because they become faster then.	2014-01-20 21:09:03 +01:00
Michael Peter Christen	9a6912f2e6	if a http client thread is still running but we do not wait for it any more, call an interrupt	2014-01-20 18:39:36 +01:00
Michael Peter Christen	1ea17bd9f3	- removed old metadata database and all migration code - refactored all code which uses URIMetadataRow as standard for word hash length and word hash ordering and moved that to the class 'Word', because the class URIMetadataRow defined the old metadata data structure and should be superfluous in the future - removed unused methods from URIMetadataRow as preparation for further removal of that class	2014-01-20 18:31:46 +01:00
Michael Peter Christen	022c6d3ce1	do YaCy p2p connections using a timeout-request which wraps the http request in a separate thread and ignores the further result of a request if it does not answer within the requested time-out. This is an attempt to solve a problem with the peer-ping, which hangs whenever a peer appears to be dead or blocked.	2014-01-19 15:21:23 +01:00
orbiter	e3c4456c8e	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2014-01-17 09:43:09 +01:00
orbiter	7f21d21d1d	added synchronization to deeply-embedded solr connector EmbeddedSolrConnector because deadlock situations show that methods in lucene class seem to block.	2014-01-17 09:42:55 +01:00
Michael Peter Christen	ba44eb1160	when scaling the number of remote peers, also consider the machine load and the number of cores	2014-01-16 17:34:26 +01:00
Michael Peter Christen	f8ce7040ab	remote search peer selection schema change: - all non-dht targets (previously separated into 'robinson' for dht-like queries and 'node' for solr queries) are now 'extra' peers, which are queried using solr - these extra-peers are now selected using a ranking on last-seen, peer-tag-matches, node-peer flags, peer age, and link count. The ranking is done using a weight and a random factor. - the number of extra peers is 50% of the dht peers - the dht peers now exclude too young peers to prevent bad results during strong growth of the network - the number of dht peers (and therefore extra-peers) is reduced when the memory of the peer is low and/or some documents still appear in the indexing-queue. This shall prevent a peer from deadlocks when p2p queries are made in a fast sequence on weak hardware.	2014-01-16 17:27:14 +01:00
Michael Peter Christen	ec10ed45bd	better logging in logger	2014-01-16 13:08:39 +01:00
Michael Peter Christen	a5d7961812	replaced old caching in SolrConnector with a new one which is better for concurrency and should prevent from 100% CPU usage after a long run of a peer with a large number of documents.	2014-01-15 23:13:22 +01:00
Michael Peter Christen	ce4d42d77c	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2014-01-07 21:52:38 +01:00
Michael Peter Christen	644573cfc4	using the adminAccountUserName from yacy.conf within apicall.sh	2014-01-07 21:52:19 +01:00
reger	6932aa4d7a	use configured admin-username for api calls - the admin user name can be configured, in apiExec calls the default "admin" username is used. TODO: the bin/apicall.sh script should likely take that into account.	2014-01-07 21:26:50 +01:00
sixcooler	add0e42804	fix double-escaped urls from proxy-usage	2014-01-07 01:04:33 +01:00
sixcooler	345f9aba27	make use of our DNS-cache again - this really speeds up the lookup	2014-01-07 00:18:01 +01:00
orbiter	3cb6c7861f	fixed shutdown authentication problem	2014-01-06 01:48:54 +01:00
Michael Peter Christen	2939b47986	removed non-working realm setting in http client (auth for localhost was added in previous commit)	2014-01-05 15:04:18 +01:00
orbiter	9d52b337f3	added http authentication to YaCy http client for all localhost accesses to enable self-steering of the peer using the API table. This is necessary in case a password for the administration pages is set.	2014-01-05 14:46:11 +01:00
Michael Peter Christen	1c56befb93	fixed mess with test on localhost (which means local hosts for some cases)	2014-01-05 04:55:30 +01:00

817 Commits