yacy_search_server

mirror of https://github.com/yacy/yacy_search_server.git synced 2024-09-19 00:01:41 +02:00

Author	SHA1	Message	Date
Michael Peter Christen	bd886054cb	new structure and enhancements for link graph computation: - added order option to solr queries to be able to retrieve document lists in specific order, here: link length - added HyperlinkEdge class which manages the link structure - integrated the HyperlinkEdge class into clickdepth computation - extended the linkstructure.json servlet to show also the clickdepth and other statistic information	2014-04-09 12:45:04 +02:00
reger	f326a67561	fix: typo in default charset in metadata2solr update pom and NB build to Solr 4.7.1 libs	2014-04-06 22:31:22 +02:00
Michael Peter Christen	df138084c0	do solr optimization independently from memory and load constraints: - not doing an optimization will likely cause a too many files exception - without optimization performance will be even worse which would prevent optimization in the future as well (prevent a deadlock situation)	2014-04-06 11:04:23 +02:00
Michael Peter Christen	ebd44a7080	replaced solr 4.6.1 with solr 4.7.1 and added index migration to lucene_47	2014-04-06 10:45:03 +02:00
Michael Peter Christen	466d90ad42	fixed a problem with resource observer; probably coming from uncaught exceptions within the apache library which appear only in concurrency environments.	2014-04-04 15:26:39 +02:00
Michael Peter Christen	e8ddd415a8	enhanced the new link structure graph	2014-04-04 14:43:54 +02:00
Michael Peter Christen	926d28dd3f	fixed a bug which prevented crawl starts after a network switch	2014-04-04 14:43:35 +02:00
Michael Peter Christen	3ce8eff21b	another fix for inbound/outbound detection	2014-04-04 12:41:59 +02:00
orbiter	3c1274057d	fixed thread dump in case of wrong seeds	2014-04-04 10:54:56 +02:00
orbiter	18f9c40302	moved Edge class out of linkstructure servlet as this does not work on non-eclipse driven environments (all non-dev cases)	2014-04-04 10:54:11 +02:00
Michael Peter Christen	c64c10ef00	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2014-04-03 01:58:06 +02:00
Michael Peter Christen	48fbfa60c1	bugfix to inbound/outbound identification	2014-04-03 01:21:43 +02:00
reger	227c42bc96	eliminate obsolete URIMetaDataRow class by joining it with/into URIMetaDataNode.	2014-04-03 00:35:15 +02:00
Michael Peter Christen	cca851a417	introduced new solr field crawldepth_i which records the crawl depth of a document. This is the upper limit for the clickdepth_i value which may be shorter in case that the crawler did not take the shortest path to the document.	2014-04-02 23:37:01 +02:00
Michael Peter Christen	63c9fcf3e0	free configuration of postprocessing clickdepth maximum depth and time	2014-04-02 02:34:39 +02:00
Michael Peter Christen	8b44fcf0f4	added missing @Override annotation	2014-03-28 13:48:37 +01:00
Michael Peter Christen	e515dd460d	added linkscount_i and linksnofollowcount_i to the default solr schema	2014-03-27 23:36:08 +01:00
Michael Peter Christen	cbdfef7ce1	changed protocol facet to show also all other counts if one facet is selected	2014-03-27 13:29:14 +01:00
Michael Peter Christen	61ad194065	fix for source and target clickdepth in webgraph index	2014-03-26 16:00:05 +01:00
Marc Nause	809b4e1fd9	Team added support for URLs with unicode characters in host part to blacklist. Punycode is used to handle unicode characters.	2014-03-25 22:14:54 +01:00
reger	ca7444dbdf	limit filetype nav to known extension also on image/media search - on text search we limit filetype nav already to known extension, apply filter to image search	2014-03-23 23:10:29 +01:00
Michael Peter Christen	d1091e79f8	- added stealth button to navigation menu - more fixes to progress bar	2014-03-21 18:01:26 +01:00
orbiter	3c8d6e1eee	added adminAccount switch to ConfigAccounts_p servlet to switch on protection of all pages; some refactoring as well	2014-03-20 22:11:49 +01:00
orbiter	7d24bcb98d	added flag to require that all web pages, even such without a "_p" extension require authorization. (default off)	2014-03-20 19:09:47 +01:00
Michael Peter Christen	b08375da33	fix for bad/missing values of size_i	2014-03-11 09:51:04 +01:00
Michael Peter Christen	51800007c4	- added concurrency to postprocessing of webgraph document - bundled separate webgraph postprocessing steps into one	2014-03-06 01:43:48 +01:00
Michael Peter Christen	e485fbd0ce	- let crawl loader jobs die after 10 seconds without new jobs - corrected shutdown order to prevent a deadlock during shutdown	2014-03-04 00:33:13 +01:00
Michael Peter Christen	bcd9dd9e1d	enhanced concurrent loading by using a fixed set of concurrent loader processes in favor of throwaway-processes. The control mechanism does less often report a 'queue full' message to the busy loop which then does not perform a long busy waiting; instead all requests are queued and new loader processes are started if necessary up to a given limit (as set before)	2014-03-03 22:13:40 +01:00
Michael Peter Christen	6ed9c0164e	attaching names to all Threads to get a better view in profiling tools like VisualVM	2014-02-28 15:02:01 +01:00
Michael Peter Christen	fdaeac374a	- enhanced postprocessing speed and memory footprint (by using HashMaps instead of TreeMaps) - enhanced memory footprint of database indexes (by introduction of optimize calls) - optimize calls shrink the amount of used memory for index sets if they are not changed afterwards any more	2014-02-28 14:01:09 +01:00
Michael Peter Christen	d325cb8912	fixes and enhancements for postprocessing	2014-02-28 02:51:14 +01:00
Michael Peter Christen	7c1b968378	another fix for the shutdown exceptions	2014-02-28 01:53:32 +01:00
Michael Peter Christen	1d069c5861	make sure that postprocessed documents are overwritten	2014-02-27 12:27:15 +01:00
Michael Peter Christen	e644981697	added one more postprocessing low memory check	2014-02-27 00:34:13 +01:00
Michael Peter Christen	e1bf65c892	added short memory protection during postprocessing	2014-02-26 23:02:56 +01:00
Michael Peter Christen	7640834b37	removed double concurrency to put Solr documents into the index. The writings to the solr index are also buffered in ConcurrentUpdateSolrConnector	2014-02-26 22:21:00 +01:00
Michael Peter Christen	0f6b72f24b	do not use luke requests for remote solr servers if the result is different from normal requests. This happens if the remote solr is actually a solrCloud; in such cases the luke request returns only the result of the single solr peer, not the whole cloud. also done: some refactoring.	2014-02-26 14:30:48 +01:00
Michael Peter Christen	a2b66fe2eb	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2014-02-25 14:37:39 +01:00
Michael Peter Christen	9f6be762a6	- better logging for postprocessing - fixed collection bug in postprocessing	2014-02-25 14:37:30 +01:00
orbiter	ced1a96f9c	fixed error cache	2014-02-25 02:16:22 +01:00
orbiter	cfb647db6e	- introduced a miss cache in ConcurrentUpdateSolrConnector - better usage of cache - bugfix for postprocessing	2014-02-24 23:42:50 +01:00
orbiter	a87d8e4a8e	changed caching of ConcurrentUpdateSolrConnector: it caches now also the url along with the load date. While this takes much more memory, it eliminates database lookups for getURL() requests, which happen equally often. This speeds up remote solr configurations.	2014-02-24 22:59:58 +01:00
orbiter	f6e441dd77	refactoring	2014-02-24 21:01:56 +01:00
orbiter	76c53faeb2	removed unused code (HostStat)	2014-02-24 20:51:43 +01:00
reger	0923b09216	fix: allow 4 character admin user name (was min 5 char)	2014-02-24 00:01:11 +01:00
Michael Peter Christen	254a7ac66c	fixed cleaning of index	2014-02-22 01:35:01 +01:00
Michael Peter Christen	69391e5d9e	changed strategy to test existence of documents in Solr: using the update time. The reason for that is a better caching for the crawler double-check, which needs the update time for crawler steering.	2014-02-19 04:03:45 +01:00
Michael Peter Christen	790f103f32	delete fail-docs during postprocessing to prevent that they will appear again and stay in postprocessing forever.	2014-02-18 01:38:56 +01:00
Michael Peter Christen	9eb668e951	enhanced the resource observer The resource observer is now able to recognize free disk space AND available space for YaCy. The amount of space which is assigned for YaCy are defined in new settings in the configuration file. Furthermore, there is now a cleanup process which deletes files in case that an autodelete is activated. The autodelete is now BY DEFAULT ON if the disk space is low, which means that YaCy starts to delete documents when the disk is full!	2014-02-12 01:00:44 +01:00
Michael Peter Christen	bf97e38b83	removed clearURLIndex, which is a stub remaining from the old metadata database and not needed any more	2014-02-11 22:01:25 +01:00
Michael Peter Christen	bc28247089	Added methods in resource observer to calculate the available and the occupied disc space. These values are also shown on the status page. The disc space calculation shall be used for a disk-limitation of the search index.	2014-02-11 03:20:03 +01:00
Michael Peter Christen	ca8b100f96	run the cleanup process even when load is high, do postprocessing even if load > 1 (but < 2) but only if there is enough memory (now: 0.5 GB RAM available). The memory amount of the postprocessing is the cause that systems block because they run into a frequent-GC chain which almost locks the peer. If running with enough memory, the postprocessing is fast and not damaging to the system. Because the required RAM of 0.5 GB is never available in default setting, the postprocessing will not run if the peer is not reconfigured to use more memory.	2014-02-10 12:59:30 +01:00
Michael Peter Christen	195e5868d3	catch solr close exceptions	2014-02-09 15:04:46 +01:00
Michael Peter Christen	751c128544	extra sleep for remote searches enhances search results because there is more time for more remote peers to contribute on the first result page	2014-02-09 14:57:17 +01:00
Michael Peter Christen	0cabcbbe83	more efficient wordcount	2014-02-09 14:45:12 +01:00
Michael Peter Christen	3d474a843e	added memory protection for postprocessing	2014-02-09 12:36:56 +01:00
Michael Peter Christen	6e59ca4ebf	removed jena library and all code that depended on jena. When jena was introduced, it was also used for search facets. The generic search facets are now deduced from generic solr fields which makes jena as tool for facet semantics superfluous.	2014-02-07 01:20:06 +01:00
Michael Peter Christen	9228214f9b	enrichment of PerformanceMemory display of SolrInfoMBean table	2014-02-07 00:22:31 +01:00
Michael Peter Christen	e8bdf16ea7	added statistic information for solr resources in PerformanceMemory	2014-02-07 00:02:19 +01:00
Michael Peter Christen	931541d198	re-inserted default value re-set button to performance queues and patched missing values for recent new queues	2014-02-06 22:39:19 +01:00
Michael Peter Christen	456e52e0d5	enhanced strategy to clear solr caches - redesigned the instance mirror class (which was a mess) - added final method to close a searcher (which otherwise keeps a cache) - changed cache clear method which iterates over resources and calls clear to all caches in the searcher resources	2014-02-06 19:13:29 +01:00
orbiter	22e3524797	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2014-02-03 12:45:35 +01:00
orbiter	c40ba51ca6	added new suggest method which replaces more-than-one suggestions: instead of computing suggest permutations of the given words, the completion of a phrase using the given words is searched in the fulltext index.	2014-02-03 12:44:52 +01:00
reger	b693ce9759	allow combining selection of different search nav's (facets) - selecting more than one nav combines the 2 selections (with AND) - unselecting one nav clears all selected (e.g. select filetype:pdf and /language/fr shows ~ french pdf's only)	2014-01-30 22:57:27 +01:00
reger	cb71413d19	fix page nav to keep modifier (was new issue)	2014-01-30 22:00:32 +01:00
orbiter	416481c33e	added a boost on appearance of combined words (in the same order the user submitted that) when searching for more than one word	2014-01-30 10:51:08 +01:00
reger	9b24dae2b7	add language navigation filter clause to rwi results	2014-01-25 22:59:23 +01:00
reger	f307d65dcf	prepare for a language navigator works fine to restrict language for local solrSearches. More work needs to be done to make rwi/remote searches respect the modifier.language restriction.	2014-01-24 03:11:25 +01:00
Michael Peter Christen	c84bcc878a	first try to add a generic solr servlet as luke request servlet	2014-01-23 19:01:31 +01:00
Michael Peter Christen	8b14e92ba4	added button in host browser to re-load 404/failed documents	2014-01-23 15:56:36 +01:00
orbiter	5ec0c969c9	fix for http://bugs.yacy.net/view.php?id=354	2014-01-22 20:59:53 +01:00
Michael Peter Christen	6ada0daae9	making latency_factor and maximum number of same hosts in loader queue settings available in Crawler_p.html servlet for steering.	2014-01-21 19:28:00 +01:00
Michael Peter Christen	489c3fbc90	code simplifications / removed warnings	2014-01-21 17:53:39 +01:00
Michael Peter Christen	0168f80c28	new crawling factors can now be changed during runtime	2014-01-21 17:52:16 +01:00
Michael Peter Christen	be5e808236	- removed hardcoded load-test which is now handled in BusyQueues steering, see /PerformanceQueues_p.html - changed default values for crawler queue load limit (high, because these jobs are started upon user request)	2014-01-21 17:48:45 +01:00
sixcooler	40a4030b55	configurable max-load values for YaCy-Threads: try lower values on small systems like a Pi	2014-01-21 17:04:22 +01:00
Michael Peter Christen	77531850b5	reverted crawling strategy from latest commit.	2014-01-21 16:05:55 +01:00
Michael Peter Christen	0d235a565b	cleanup crawl loader jobs	2014-01-20 18:36:00 +01:00
Michael Peter Christen	1ea17bd9f3	- removed old metadata database and all migration code - refactored all code which uses URIMetadataRow as standard for word hash length and word hash ordering and moved that to the class 'Word', because the class URIMetadataRow defined the old metadata data structure and should be superfluous in the future - removed unused methods from URIMetadataRow as preparation for further removal of that class	2014-01-20 18:31:46 +01:00
reger	97e84439fb	adjusted ConfigHeuristic and changed QueryGoal.getOriginalQueryString to .getQueryString - since specific heuristic Twitter & Blekko is no longer available or redundant with OpenSearchHeuristic, adjusted ConfigHeuristic to use OpensearchHeuristic settings only. For this the default OSD search target list is made available (copied) by default and the other configs are removed. - the return of QueryGoal.getOriginalQueryString includes the queryModifier, which are held separately in a modifier object, but in most (all) cases just the query term is expected, clarified and renamed it to QueryGoal.getQueryString which returns just the search term (if needed a .getOriginalQueryString could be implemented in Queryparameters, adding the modifiers) - started to adjust internal html href references from absolute to relative (currently it is mixed). For future development we should prefer relative href targets (less trouble with context aware servlets)	2014-01-20 00:58:17 +01:00
Michael Peter Christen	022c6d3ce1	do YaCy p2p connections using a timeout-request which covers the http request into a separate thread and ignores the further result of a request if that does not answer within the requested time-out. This is a try to solve a problem with the peer-ping, which hangs whenever a peer appears to be dead or blocked.	2014-01-19 15:21:23 +01:00
reger	0c754dd794	implemented DIGEST authentication, which is more secure for remote login than BASIC, where the pwd is transmitted in near clear text (B64enc). This has some implication as RFC 2617 requires and recommends a password hash MD5(user:realm:pwd) for DIGEST. !!! before activating DIGEST you have to reassign all passwords !!! to allow new calculation of the hash - default authentication is still BASIC - configuration at this time only manually in (DATA/settings) or defaults/web.xml (<auth-method>) - the realmname is in defaults/yacy.init adminRealm=YaCy-AdminUI - fyi: the realmname is shown on login screen - changing the realm name invalidates all passwords - but for security you are encouraged to do so (as localhostadmin) - implemented to support both, old hashes for BASIC and new hashes for BASIC and DIGEST - to differentiate old / new hash the in Jetty used hash-prefix "MD5:" is used for new pwd-hashes ( "MD5:hash" )	2014-01-17 00:02:23 +01:00
Michael Peter Christen	f8ce7040ab	remote search peer selection schema change: - all non-dht targets (previously separated into 'robinson' for dht-like queries and 'node' for solr queries) are now 'extra' peers, which are queried using solr - these extra-peers are now selected using a ranking on last-seen, peer-tag-matches, node-peer flags, peer age, and link count. The ranking is done using a weight and a random factor. - the number of extra peers is 50% of the dht peers - the dht peers now exclude too young peers to prevent bad results during strong growth of the network - the number of dht peers (and therefore extra-peers) is reduced when the memory of the peer is low and/or some documents still appear in the indexing-queue. This shall prevent a peer from deadlocks when p2p queries are made in a fast sequence on weak hardware.	2014-01-16 17:27:14 +01:00
reger	28eae57e8b	spend CrawlQueues a fremem routine - clears errorStack - will not get hit often (but better little than nothing on low mem)	2014-01-10 10:24:33 +01:00
reger	280c4a3ac1	exclude terms with " for didYouMean suggestion causes Solr error (and wordindex likely finds suggestion) org.apache.solr.core.SolrCore org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: Cannot parse 'text_t:""d"': Lexical error at line 1, column 12. Encountered: <EOF> after : "" at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:171) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:187) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at net.yacy.cora.federate.solr.connector.EmbeddedSolrConnector.query(EmbeddedSolrConnector.java:179) at net.yacy.cora.federate.solr.connector.EmbeddedSolrConnector$DocListSearcher.<init>(EmbeddedSolrConnector.java:345) at net.yacy.cora.federate.solr.connector.EmbeddedSolrConnector.getCountByQuery(EmbeddedSolrConnector.java:364) at net.yacy.cora.federate.solr.connector.MirrorSolrConnector.getCountByQuery(MirrorSolrConnector.java:326) at net.yacy.cora.federate.solr.connector.ConcurrentUpdateSolrConnector.getCountByQuery(ConcurrentUpdateSolrConnector.java:440) at net.yacy.search.index.Segment.getWordCountGuess(Segment.java:464) at net.yacy.data.DidYouMean.getSuggestions(DidYouMean.java:181) at suggest.respond(suggest.java:73)	2014-01-08 04:46:21 +01:00
reger	6932aa4d7a	use configured admin-username for api calls - the admin user name can be configured, in apiExec calls the default "admin" username is used. TODO: the bin/apicall.sh script should likely take that into account.	2014-01-07 21:26:50 +01:00
orbiter	2ead4e44d9	introduced a new storage path ARCHIVE inside of DATA which will be used as path for solr index dumps (instead of the SEGMENTS path). This will make a maintenance of index backups easier. It will also provide a tool to migrate from an freeworld index to a webportal index.	2014-01-07 17:53:49 +01:00
orbiter	3cb6c7861f	fixed shutdown authentication problem	2014-01-06 01:48:54 +01:00
Michael Peter Christen	2939b47986	removed non-working realm setting in http client (auth for localhost was added in previous commit)	2014-01-05 15:04:18 +01:00
Michael Peter Christen	9bd71fdbb4	made the access tracker class static because it shall be used by the jetty auth module	2014-01-05 05:04:28 +01:00
Michael Peter Christen	7d6fc79eb8	refactoring (usage of constant names for attributes of authentication check)	2014-01-05 04:23:44 +01:00
Michael Peter Christen	b9d36e45e0	removed the &amp explicit encoding of ampersand character since this is double-translated within the template replacement process.	2014-01-05 03:40:10 +01:00
reger	e9081c0f17	moved startup execAPIActions call after Jetty startup execAPIActions require http to be up. The 10s sleep was sufficient to allow Jetty to start, but it's more robust to place the call after http is assigned to switchboard/serverSwitch.	2014-01-01 10:28:49 +01:00
orbiter	dcf46ce8f6	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2013-12-31 15:20:49 +01:00
orbiter	343d2ef49a	new data type for access tracker (unfinished)	2013-12-31 15:20:34 +01:00
reger	dd8ea0cdd6	fix "add to blacklist" button style in IndexControlRWIs_p - added default filename filter to select field (as only addition to *.black list is permanent) - modified Blacklist_p header/legend to show all active blacklists (to support understanding that all configured lists are active) - removed obsolete code in Blacklist_p servlet	2013-12-30 20:03:59 +01:00
reger	abbf487023	fix QueryGoal Image query (missing space) see query log example .. url_file_ext_s:(jpg OR png OR gif) ORcontent_type:(image/*)) ..	2013-12-29 20:14:10 +01:00
reger	26e9d7e066	fix NPE in IndexControlRWIs_p.html - metatags may be null Caused by: java.lang.NullPointerException at net.yacy.search.query.QueryParams.getFacets(QueryParams.java:445) at net.yacy.search.query.QueryParams.getBasicParams(QueryParams.java:400) at net.yacy.search.query.QueryParams.solrTextQuery(QueryParams.java:345) at net.yacy.search.query.QueryParams.solrQuery(QueryParams.java:334) at net.yacy.search.query.SearchEvent.<init>(SearchEvent.java:290) at net.yacy.search.query.SearchEventCache.getEvent(SearchEventCache.java:176) at IndexControlRWIs_p.genSearchresult(IndexControlRWIs_p.java:641) at IndexControlRWIs_p.respond(IndexControlRWIs_p.java:141)	2013-12-29 08:05:37 +01:00
reger	7f9b9315fe	Merge origin/master	2013-12-29 02:05:07 +01:00
reger	8eaabb9600	remove dependency from old serverCore.java - remaining getPortNr not needed (as current release allows only to set plain integer as port, see ConfigBasic)	2013-12-29 02:00:44 +01:00
orbiter	3961b643a3	write solr searches to search log	2013-12-29 01:25:44 +01:00
orbiter	15882beb19	fix for strange NPE java.lang.NullPointerException at net.yacy.search.Switchboard.updateMySeed(Switchboard.java:3667) at net.yacy.peers.Network.peerPing(Network.java:195) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at net.yacy.kelondro.workflow.InstantBusyThread.job(InstantBusyThread.java:107) at net.yacy.kelondro.workflow.AbstractBusyThread.run(AbstractBusyThread.java:165)	2013-12-29 00:40:31 +01:00
orbiter	f3ac923a7e	ftp client shall be able to open non-anonymous ftp servers if login details are given	2013-12-28 22:42:02 +01:00
Michael Peter Christen	ee17bd0b69	added option to attach remote solr servers in read-only mode	2013-12-27 02:55:21 +01:00
Michael Peter Christen	25f9c35033	add patch which shall prevent that naive search mistakes like usage of regular expressions cause no results. Usage of '*' followed by a dot or any expression will now cause that this expression is used as a filetype search.	2013-12-27 00:34:55 +01:00
reger	71cac1a278	added SSL/HTTPS connector to support SSL/https connection on port 8443 !!! attention !!! to make sure YaCy can start, https will be disabled if port 8443 is used - added ping test for above to migration - as of now port for https is hardcoded to default 8443 - if not urgently required I'd leave it this way (it's standard) to use different ports for http and https - post https port on ConfigBasic.html (if active)	2013-12-25 05:20:13 +01:00
Michael Peter Christen	82c0525e71	wrong logger fix	2013-12-23 10:52:02 +01:00
Michael Peter Christen	25250405f1	solr servlet preparation for join with jetty branch	2013-12-20 00:45:58 +01:00
Michael Peter Christen	2f16770681	migrated to solr 4.6.0	2013-12-19 21:51:05 +01:00
orbiter	937273d4e3	added parsing of metadata to surrogate reading: a dublin core record inside of surrogate input files may now contain tokens within the namespace 'md' (short for: metadata). The token names must be valid within the namespace of the solr field names. All md-tokens inside of surrogate files then overwrite values within solr documents before they are written to the solr index. This makes it possible to assign collection names to each surrogate entry and also ranking information can be added. Please see the example file.	2013-12-17 14:02:27 +01:00
Michael Peter Christen	2702d9e56b	- added a SolrQueryResponse2SolrDocumentList method which is able to work around the unfolding process in Solr's BinaryResponseWriter. This was a huge performance bottleneck in the embedded solr connector and the problem is actually on Solr side, but we have now a workaround. - This made it possible to abstract a high-performance index access method which is implemented as method getDocumentListByParams. That method is also implemented in the SolrServerConnector and provides a very efficient access to a solr index if the index is embedded. - a popular use of the document list retrieval is a result count which can now also make use of the new method, via getDocumentCountByParams. - enhanced the Error cache which now does not store error documents within the ram cache if the document is also written to solr. When documents are retrieved from the cache, they are partly read from the ram cache and if not existent there, from the Solr index.	2013-12-13 15:56:29 +01:00
Michael Peter Christen	552ef9f18e	fix for bad ErrorCache.exists test (bug from latest commit)	2013-12-12 10:38:32 +01:00
Michael Peter Christen	09412ea3a4	counting search requests in solr interface	2013-12-12 03:37:19 +01:00
Michael Peter Christen	303f5694ba	avoid usage of existsByQuery. If a document can be loaded by the ID before testing other fields from the existsByQuery request, then a document cache fills and queries after that one can be avoided.	2013-12-12 03:36:30 +01:00
Michael Peter Christen	78eac85161	better calibration of caches and queue maximum sizes	2013-12-04 23:15:10 +01:00
Michael Peter Christen	c8af19bd37	removed unnecessary check which causes a NPE when searching with empty search string	2013-12-04 17:58:36 +01:00
Michael Peter Christen	e3c2f09de9	- reduce computation in case that specific postprocessing fields are not selected - de-select citation rank computation	2013-12-04 17:48:12 +01:00
Michael Peter Christen	cfa08024c7	removed optimization before postprocessing because that may cause a time-out which will cause that postprocessing fails.	2013-12-04 16:04:29 +01:00
Michael Peter Christen	6f3a923691	fixed urlmask which was not able to combine several constraints	2013-12-04 13:48:01 +01:00
Michael Peter Christen	a125904a1c	fixed a NPE in surrogate processing	2013-12-04 01:56:38 +01:00
Michael Peter Christen	0db8e34625	enhanced webgraph processing	2013-12-04 01:54:45 +01:00
Michael Peter Christen	a16534cb0a	tried to fix timeout and connection-lost problems when using an outside solr.	2013-11-28 01:31:53 +01:00
Michael Peter Christen	c3dcbdc8d5	try to recover from an OOM during citation index reading and fail-over to second solr core in case of unrecoverable OOM.	2013-11-28 01:10:25 +01:00
Michael Peter Christen	9932c441c8	fixed a problem with Date fields parsing Solr results if a remote Solr is attached.	2013-11-28 00:54:53 +01:00
Michael Peter Christen	ae55d69ef6	include/exclude size NPE fix (recently added)	2013-11-26 11:47:04 +01:00
Michael Peter Christen	2c39b65409	fixes for searches containing stopwords. The fix was done using a reconstruction of the search word set access method to protect that words are deleted from the sets from the outside of the QueryGoal class.	2013-11-26 02:24:47 +01:00
orbiter	037cd0a57c	using the BinaryResponseWriter which is supported within the YaCy solr servlet since YaCy 1.63. This is much more performant for the client than using the XMLResponseWriter because parsing of XML data is very CPU intensive. Older YaCy peers are still requested using the XMLResponseWriter but the majority of YaCy peers already respond with the binary writer. This makes remote searches much faster and less CPU intensive.	2013-11-25 21:31:40 +01:00
orbiter	61409788eb	less word hash computations (removing some overhead because of MD5 calcs) using the clear word in a normalized form.	2013-11-25 15:20:54 +01:00
reger	f23471c471	add check to prevent index entries containing url_file_ext_s with ";jsession=xyz" note: check could be implemented in MultiProtocolURL (but at this time didn't oversee possible implication)	2013-11-25 00:14:53 +01:00
orbiter	3e552550d1	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2013-11-18 22:48:00 +01:00
orbiter	c2d720cdaf	purge a lucene cache - possible memory leak fix	2013-11-18 22:47:35 +01:00
reger	e4f49fb175	for searchresults with empty title use filename as title - to not store a title in index which isn't extracted from source the title is empty check only added to ResultEntry class	2013-11-18 19:41:31 +01:00
orbiter	da33ee0d77	extended also timeout for webgraph postprocessing	2013-11-16 18:30:06 +01:00
orbiter	74f9e40747	extended timeout during postprocessing of 30 minutes.	2013-11-16 18:29:08 +01:00
orbiter	19a051bec8	more monitoring for postprocessing and enhanced layout in Crawler monitor page	2013-11-16 18:23:14 +01:00
Michael Peter Christen	9cf9727685	fix for wrong counter	2013-11-16 11:33:35 +01:00
Michael Peter Christen	fceac8cffd	more monitoring for postprocessing	2013-11-16 08:23:42 +01:00
Michael Peter Christen	6842783761	fixed and enhanced postprocessing	2013-11-16 08:23:21 +01:00
Michael Peter Christen	bf1bdd52a6	prevent requesting of 0-facets (which actually exist)	2013-11-15 15:41:41 +01:00
Michael Peter Christen	9d5895f643	enhanced and fixed postprocessing	2013-11-15 15:41:12 +01:00
Michael Peter Christen	087df05e24	added option to Config_Network_p.html to enable remote search while DHT-Receive is switched off.	2013-11-13 13:38:01 +01:00
Michael Peter Christen	1a4a69c226	set more logger to 'final static'	2013-11-13 06:18:48 +01:00
Michael Peter Christen	69b8d61c47	fix for search requests in GSA interface which contain 'funny' characters (like ':' etc.)	2013-11-12 15:54:54 +01:00
orbiter	4234b0ed6c	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2013-11-10 18:50:43 +01:00
orbiter	909bbb49d8	added (partly commented) test code for url rewrite methods .. to be completed	2013-11-10 18:50:34 +01:00
Michael Peter Christen	acc1f8a749	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2013-11-07 12:01:37 +01:00
Michael Peter Christen	81bb50118e	found and fixed a huge memory leak in solr caching (inside Solr). The not-flushed Solr cache is now handled in this way: - it is smaller by default - an Solr-internal process is started to flush the cache periodically (this does NOT clean the cache, just removes old objects) - a Solr-external process (the standard YaCy cleanup-process) now has direct access to the solr internal cache and flushes them completely. The time frame for such a flush is defined by the cleanup-process frequency, by default 10 minutes.	2013-11-07 10:01:44 +01:00
reger	7b17cdf6dd	add content_type:image/* to image search - see numerous idx entries with content_type image without url_file_ext_s (for various reason) which should be included in result - try it yourself with following sample query /solr/select?q=content_type:image/* AND -url_file_ext_s:[* TO *]&defType=edismax&fl=sku,url_file_ext_s,content_type addresses also possible url without or deviating extension.	2013-11-07 03:11:03 +01:00
sixcooler	987f410011	URL-export:add query and fix for cast-class-exception	2013-11-06 19:22:26 +01:00
Michael Peter Christen	0cf9e9580b	added clickdepth and CR computation debug code to verify that the process is complete	2013-11-06 15:01:40 +01:00
Michael Peter Christen	234a974955	load image only if their parser flag is activated	2013-11-04 11:59:28 +01:00
Michael Peter Christen	e1c1e57877	less overhead calling exist() with only one hash	2013-11-04 09:37:31 +01:00
Michael Peter Christen	5a02d650ee	avoid cloning	2013-11-03 18:31:50 +01:00
Michael Peter Christen	cc39667399	Speed enhancements and less CPU usage during Solr searches when using the embedded Solr (the default). This was obtained by circumventing solrj search encapsulation and the implementation of direct index access methods to Solr. The effect will not only be seen during search, but this has also a strong effect on suggestions (much more) and less CPU power usage during index distribution (which needs many search requests)	2013-11-01 17:24:36 +01:00
Michael Peter Christen	434e13b46d	in host browser also show the properties of failed documents including referrer urls (this is a VERY USEFUL SEO and Web Admin feature!!)	2013-11-01 13:30:53 +01:00
Michael Peter Christen	9bb7eab389	hacks to prevent storage of data longer than necessary during search and some speed enhancements. This should reduce the memory usage during heavy-load search a bit.	2013-10-25 15:05:30 +02:00
Michael Peter Christen	5afa6e3aee	Automatically flush the log cache if a short memory status is reached. For the default of 200 lines this can flush about 10MB.	2013-10-24 17:39:50 +02:00
Michael Peter Christen	030d0776ff	Enhanced crawl start for very, very large crawl lists (i.e. > 5000) which had a problem because of badly used concurrency. This fix also caused a redesign of the whole host deletion process. This should fix bug http://bugs.yacy.net/view.php?id=250	2013-10-24 16:20:20 +02:00
Michael Peter Christen	1b4fa2947d	- fixed a problem which occurred when a document was not recognized with the right content domain (i.e. identifying that it is an image, text etc.) because it used the file extension and not an existing mime type assignment. - fixed the new setting that images shall be loaded for a better image search. - both fixes together make it now possible to crawl commons.wikimedia.org which makes use of 'funny' document names (i.e. ending with .jpg while the document is html)	2013-10-23 00:16:54 +02:00
Michael Peter Christen	82621bead0	When doing bootstrapping, always accept one seedlist-File without checking the date of the file. This should help to start the peer in case that the user has a completely wrong date setting.	2013-10-22 15:34:51 +02:00
Michael Peter Christen	691d7e70fa	added hint to development/commit rss feed	2013-10-21 15:16:29 +02:00
Michael Peter Christen	c833d02cf5	fixed webgraph postprocessing (did nothing and repeated to do this...)	2013-10-16 11:49:04 +02:00
Michael Peter Christen	74d0256e93	enhanced postprocessing: fixed bugs, enable proper postprocessing also without the harvestingkey, remove crawl profiles after postprocessing, speed-up for clickdepth computation.	2013-10-16 11:27:06 +02:00
Michael Peter Christen	d328cc4a83	fix for didyoumean, added also more asian alphabets	2013-10-09 16:17:50 +02:00
Michael Peter Christen	90c8577840	enhanced ranking; patches to replace old ranking	2013-10-09 15:10:03 +02:00
Michael Peter Christen	1b61bd40ed	- Added new solr field url_file_name_tokens_t which stores the file name tokens. This can be used to enhance the ranking. - Added also a rating_i field as basis for later usage. - enhanced the tokenization process.	2013-10-08 23:48:13 +02:00
orbiter	5f5a97bafc	added the anchor text within web pages to the searchable entities of a web page. This can be of benefit for the ranking if these fields are used for boosts.	2013-10-08 18:41:07 +02:00
orbiter	705b3338ee	list more fields available for search and for ranking boosts	2013-10-08 18:15:35 +02:00
Michael Peter Christen	78e7aadb26	removed unused initialization method	2013-10-07 23:51:28 +02:00
Michael Peter Christen	4fbc4740df	removed warnings	2013-10-07 23:41:50 +02:00
Michael Peter Christen	21aa6a0321	migration to Solr 4.5.0	2013-10-07 17:09:40 +02:00
Michael Peter Christen	101a6e6e14	Patch the citation index for links with canonical tags. This shall fulfill the following requirement: If a document A links to B and B contains a 'canonical C', then the citation rank computation shall consider that A links to C and B does not link to C. To do so, we first must collect all canonical links, find all references to them, get the anchor list of the documents and patch the citation reference of these links.	2013-10-07 11:15:58 +02:00
Michael Peter Christen	b28d43decc	added two more fields source_cr_host_norm_i,target_cr_host_norm_i in webgraph and an addition to postprocessing to copy all cr ranking attributes to the link edges associated to the postprocessing documents	2013-09-27 16:57:05 +02:00
Michael Peter Christen	a52f3a597e	fix for canonical-from-http-header feature	2013-09-27 15:09:04 +02:00
Michael Peter Christen	2dd7c5be44	added parsing of http-canonical tags (untested, could not find an example page)	2013-09-27 13:17:50 +02:00
Michael Peter Christen	3bf0104199	fix for crawl domain counter limitation (limit was reached too early)	2013-09-26 13:41:52 +02:00
Michael Peter Christen	82bfd9e00a	- crawl profiles shall be deleted from active and passive stacks if they are deleted to terminate the crawl because otherwise the crawl will go on after the load-from-passive stack policy. - better check if a crawl is terminated using the loader queue.	2013-09-26 10:22:31 +02:00
Michael Peter Christen	91a875dff5	self-healing of mistakenly deactivated crawl profiles. This fixes a bug which can happen in rare cases when a crawl start and a cleanup process happen at the same time.	2013-09-25 18:27:54 +02:00
Michael Peter Christen	095053a9b4	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2013-09-25 17:32:52 +02:00
sixcooler	0cae420d8e	some dns-timing changes: since httpclient uses the domain-cache it is useful not to clean the domain cache until crawling is running (domains are filled into this cache) On huge crawl-starts (eg. from file) my DNS did not follow the high rates - so I reduced the rate and give some more time(-out)	2013-09-25 15:01:28 +02:00
Michael Peter Christen	4f83d5f18c	added the new field harvestkey_s to the collection index and the webgraph index which is temporary filled with the crawl profile key. This is used to select a set of documents for post-processing as soon as a crawl is finished. Now the postprocessing for a specific crawl is started when that specific crawl is finished and not at the end of all post-processing steps.	2013-09-25 14:38:24 +02:00
orbiter	14442efa6d	when profiles are cleaned, there shall be first a callback showing which profiles are cleaned. This shall enable a profile-termination-driven postprocessing. To do this, index writings must carry the profile key which will be implemented in another (next) step.	2013-09-25 11:04:12 +02:00
orbiter	8ac2e8c8c9	added location navigator which causes that the image to the map search is visible whenever a location is available in the search result. To activate this, the search.navigation property in yacy.conf must be modified to the new default values.	2013-09-24 11:26:51 +02:00
Michael Peter Christen	96ed0c980e	- added hosthash to all documents (also fail documents which is needed there for deletion), this fixes a problem for the deletion of old documents for new crawl starts - added clickdepth and citation computation for fail documents	2013-09-23 18:09:42 +02:00
orbiter	828603e4f1	fix for 100%CPU problem in error cache cleaning process	2013-09-21 10:20:13 +02:00
orbiter	c64b51134e	hack to add all tokens from the url to text_t. This was working for the RWI index (and still is working) but not for solr-only search indexes. Maybe we should find a solution using a separate search field instead.	2013-09-21 08:57:43 +02:00
orbiter	f3be1930cb	CPU problem when pushing to the error cache; wrong class, ConcurrentHashMap needed for concurrency	2013-09-20 16:51:50 +02:00
Michael Peter Christen	e40671ddb7	better and consistent deletions for error urls	2013-09-17 15:52:57 +02:00
Michael Peter Christen	2602be8d1e	- removed ZURL data structure; removed also the ZURL data file - replaced load failure logging by information which is stored in Solr - fixed a bug with crawling of feeds: added must-match pattern application to feed urls to filter out such urls which shall not be in a wanted domain - delegatedURLs, which also used ZURLs are now temporary objects in memory	2013-09-17 15:27:02 +02:00
Michael Peter Christen	31920385f7	set anchor rel attribute of all links to "nofollow" if the html meta contains a robots:nofollow or if the http header contains a "X-Robots-Tag: nofollow"	2013-09-16 16:14:56 +02:00
Michael Peter Christen	61c5e40687	- replaced the properties object in AnchorURL with distinct variables for anchor attributes. - this caused that large portions of the parser code had to be adapted as well - added a counter target_order_i for anchor links in webgraph computation	2013-09-15 23:27:04 +02:00
Michael Peter Christen	5e31bad711	- the webgraph shall store all links which appear on a web page and not all unique links! This made it necessary that a large portion of the parser and link processing classes be adapted to carry a different type of link collection which carries property attributes that are attached to web anchors. - introduction of a new URL class, AnchorURL - the other url classes, DigestURI and MultiProtocolURI had been renamed and refactored to fit into a new document package schema, document.id - cleanup of net.yacy.cora.document package and refactoring	2013-09-15 00:30:23 +02:00
Michael Peter Christen	35ab2cef7b	added parsing of 'date', 'dc:date', 'dc.date' and 'last-modified' in html meta fields to get a correct (or: better) date timestamp. The http:last-modified mostly does not work because it is set to the current date from most CMS.	2013-09-10 10:31:57 +02:00
Michael Peter Christen	9cc8468b30	added tools to visualize image generation (i.e. during testing)	2013-09-09 12:58:26 +02:00
Michael Peter Christen	dbef8ccfcb	forced deletion of ZURL entries for a specific host for each host that appears in the crawl url list	2013-09-05 13:22:16 +02:00
Michael Peter Christen	e137ff4171	refactoring (in preparation for new removeHost method)	2013-09-05 09:59:41 +02:00
Michael Peter Christen	7a5574cd51	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2013-09-04 23:12:04 +02:00
Michael Peter Christen	85456f46b2	added two new fields, exact_signature_copycount_i and fuzzy_signature_copycount_i, which count the number of copies of non-unique documents and assign this to each document. Thus, each document is assigned a number which shows how many copies of this document exist. These fields are disabled by default.	2013-09-04 23:11:53 +02:00
orbiter	26366596d9	fix for a problem which occurs when a site is crawled where the start url is redirected.	2013-09-04 16:00:47 +02:00
Michael Peter Christen	a2511b5600	turned images_alt_txt back to images_alt_sxt because it is not necessary to index the alt text. Indexed image Text is in images_text_t	2013-09-04 10:47:18 +02:00
Michael Peter Christen	85b1922244	activated image type navigation for image search	2013-09-03 13:34:01 +02:00
Michael Peter Christen	9e12fdff23	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2013-09-03 12:22:57 +02:00
Michael Peter Christen	ab1201fdfd	fixed wrong facet count	2013-09-03 12:22:29 +02:00
Michael Peter Christen	049c3b3f2e	added an option to exclude image search results from text search. This is on by default.	2013-09-03 11:14:23 +02:00
Michael Peter Christen	69f85265e1	added an option to put image links to the crawl queue and handle these like normal documents. Using this option (by default on at this moment; this might change soon) it is possible to get the exif data into the search index to be used in image search.	2013-09-03 11:13:45 +02:00
Michael Peter Christen	a8c5bfcf58	avoid to create unnecessary objects	2013-09-03 09:48:05 +02:00
Michael Peter Christen	5a0de1b77d	moving image description text to image text field	2013-09-03 09:47:27 +02:00
Michael Peter Christen	dc179bd61f	fix for catchall query goal for image search	2013-09-03 07:55:21 +02:00
reger	392174de8c	remove all_words, all_strings lists from QueryGoal - only used for text highlighting in parser text (ViewFile.html) which can be done with include_strings only	2013-09-02 23:09:43 +02:00
Michael Peter Christen	169ef8963d	one more fix for image search	2013-09-02 20:02:26 +02:00
Michael Peter Christen	cb85b22725	redesign of the image search process (with much better results, unfortunately the index schema has changed and p2p image search will not be much better until many people update)	2013-09-02 18:55:38 +02:00
reger	29967102a2	optimized QueryGoal (reducing mem and computation by removing all_hashes) - all_hashes used for text highlighting and word distance computation which can be done with include_hashes only	2013-09-02 04:19:53 +02:00
orbiter	f106345eef	link strings should not be tokenized	2013-09-01 14:35:36 +02:00
orbiter	deadeb406e	image alt tag strings should be tokenized	2013-09-01 13:48:10 +02:00
Michael Peter Christen	1a3e42eca4	index migration to lucene 4.4	2013-08-26 12:49:39 +02:00
Michael Peter Christen	a88a62f7aa	added a feature to set a collection for a crawl result based on a regular expression on the url: the collection attribute for a crawl start may be now either a token or a list of tokens, separated by ',' where a token is either a string or a pair <string,pattern> where the string is separated from the pattern with a ':' and the string is assigned to the document as collection only if the pattern matches with the url.	2013-08-25 00:13:48 +02:00
Michael Peter Christen	765943a4b7	Redesign of crawler identification and robots steering. A non-p2p user in intranets and the internet can now choose to appear as Googlebot. This is an essential necessity to be able to compete in the field of commercial search appliances, since most web pages are these days optimized only for Google and no other search platform any more. All commercial search engine providers have a built-in fake-Google User Agent to be able to get the same search index as Google can do. Without the resistance against obeying to robots.txt in this case, no competition is possible any more. YaCy will always obey the robots.txt when it is used for crawling the web in a peer-to-peer network, but to establish a Search Appliance (like a Google Search Appliance, GSA) it is necessary to be able to behave exactly like a Google crawler. With this change, you will be able to switch the user agent when portal or intranet mode is selected on per-crawl-start basis. Every crawl start can have a different user agent.	2013-08-22 14:23:47 +02:00
Michael Peter Christen	47b1c81d08	- refactoring - generalized writing of url attributes to solr documents - added more url attributes to error documents	2013-08-20 15:46:04 +02:00
Michael Peter Christen	697613170d	less logging for postprocessing (this was a debugging logging with high CPU load)	2013-08-17 09:25:32 +02:00
reger	a5019bc470	make Vocabulary Navigator tags a hard result entry filter by checking vocabulary tags also for rwi results (currently a filter is applied to the solr query) TODO: as vocabularies are only locally valid, auto-switch to Searchdom.LOCAL could be considered.	2013-08-13 03:07:25 +02:00
reger	a67a4b7d86	improve tld: query modifier filter pattern (to prevent tld:net accepting www.abcinet.org)	2013-08-12 21:20:23 +02:00
reger	02fe8b43ba	Field Re-Indexing: display list of fields in reindex queue change servlet to display statistic on 1st click (instead after refresh)	2013-08-11 04:51:29 +02:00
sixcooler	7f501b7c38	clear some caches before reporting low Memory do not break lines in Network-table-rows	2013-08-08 14:38:26 +02:00
Michael Peter Christen	2857499467	fix to collection schema; bug appeared for _txt fields with empty String as content	2013-07-31 13:32:05 +02:00
Michael Peter Christen	58fe986cca	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2013-07-30 12:49:14 +02:00
Michael Peter Christen	cf12835f20	replaced the single-text description solr field with a multi-value description_txt text field	2013-07-30 12:48:57 +02:00
reger	f2d99053ed	Field Re-Indexing: prevent endless error loop in ReindexSolrBusyThread on Solr exception (by skipping query causing the exception) (occurred during testing while working on q=store:[* TO *])	2013-07-29 01:32:02 +02:00
orbiter	d05e0c5368	wait a bit longer before doing the first peer ping	2013-07-27 11:00:35 +02:00
orbiter	b8f57f7703	don't be noisy when doing background tasks that may be allowed to fail	2013-07-27 10:51:58 +02:00
Roland Haeder	0343f0668c	Fix for NPE: E 2013/07/26 20:29:29 BUSYTHREAD Runtime Error in serverInstantThread.job, thread 'net.yacy.search.Switchboard.cleanupJob': null; target exception: null java.lang.NullPointerException at net.yacy.search.schema.CollectionConfiguration.convergenceStep(CollectionConfiguration.java:1116) at net.yacy.search.schema.CollectionConfiguration.postprocessing(CollectionConfiguration.java:897) at net.yacy.search.Switchboard.cleanupJob(Switchboard.java:2296) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at net.yacy.kelondro.workflow.InstantBusyThread.job(InstantBusyThread.java:107) at net.yacy.kelondro.workflow.AbstractBusyThread.run(AbstractBusyThread.java:165) Conflicts: source/net/yacy/search/schema/CollectionConfiguration.java	2013-07-27 10:19:46 +02:00
Roland Haeder	b58ca8622d	Some cleanups: - added SKINS_PATH_DEFAULT as same as LISTS_PATH_DEFAULT was added - Added 'final' keyword to a string	2013-07-27 10:13:57 +02:00
Roland Haeder	7263bb82fb	Fix for NPE on shutdown: java.lang.NullPointerException at net.yacy.search.Switchboard.storeDocumentIndex(Switchboard.java:2732) at net.yacy.search.Switchboard.access00(Switchboard.java:207) at net.yacy.search.Switchboard.run(Switchboard.java:3049)	2013-07-27 09:55:43 +02:00
orbiter	080d80c9de	do not write an empty failreason in case that there is no fail. Because of the lazy instantiation rule this value was not actually written, but if lazy instantiation is switched on, then this causes that all crawl starts delete all crawl-start-hosts completely because this looks for filled error reasons.	2013-07-26 17:53:28 +02:00
Michael Peter Christen	61e015268b	fix in forced deletion: forced commit needed	2013-07-25 09:53:19 +02:00
Michael Peter Christen	c3b2301b2f	fix for http://bugs.yacy.net/view.php?id=268	2013-07-25 09:21:37 +02:00
orbiter	3e901dcb06	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2013-07-23 19:33:07 +02:00
orbiter	f50b596e0b	do not run dht distribution if system load is over 2.5	2013-07-23 19:32:32 +02:00
orbiter	056b42f5aa	- added information about segment count to status_p.xml - also moved this information from the old index structure, which is still in use for the RWI/DHT index to that front-end	2013-07-23 18:03:33 +02:00
orbiter	6fb2811e68	fixes for problems with remote solr and non-activated webgraph index	2013-07-23 16:46:44 +02:00
sixcooler	af740f3058	changed optimization to a segment-size of index-size/5.000.000 + one if not idle + one (and force) if postprocessing	2013-07-23 14:21:12 +02:00
orbiter	5364c4dcc9	delayed first peer-ping to send the first ping out after the http got up; if the ping comes before the http is up, it cannot be recognized as senior peer (if at all). See also: http://bugs.yacy.net/view.php?id=266	2013-07-22 18:21:37 +02:00
orbiter	e24016e30a	added the property federated.service.solr.indexing.timeout to yacy.init to provide a configurable time-out for solr; see also: http://bugs.yacy.net/view.php?id=254	2013-07-22 17:45:12 +02:00
orbiter	c124037f19	removed forced non-soft commits to prevent index fragmentation	2013-07-22 17:28:20 +02:00
Michael Peter Christen	c15aa758dc	removed failreason_t removal patch because that causes too much confusion using an external solr. to clean up the index after a schema change, use the index cleaner function from the online servlet	2013-07-22 14:17:38 +02:00
Roland Haeder	be0ff6018f	Removed trailing spaces + some more final	2013-07-17 18:44:24 +02:00
Roland Haeder	841a28ae76	Added 'final' for all exception blocks as this helps the Java compiler to optimize memory usage Conflicts: source/net/yacy/search/Switchboard.java	2013-07-17 18:31:30 +02:00
Michael Peter Christen	89c0aa0e74	added collection_sxt to error documents	2013-07-17 15:20:56 +02:00
Michael Peter Christen	0df5195cb0	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2013-07-17 12:42:06 +02:00
Michael Peter Christen	1fd006cc56	fixes using the embedded connector	2013-07-17 12:41:54 +02:00
orbiter	d0dc86cf3d	logging of deadlocks (if any) during cleanup process	2013-07-17 12:38:58 +02:00

1048 Commits