yacy_search_server

mirror of https://github.com/yacy/yacy_search_server.git synced 2024-09-19 00:01:41 +02:00

Author	SHA1	Message	Date
orbiter	aeff31cd44	fix for workflow processor (cause: latest redesign for less threads)	2013-05-12 21:36:20 +02:00
Michael Peter Christen	77faeada4d	small memory leak patch	2013-05-11 11:19:06 +02:00
Michael Peter Christen	b24d1d18e4	removed synchronization and concurrency in Fulltext class, concurrent deletions are now handled in ConcurrentUpdateSolrConnector	2013-05-11 10:53:12 +02:00
Michael Peter Christen	b9b446bca6	- added ssl configuration sign (a lock) to network statistic/table - fixed a bug in bitfield	2013-05-10 17:32:21 +02:00
Michael Peter Christen	e6c8b545c2	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2013-05-10 12:16:55 +02:00
orbiter	a83c2fe833	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2013-05-10 12:02:40 +02:00
orbiter	4baa0d4a97	Added a default keystore for ssl encryption of the YaCy web interface. This will enable https-access to YaCy, but this feature is disabled by default using the new server.https=false attribute. This has two purposes: - make it easier for everyone to use https (just set server.https=true) - provide the basis for secure yacy-to-yacy communication in the future	2013-05-10 12:02:31 +02:00
Michael Peter Christen	aaddb4809c	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2013-05-10 04:57:15 +02:00
Michael Peter Christen	038f956821	fix for sitemap detection: the sitemap url was not visible if it appeared after the declaration of robots allow/deny for the crawler because the sitemap parser terminated after the allow/deny rules had been found. Now the parser reads the robots.txt until the end to discover also sitemap rules at the end of the file.	2013-05-10 04:56:58 +02:00
reger	4fc6837690	- fix monitor url of crawl job in PerformanceQueues_p.html - reduce logging of every index add (switch embeddedsolr.add from info to debug)	2013-05-10 04:38:13 +02:00
Michael Peter Christen	442ed50be0	removed some unnecessary synchronizations	2013-05-09 03:06:48 +02:00
Michael Peter Christen	ad050ec88d	- upgraded httpclient, httpcore and httpmime - removed httpclient 3.1 which has been used by solrj < 4.x.x and is now not used any more - fixed some parts in YaCy which used methods from httpclient 3.1	2013-05-09 00:22:45 +02:00
orbiter	a1c989002b	fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=4652 generate dht data even if dht receive and dht transmission is switched off	2013-05-08 16:48:45 +02:00
Michael Peter Christen	e26bdd4a52	fixes to deletion methods (removed unnecessary concurrency and added removal of crawl queue entries)	2013-05-08 13:26:25 +02:00
Michael Peter Christen	f2c9b0b5f2	better robustness of Concurrent Solr Connector against update/deletion thread failure	2013-05-08 12:41:24 +02:00
Michael Peter Christen	f7f3e28c5e	prevent that the size of the index is computed too many times. Because the index size is now provided by solr, and the only way to do that is a match for [* TO *], a size computation is quite complex and time-consuming. Therefore this patch prevents that the method is called at all and if necessary puts a DOS-preventing barrier in front of it.	2013-05-08 11:50:46 +02:00
Michael Peter Christen	cca19d94d4	re-declared some fields to be of type string rather than text which makes them more efficient and less large	2013-05-06 16:45:54 +02:00
Michael Peter Christen	cc90f82dbb	increased default proxy client timeout to one minute	2013-05-06 14:58:18 +02:00
Michael Peter Christen	ed1d5bace6	draw the names of other peers which receive/send dht into the network graphic	2013-05-06 14:27:39 +02:00
Michael Peter Christen	b528448332	enlarge network graph circle according to image height and reduce the image height in the Network servlet. Overall, the image is now larger but takes less space on the web page.	2013-05-05 23:39:46 +02:00
reger	24d2b4baee	remove pre 1.0 migration statement which possibly overwrites user navigator setting	2013-05-05 05:00:42 +02:00
Michael Peter Christen	3841854c97	abstraction of catchall term	2013-05-04 00:14:22 +02:00
Michael Peter Christen	ea85674be2	added the date to error documents	2013-05-04 00:14:00 +02:00
Michael Peter Christen	6fafed2180	fix for solr cache when a delete buffer is filled and a document, which is in the delete queue, is replaced with a new one.	2013-05-03 02:03:30 +02:00
Michael Peter Christen	20b767f35e	preventing score computation in solr where applicable	2013-05-03 02:02:35 +02:00
orbiter	7de5b9cfa0	fix for http://bugs.yacy.net/view.php?id=233 - check geolocation coordinates and accept only those, which are well-formed - the solr push process does not stop crawling any more if after 20 requests to Solr Solr does not accept the record. Instead, a severe log entry asks the user to create a bug request	2013-05-03 00:24:39 +02:00
Michael Peter Christen	ee217dbdee	remove sort order in all cases where not needed	2013-04-30 11:44:56 +02:00
Michael Peter Christen	70e981b333	prevent that long-running deletion tasks block a hard commit.	2013-04-30 11:09:21 +02:00
Michael Peter Christen	bb4bf3d8fd	infinity timeout bug protection patch	2013-04-30 11:06:48 +02:00
Michael Peter Christen	1b102d98d8	- added index deletion to index administration submenu - added index deletion processes to the process scheduler/recorder	2013-04-30 02:11:28 +02:00
Michael Peter Christen	d1be4127e7	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2013-04-29 19:31:40 +02:00
Michael Peter Christen	1aac722cc6	added another solr connector, the ConcurrentUpdateSolrConnector which does not block when long-running updates to solr are made. This is realized using blocking queues which process all long-running tasks in the background. Also some bugfixes to existing connectors.	2013-04-29 19:30:04 +02:00
Michael Peter Christen	0af7803367	added more features to ScoreMap (pretty toString)	2013-04-29 19:28:17 +02:00
Michael Peter Christen	f36a7da5f6	- re-introduced existById in solr connector. - introduced raw-queries for the re-introduced byId-Queries (they are hopefully faster than full edismax queries) - removed the cached solr connector (testing this) to rely only on the solr built-in search caches. That should save some RAM (also). We will see if this is usable.	2013-04-28 21:20:14 +02:00
reger	46fa800bc7	added httpstatus_i to automatically switched on fields (used in all search queries)	2013-04-27 03:11:44 +02:00
Michael Peter Christen	3502b4c697	refactoring (renaming) of yacy-solr api	2013-04-27 01:32:18 +02:00
Michael Peter Christen	3a0fcfbeda	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2013-04-26 10:50:08 +02:00
Michael Peter Christen	25499eead5	- added a new field for the regular expression in crawl start - added the field in crawl profile - adapted logging and error management - adapted duplicate document detection - added a new rule to the indexing process to reject non-matching content - full redesign of the expert crawl start servlet The new filter field can now be seen in /CrawlStartExpert_p.html at Section "Document Filter", subsection item "Filter on Content of Document"	2013-04-26 10:49:55 +02:00
orbiter	e1bfe9d07a	- reduction of the concurrently running processes to make YaCy more adjusted to smaller and 1-core devices. - the workflow processor now starts no process at all. these are started as soon as parser/condenser/indexing queues are filled. - better abstraction	2013-04-25 11:33:17 +02:00
Michael Peter Christen	c091000165	added collection attribute also to the rss feed reader	2013-04-24 01:14:35 +02:00
orbiter	f7571386a3	added a 'collection' property attribute in yacysearch.html which can be used to select between different collections as defined during a crawl start with the 'collection' attribute. This actually implements the ability to prepare search tenants which restrict their search results to a specific collection. The main use for this is to provide tenants to the yaml4 interface (at this time).	2013-04-23 20:42:54 +02:00
orbiter	3e79bd4b1f	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2013-04-23 12:15:46 +02:00
Michael Peter Christen	d937c55204	extended limitation of dom export size from 100000 to 100000000	2013-04-22 22:33:13 +02:00
Michael Peter Christen	fc2095ac67	some extensions to raster plotter to transform a RGB picture to an indexed color scheme. This is needed for gif animations	2013-04-22 14:33:04 +02:00
Michael Peter Christen	c1a2175fbc	added transparency to gif image animation and the integration to the YaCy httpd for on-the-fly generated gifs (including animated gifs)	2013-04-21 12:29:05 +02:00
orbiter	5d442dad82	avoid NPE in regex checker	2013-04-20 10:53:49 +02:00
Michael Peter Christen	50421171c3	added new schema fields: hreflang_url_sxt and hreflang_cc_sxt for http://support.google.com/webmasters/bin/answer.py?hl=de&answer=189077 navigation_url_sxt and navigation_type_sxt for http://googlewebmastercentral.blogspot.de/2011/09/pagination-with-relnext-and-relprev.html publisher_url_s for http://support.google.com/plus/answer/1713826?hl=de all fields are disabled by default and not written to the index.	2013-04-18 17:21:17 +02:00
Michael Peter Christen	566d6c980c	checking of document signature for a double-document check now refers only to documents within the same domain	2013-04-17 16:15:27 +02:00
Michael Peter Christen	1d30082446	added hindi translation configuration	2013-04-17 12:57:27 +02:00
Michael Peter Christen	d05dc07cff	setting of new default values for ranking	2013-04-16 15:02:00 +02:00
Michael Peter Christen	97775fbebc	fixed ranking for add-function queries: this did not work. The option was removed. All function queries are now boosts (multiplies the score according to a function). This is also the recommended way to boost rankings based on functions as explained in http://nolanlawson.com/2012/06/02/comparing-boost-methods-in-solr/	2013-04-16 14:45:14 +02:00
Michael Peter Christen	ac5fa9fe48	fix for result counter logging	2013-04-16 13:32:13 +02:00
Michael Peter Christen	7ab5093321	added new solr title_exact_signature_l and description_exact_signature_l to be able to identify unique title and unique description fields.	2013-04-16 01:35:15 +02:00
Michael Peter Christen	f24ac518e6	redesign of exists()-query (can now be called with query) and the CachedSolrConnector which based its cache on the key value. This will be used to correct the title_unique_b and description_unique_b field.	2013-04-15 14:08:30 +02:00
Michael Peter Christen	27d6222880	added new field host_extent_i which, after a crawl and postprocessing, holds the number of documents for the host where the document is hosted. This is necessary for ranking and the norming of references per local host in the ranking computation.	2013-04-14 20:52:40 +02:00
reger	518b20147c	skip postprocessing during document.store if no citation index connected (prevent null pointer exception)	2013-04-14 02:01:27 +02:00
Marc Nause	ac478384d3	*) did some long overdue refactoring	2013-04-13 23:04:44 +02:00
Michael Peter Christen	ada3f27de7	added three new fields for a better ranking: references_internal_i, references_external_i and references_exthosts_i. These can be used to count and evaluate the number of external links to every web page. An experimental ranking function can be e.g.: div(add(references_internal_i,product(references_external_i,references_exthosts_i)),add(clickdepth_i,1))	2013-04-12 16:17:14 +02:00
Michael Peter Christen	082e3274d6	- setting the same default ranking in the solr interface as for YaCy search interfaces if no other ranking attributes are given - using the YaCy ranking in the GSA interface only if there was not given a GSA-style sort attribute - to avoid confusion about correct ranking attributes, only the default '0'-ranking profile is used and not scenario-adopted (site, date) because that should be configurable in the web interface before it is used actually for ranking.	2013-04-12 10:48:41 +02:00
Michael Peter Christen	a20941c067	resume paused crawls on startup; user expects that restarts 'heal' everything	2013-04-11 15:07:08 +02:00
Michael Peter Christen	edc0b33f6d	- showing references count and clickdepth in host browser - fixed generation and presentation of both values	2013-04-11 14:46:13 +02:00
reger	566a3b0294	fix: Index Administration > Reverse Word Index (IndexControlRWIs_p) corrected use of word search to word-hash search - removed duplicate QueryParams.hashes2Handles , redundant with .hashes2Set	2013-04-08 21:25:21 +02:00
Michael Peter Christen	cf0acd2cb4	upgrade to solr 4.2.1	2013-04-06 16:11:24 +02:00
reger	e89491271f	- fix opensearch discover err msg - webgraph not enabled - if no opensearchdescription link found in index - remove search2.net from sample config (is down)	2013-04-04 00:40:59 +02:00
reger	6a9d0b60a3	make sure configured port is reported on recreated mySeed.txt	2013-04-01 03:51:57 +02:00
Michael Peter Christen	870aedf3c6	fixes for better search interface integration in yaml templates	2013-03-20 16:19:49 +01:00
Michael Peter Christen	5512be6673	fix in GSA result writer which evaluates result context fields as String. After the migration to Solr 4.1.0 'some' of these fields suddenly are stored as String[]; this patch compensates this confusion.	2013-03-19 10:33:35 +01:00
Michael Peter Christen	342ba1049b	- callback fix - memory allocation problem in RowCollection: if memory is too low, do not try to increase by 1 because this leads to very long execution time and at the end to the same OOM as if we allocate the memory at the moment we need it even if the resource observer states that this memory is not there. To compensate this, the increase size is reduced.	2013-03-19 10:32:01 +01:00
orbiter	65d73e5652	renamed callback function to 'callback' because that is a standard for jsonp which is also used in backbone.js/jquery	2013-03-19 00:59:47 +01:00
orbiter	17ae51e741	increased number of links limitation from 1000 to 10000 for rss feeds and html documents	2013-03-17 22:13:56 +01:00
orbiter	e4d26d1cb4	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2013-03-17 10:52:42 +01:00
orbiter	940c6849ee	enhanced did-you-mean (a bit): can now remember previously searched words (plus small enhancements)	2013-03-17 10:52:31 +01:00
reger	d57b221921	add: reset Solr schema field selection to default button in IndexSchema_p	2013-03-17 03:46:29 +01:00
Michael Peter Christen	9406a2e438	fixed NPE during index abstract computation	2013-03-15 10:04:27 +01:00
Michael Peter Christen	16e9d4d1dd	added a restart hint	2013-03-15 10:00:06 +01:00
Michael Peter Christen	b3a54d5b1c	fix for wrong class name in log	2013-03-15 09:35:57 +01:00
Michael Peter Christen	2d36a7eaf5	- do not create a new query for all remote peers - no document search this time - adjusted banner and network to not show 'WORDS' but DHT Chunks. This is to avoid confusion for robinson peers which do not create Word Entries	2013-03-15 00:14:28 +01:00
Michael Peter Christen	4af0839be2	use appropriate ranking for each search situation: - when using the /date modifier, a date ranking profile is used - when using a site: modifier, a ranking profile supporting longer urls is used	2013-03-14 21:13:12 +01:00
Michael Peter Christen	b8ed66a55d	added all clickdepth computations for source and target paths in webstructure core	2013-03-14 17:54:33 +01:00
Michael Peter Christen	6300730d7f	refactoring of clickdepth computation as preparation for clickdepth computation of webgraph links	2013-03-14 12:13:02 +01:00
Michael Peter Christen	2080fc7406	removed unused tag fields	2013-03-14 10:35:21 +01:00
reger	230a12bfe2	adjust Opensearch discover function to new webgraph Solr schema	2013-03-14 03:10:54 +01:00
orbiter	6b13dd0d3d	added clickdepth field writing for webgraph core (unfinished)	2013-03-14 01:35:38 +01:00
orbiter	47114910d5	fix for possible memory leaks	2013-03-13 17:55:37 +01:00
Michael Peter Christen	addba047e2	changes in ranking computation - an existing ranking servlet for solr was extended. It is now possible to set boost values for fields, boost functions and boost queries. - The ranking can have different instances, but currently only the first one is used - added an abstraction layer for fields which can be used for search and those fields can be edited in the solr ranking configuration - the ranking value from solr within the field score is used to combine remote search requests, which all are created using the same locally defined boost values - reduced the number of fields which are used for search (makes it faster) - replaced some text fields by string fields (makes indexing faster) - removed classes which had no use - made a large number of experiments for a better ranking and created a temporary setting which prefers hits inside titles - adjusted also the RWI-based ranking computation to 'prefer title' - made special cases like for portal search where no post-processing and post-ranking is wanted: this keeps the original ranking order as done by Solr - fixed many bugs with old settings for ranking	2013-03-13 14:47:00 +01:00
reger	38f46eb33d	set RootNodeFlag only if EmbeddedSolr is connected (as RootNodes may receive direct Solr queries)	2013-03-12 03:13:14 +01:00
reger	2962f2b9e9	Merge branch 'master' of git://gitorious.org/yacy/rc1.git	2013-03-12 02:51:17 +01:00
orbiter	ab74d559fb	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2013-03-11 18:23:43 +01:00
Michael Peter Christen	4490133909	removed target_tag_s (superfluous)	2013-03-11 10:46:29 +01:00
orbiter	cd197bb555	fix for NPE if surrogates do not exist	2013-03-10 19:46:06 +01:00
reger	6ae30f9d0f	replace the terminateOldSessions - return immediate time from fixed 3 sec to requested minage parameter	2013-03-10 05:22:18 +01:00
Michael Peter Christen	252bb51f98	fix for wrong mime type in noload crawler	2013-03-07 15:31:00 +01:00
Michael Peter Christen	25300913fa	fixes to search debugging after testing with the different search debugging options	2013-03-05 21:28:22 +01:00
Michael Peter Christen	81380ae5c8	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2013-03-05 12:24:10 +01:00
Michael Peter Christen	c2fde018b5	concurrent snippet fetching from solr results which do not have snippets	2013-03-05 12:24:01 +01:00
orbiter	b1140e3d82	added debug switches for detailed search testing	2013-03-05 12:19:32 +01:00
orbiter	cdbfddf091	added filter queries for better image, audio and video results	2013-03-04 21:18:54 +01:00
Michael Peter Christen	587ef83eab	added missing cleanup statements for short memory cases during search	2013-03-04 13:01:24 +01:00
orbiter	2562f052b9	do not put the fulltext field text_t into the search cache because it is not used there and uses a lot of memory	2013-03-04 12:01:10 +01:00
Michael Peter Christen	2b6c79d347	in method exists() also use the new caching-stacks for documents/metadata	2013-03-04 01:13:17 +01:00
Michael Peter Christen	ae734b3f8d	enhanced the search result processing - no waiting time at the end - switched on 'classic' snippet production and verification (again)	2013-03-04 00:17:29 +01:00
Michael Peter Christen	0d7b4bc891	better protection against OOM during search flush and fixed missing result push	2013-03-03 23:45:47 +01:00
Michael Peter Christen	221ed7d764	- enhanced concurrency during search without IO blocking - introduced a second queue to flush remote search results (now: old metadata structure from DHT peers) - fixed result counters	2013-03-03 22:38:50 +01:00
Michael Peter Christen	3b1d9dc884	made index storage from DHT search result concurrently. This prevents blocking by high CPU usage during search. Also: removed query from Solr for DHT search results; results are taken from the pending queue.	2013-03-02 10:25:52 +01:00
orbiter	f13c0b2abd	fix for search	2013-03-01 19:18:16 +01:00
orbiter	0f7ea7ad9f	- enhanced solr.add procedure for mass adds - removed unused solr access classes - made snippet generation for documents from YaCy RWI/DHT concurrent (as it was before the search process removation) - reduced the number of remote results in settings file because the processing of such mass documents add is too CPU-intensive (in Solr)	2013-03-01 15:27:17 +01:00
Michael Peter Christen	f327ffedb4	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2013-02-28 15:55:13 +01:00
orbiter	9c09fd7d0b	better/less requests to local solr; the request is made in chunks which are exactly at only that size which is needed to present the current search result page. This will also cause that next solr request are made automatically during switching to next pages.	2013-02-28 14:04:08 +01:00
Michael Peter Christen	840fa22135	disabled clickdepth computation during crawling since that is repeated during clean-up phase.	2013-02-28 02:25:39 +01:00
orbiter	d74472f562	corrected result counter	2013-02-27 22:40:23 +01:00
orbiter	2555542f7a	removed the dns prefetch because that was not so useful	2013-02-27 20:58:34 +01:00
Michael Peter Christen	d957739441	removed size request	2013-02-26 17:53:44 +01:00
Michael Peter Christen	c95a84103a	complete redesign of search process: - removed 'worker' processes - no internal time-out behaviour: methods either are successful or return null - waiting is only done on top-level - removed snippet-production; this is replaced by solr snippets - removed statistics based on solr size queries (they had been VERY long); the statistics (like suggestions or tag cloud) are now again based on the old but very fast RWI index. In portal or intranet mode the RWI index is usually switched off; if you like to have statistics again then you must switch on the rwis again in this mode. - fixed many bugs regarding correct page counter	2013-02-26 17:16:31 +01:00
Michael Peter Christen	35fa718b77	testing to use solr for portalsearch caused some bugfixing but no full success: try to comment out the solr search request in yacy-portalsearch.js	2013-02-25 14:31:50 +01:00
Michael Peter Christen	008288719c	fix for schema export to consider also automatically generated coordinate fields	2013-02-25 01:13:03 +01:00
Michael Peter Christen	089dee1770	- generalized SchemaConfiguration into super-class Configuration and adopted other classes which used the configuration-only access for that class - removed many warnings - adjusted logging	2013-02-25 00:09:41 +01:00
Michael Peter Christen	c16de49f64	fix for webgraph delete query	2013-02-24 18:17:58 +01:00
Michael Peter Christen	56d5946a59	- added flags in IndexFederated_p.html to switch on or off the webgraph index (new solr core webgraph) .. this is now off by default - completely redesigned this servlet - added description how to attach a remote solr - adjusted naming of servlet and menus - moved 'lazy initialization' attribute from IndexSchema to IndexFederated (this is a general option) back again.	2013-02-24 18:09:34 +01:00
Michael Peter Christen	14cceb6b17	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Conflicts: htroot/IndexFederated_p.html source/net/yacy/cora/federate/solr/YaCySchema.java source/net/yacy/peers/Protocol.java source/net/yacy/search/Switchboard.java source/net/yacy/search/index/Segment.java also moved portalsearch-dev to yacy-portalsearch to be able to fix problems with new attachment to solr of the search widget	2013-02-23 08:48:33 +01:00
Michael Peter Christen	58e1e6fa2b	fixes to schema	2013-02-23 08:14:10 +01:00
reger	f291d60c5f	on remote Solr search take only locally enabled schema fields from remote solrdocument for the inputdocument added to local index	2013-02-22 22:17:45 +01:00
Michael Peter Christen	788288eb9e	added the generation of 50 (!!) new solr fields in the core 'webgraph'. The default schema uses only some of them and the resting search index has now the following properties: - webgraph size will have about 40 times as much entries as default index - the complete index size will increase and may be about the double size of current amount As testing showed, not much indexing performance is lost. The default index will be smaller (moved fields out of it); thus searching can be faster. The new index will cause that some old parts in YaCy can be removed, i.e. specialized webgraph data and the noload crawler. The new index will make it possible to: - search within link texts of linked but not indexed documents (about 20 times of document index in size!!) - get a very detailed link graph - enhance ranking using a complete link graph To get the full access to the new index, the API to solr has now two access points: one with attribute core=collection1 for the default search index and core=webgraph to the new webgraph search index. This is also available for p2p operation but client access is not yet implemented.	2013-02-22 15:45:15 +01:00
Michael Peter Christen	91a0401d59	introduced a second core named 'webgraph'. This core will hold the link structure, but is not filled yet. To have the opportunity of a second core, multi-core functionality had to be implemented to the deep-embedded solr: - migrated the solr_40 directory content to a subdirectory 'collection1'; the previously used default core is now called collection1 - added solr_40/webgraph subdirectory as second core - added a servlet configuration for the second core 'webgraph' in /IndexSchema_p.html - added instance handling as addition to solr connections: all solr connectors are now instances of a solr 'instance' object; this required a complete re-design of the solr embedding - migrated also caching and sharding ontop of new instance handling - migrated the search apis to handle now the access to a specific core, the default core named 'collection1' - migrated the remote solr search interface to access shards of cores; for the yacy remote search the default core is now called 'solr'; using the peer address as solr address - migrated the solr backup and restore process: old backups cannot be used after this migration! - redesign of solr instance handling in all methods which access the instances: they cannot hold copies of these instances any more; they must retrieve the actual connection object every time they want to write to it (this solves also some bugs when switching the index/network) - added another schema 'solr.webgraph.schema', the old solr.keys.list is replaced by solr.collection.schema	2013-02-21 13:23:55 +01:00
Michael Peter Christen	33bc255e85	prevent that crawl starts with very large url lists cause a time-out in the user front-end	2013-02-15 01:58:28 +01:00
Michael Peter Christen	b6de1f42dc	Full redesign of solr connection architecture. This was done to support multiple solr cores instead of just one. Therefore it is now necessary to distinguish between solr server connections (called an 'Instance') and a connection to a single solr core. One Instance may now have multiple connector classes assigned to it, each connecting to a single core. To support multiple cores it is also necessary to distinguish between the connection configuration and the configuration of the index schema. We will have multiple schema configurations in the future, each for every solr core. This caused that the IndexFederated servlet had to be split into two parts, the new Servlet for the Schema editor is now in the IndexSchema Servlet.	2013-02-15 01:38:10 +01:00
Michael Peter Christen	4111606654	removed the commitWithin attribute because that is not the way how the index is updated the right way for us. May also be superfluous with the solr 4.0 softcommit.	2013-02-13 02:29:47 +01:00
Michael Peter Christen	c20fa3640d	fix to unbalanced tag and license for null objects	2013-02-13 01:23:05 +01:00
Michael Peter Christen	3a6097966d	added jsonp option to yjson result writer	2013-02-13 01:11:57 +01:00
Michael Peter Christen	de58043205	Added image license generation for solr image search results when results are generated within yjson result writer. This makes it possible to view images in yacyinteractive from solr.	2013-02-13 00:33:53 +01:00
Michael Peter Christen	d3508fa8ff	fixed json search, quotes, auto-facets, urls etc. for yacyinteractive.html	2013-02-13 00:01:38 +01:00
Michael Peter Christen	1db23e9eac	Moved methods from SolrServerConnector to AbstractSolrConnector with the result that most of these methods become superfluous in other classes. This is a generalization step towards multi-indexes in Solr.	2013-02-12 22:03:10 +01:00
Michael Peter Christen	16d90859b7	reverted put-semantics back to as-usual in serverObjects and introduced an add-method to put in several objects for the same key	2013-02-12 11:52:33 +01:00
Michael Peter Christen	0d888ff69e	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2013-02-12 03:42:58 +01:00
Michael Peter Christen	c34af7fe94	extended JSON Response Writer and Opensearch Response Writer for the Solr search interface in such way that it is possible to use this interface for the yacyinteractive search. This search interface is now much faster using the Solr search directly. For the Solr interface it was necessary to create a translation from the YaCy search modifiers to the Solr facet selection. This was added in such a way that it becomes generic for the normal YaCy search and as a on-top evaluation for Solr queries.	2013-02-12 03:42:46 +01:00
reger	c37d718f16	make sure yacy.running is deleted if not running (catch exception) - to prevent following log if YaCy was previously not properly shutdown E ... STARTUP WARNING: the file C:\src\git\yacy-rc1\DATA\yacy.running exists, this usually means that a YaCy instance is still running E ... STARTUP FATAL ERROR: java.util.concurrent.TimeoutException java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException at net.yacy.cora.protocol.TimeoutRequest.call(TimeoutRequest.java:91) at net.yacy.cora.protocol.TimeoutRequest.ping(TimeoutRequest.java:112) at net.yacy.yacy.startup(yacy.java:200) at net.yacy.yacy.main(yacy.java:638) Caused by: java.util.concurrent.TimeoutException - adjust Netbeans path (to solr4.1.jars)	2013-02-11 22:53:19 +01:00
Michael Peter Christen	762b687e47	extended the serverObjects to be able to hold multiple values for a single key. This is done using the solr class MultiMapSolrParams. That class is needed in the OpensearchResultWriter to get multiple facet requests.	2013-02-11 22:12:15 +01:00
Michael Peter Christen	d70d99fab5	added more metadata fields and facets to OpensearchResponseWriter. This should make it possible to replace the original and enriched yacy opensearch result with a solr output in opensearch format.	2013-02-11 22:10:14 +01:00
Michael Peter Christen	6a4878940b	fix in html parser and bookmark generation	2013-02-11 13:28:08 +01:00
Michael Peter Christen	dee8b24d3c	better error handling for bookmarks	2013-02-09 06:55:57 +01:00
Michael Peter Christen	e1da39245a	when searching the network, do not search on robinson peers with the old DHT search interface. Now use the solr interface.	2013-02-08 18:30:08 +01:00
Michael Peter Christen	6f6ddaf7e7	A robinson peer does not need to write RWI data if such peers are only searched using the solr interface. Searching public robinsons will be done with solr only in the future.	2013-02-08 17:58:54 +01:00
Michael Peter Christen	ab4f74c82c	fix for xml blacklist import	2013-02-08 15:12:10 +01:00
Michael Peter Christen	7806680ab8	fixed a problem with re-feeding of already indexed documents with coordinates attached.	2013-02-08 12:45:54 +01:00
Michael Peter Christen	cb38e860cf	After the observation that Windows users simply forget that they started YaCy; YaCy is still running and the user additionally expects that another doubleclick on the YaCy icon simply opens the search windows (again) I decided to add a function that complies with the expectation of the user: simply open the browser pop-up page again if the user starts YaCy while YaCy is still running.	2013-02-07 23:39:00 +01:00
Marc Nause	27894d2c1a	Merge branch 'master' of git@gitorious.org:yacy/rc1.git	2013-02-05 21:09:41 +01:00
Marc Nause	75f9568472	*) only install files from the RELEASE directory *) minor changes	2013-02-05 21:02:32 +01:00
Michael Peter Christen	eb80405a16	added a disable function in RemoteCrawl_p servlet which prevents setting of remote crawl if peer is not a senior or principal peer	2013-02-05 12:47:20 +01:00
Michael Peter Christen	19c46e4acf	catch more exceptions	2013-02-04 21:24:39 +01:00
Michael Peter Christen	7de502f43d	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2013-02-04 20:02:35 +01:00
Marc Nause	3bc5ee6e3d	*) added protection against CSRF in update download page (http://localhost:8090/ConfigUpdate_p.html?releaseinstall=../../test.txt&deleteRelease=Delete+Release does not work anymore)	2013-02-04 19:57:28 +01:00
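
Commit ada3f27de7 above quotes an experimental ranking function: div(add(references_internal_i,product(references_external_i,references_exthosts_i)),add(clickdepth_i,1)). The arithmetic it describes can be sketched in plain Java as below. This is only an illustration of the formula, not code from the YaCy repository: the class name RankingSketch, the method name boost and the sample values are hypothetical, and the sketch assumes clickdepth is never negative.

```java
// Hypothetical illustration of the function query quoted in commit ada3f27de7.
// Nothing here is taken from the YaCy source tree.
final class RankingSketch {

    private RankingSketch() {}

    /**
     * Computes (internal + external * exthosts) / (clickdepth + 1),
     * i.e. div(add(internal, product(external, exthosts)), add(clickdepth, 1)).
     * Assumption: clickdepth >= 0, so the divisor is at least 1.
     */
    static double boost(int referencesInternal, int referencesExternal,
                        int referencesExthosts, int clickdepth) {
        if (clickdepth < 0) {
            throw new IllegalArgumentException("clickdepth must not be negative");
        }
        // product(references_external_i, references_exthosts_i)
        double externalWeight = (double) referencesExternal * referencesExthosts;
        // add(references_internal_i, ...)
        double numerator = referencesInternal + externalWeight;
        // add(clickdepth_i, 1)
        double denominator = clickdepth + 1.0;
        return numerator / denominator;
    }

    public static void main(String[] args) {
        // (4 + 3 * 2) / (1 + 1) = 5.0
        System.out.println(boost(4, 3, 2, 1));
    }
}
```

In this form, links from many external hosts raise the value multiplicatively, while pages deeper in the click path are scaled down.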


6383 Commits