yacy_search_server

mirror of https://github.com/yacy/yacy_search_server.git synced 2024-09-19 00:01:41 +02:00

Author	SHA1	Message	Date
orbiter	e816b88b55	changed behaviour of metadata storage: if any solr is attached, the metadata is written to solr instead of the metadata-db, even if the metadata-db is enabled. This prevents metadata from being written to two storage systems at the same time. It is also the next step in migrating the current metadata-db to solr.	2012-08-10 15:39:10 +02:00
Michael Peter Christen	f9c0e6e950	- Implemented and integrated the URIMetadataNode object which is a metadata representation from the solr index. This shall replace metadata from the built-in database in the future. - added the Solr-driven metadata into the search index of YaCy which makes it now possible to run YaCy without the old metadata index. This is a major step forward to a full migration to Solr.	2012-08-10 13:26:51 +02:00
Michael Peter Christen	b2b480fff2	more abstraction of the YaCySchema -> Opensearch matching process	2012-08-10 09:48:15 +02:00
Michael Peter Christen	73f6d69d03	more abstraction for solr query params parsing	2012-08-10 07:58:45 +02:00
Michael Peter Christen	24462e9baa	set the title every time, it is possible that it has changed	2012-08-10 07:51:57 +02:00
Michael Peter Christen	136fcb1ad9	refactoring	2012-08-10 06:47:13 +02:00
Michael Peter Christen	a12f693ec9	added two response writers for the embedded solr interface: an rss/opensearch writer and an enhanced solr xml writer. The enhanced solr writer has less configuration overhead than the original writer and should be slightly faster. The rss/opensearch writer is at this time slightly incomplete compared with the already existing rss search result from YaCy, and snippets are also missing at this time. To test the new interface, open for example: http://localhost:8090/solr/select?wt=rss&q=olympia The wt codes for the new result writers are: wt=rss for opensearch, wt=exml for the enhanced solr xml writer. Additionally, the SRU search parameters have been added to the solr interface, which can now also be used for a normal solr/xml search.	2012-08-09 18:06:48 +02:00
orbiter	67edfd991c	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2012-08-05 15:49:48 +02:00
orbiter	d9173ba7ed	added more solr fields to integrate values from URIMetadataRow. All writes to the Metadata-DB are now also made to solr. This includes metadata transfer during search and rwi transfer. The new/added solr fields are: ## time when resource was loaded load_date_dt ## date until resource shall be considered as fresh fresh_date_dt ## id of the host, a 6-byte hash that is part of the document id host_id_s ## ids of referrer to this document referrer_id_ss ## the md5 of the raw source md5_s ## the name of the publisher of the document publisher_t ## the language used in the document; starts with primary language language_ss ## an external ranking value ranking_i ## the size of the raw source size_i ## number of links to audio resources audiolinkscount_i ## number of links to video resources videolinkscount_i ## number of links to application resources applinkscount_i	2012-08-05 15:49:27 +02:00
Michael Peter Christen	70b10e8316	added the JSON response writer to solr interface, add &wt=json to the servlet GET properties to use this format	2012-08-01 00:14:56 +02:00
Michael Peter Christen	8d944f6517	nowrap from gaston in forum http://forum.yacy-websuche.de/viewtopic.php?p=26815#p26815	2012-07-30 12:39:47 +02:00
Michael Peter Christen	24d9db1613	snippet retrieval loading processes may use a smaller minimum load time value than crawling processes. This speeds up the search result preparation dramatically.	2012-07-30 10:38:23 +02:00
Michael Peter Christen	1687737771	Abstraction of HandleMap and HandleSet	2012-07-27 12:13:53 +02:00
Michael Peter Christen	3bcd9d622b	cleaned up classes and methods which are either superfluous at this time or will become superfluous or subject to a complete redesign after the migration to solr. Removing these things now will make the transition to solr simpler.	2012-07-25 14:31:54 +02:00
Michael Peter Christen	6f1ddb2519	Moved solr index-add method to the same method where the YaCy index is written. Also done some code-cleanup.	2012-07-25 01:53:47 +02:00
Michael Peter Christen	315d83cfa0	cleanup	2012-07-24 22:16:56 +02:00
Michael Peter Christen	76202f068e	extended abstraction of local and remote solr index using one front-end for index administration and querying.	2012-07-24 17:23:29 +02:00
Michael Peter Christen	7ec7341f60	added user-authentication protection to solr search (same as implemented for yacysearch)	2012-07-23 21:43:14 +02:00
Michael Peter Christen	e2a97ef8f6	better explain how to access the embedded solr	2012-07-23 21:31:12 +02:00
Michael Peter Christen	826967513b	changed options in IndexFederated_p to switch on/off parts of the index individually. The settings are experimental and the values of the settings will be overwritten when an index migration from urldb to solr starts.	2012-07-23 16:28:39 +02:00
Michael Peter Christen	cba4ab862e	fix for http://bugs.yacy.net/view.php?id=202	2012-07-23 00:36:18 +02:00
reger	36c9875b6e	removed localized number formatting from num-results_totalcount response (this is only used in xml and json where localized format is not valid)	2012-07-23 00:00:40 +02:00
orbiter	69e743d9e3	- more abstraction for the RWI index as preparation for solr integration - added options in search index to switch parts of the index on or off	2012-07-22 13:18:45 +02:00
orbiter	6cc5d1094e	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2012-07-21 13:34:57 +02:00
orbiter	05a3ffd03a	patches to ensure that solr connectors are active only if they have a solr object assigned and vice versa	2012-07-20 11:47:50 +02:00
orbiter	5a3c829872	embedded solr is only initiated if it is activated with IndexFederated_p.html	2012-07-20 11:40:33 +02:00
Lotus	3a350a2f83	partial html fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=4454	2012-07-20 08:53:12 +02:00
Michael Peter Christen	97b7bcf2a6	added a solr search index - by default, an (empty) solr storage instance is created at SEGMENTS/solr_36 - the index is written if in /IndexFederated_p.html the flag "embedded solr search index" is switched on - a standard solr query interface is available now with a new servlet at http://127.0.0.1:8090/solr/select To test this, do the following: - switch to webportal mode - switch on the feature as described - do a crawl. this fills the solr index. The normal YaCy search will NOT work now! - do a solr query, like: http://127.0.0.1:8090/solr/select?q=*:* http://127.0.0.1:8090/solr/select?q=text_t:Help play with different search fields as you can see in /IndexFederated_p.html You can use the standard solr query attributes as described in http://wiki.apache.org/solr/SearchHandler	2012-07-19 11:34:05 +02:00
Michael Peter Christen	f78ce93a80	collection of speed and memory saving hacks	2012-07-13 21:15:38 +02:00
orbiter	c00a3cf74d	less usage of generic logger to avoid logger generation overhead	2012-07-12 19:54:54 +02:00
orbiter	e76159040b	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2012-07-12 11:14:04 +02:00
orbiter	bbfa497a3c	replaced more size() > 0 by !isEmpty()	2012-07-12 11:12:21 +02:00
Michael Peter Christen	e3aa05b9dd	added creation of subpath pattern when crawl start is 'from file'	2012-07-11 23:18:57 +02:00
orbiter	0cbda0b2b8	- replaced all length() == 0 and size() == 0 with isEmpty() - replaced some length() > 0 and size() > 0 with !isEmpty() - cannot be done automatically - implemented some isEmpty() methods	2012-07-10 22:59:03 +02:00
Roland 'Quix0r' Haeder	aef9dd0350	- removed cleaning of blacklist cache on startup - added cleaning of blacklist cache if cache is modified in interface - extended cache saving to all cache types - moved cache location to DATA/LISTS - fixed static file path which was relative to the application path but should be relative to data path - which is different in debian and mac implementations	2012-07-10 13:08:16 +02:00
orbiter	c7afa8bc48	using SwitchboardConstants for solr attributes	2012-07-10 12:01:20 +02:00
orbiter	62202e2d71	refactoring of query attribute variable names for better consistency with (next) stored query words	2012-07-09 11:14:50 +02:00
Michael Peter Christen	91f14ea38e	fix to solr configuration (case where the external solr was not online)	2012-07-06 01:29:13 +02:00
sixcooler	2c5b68d932	more abstraction of error message	2012-07-05 14:50:37 +02:00
Michael Peter Christen	9758c521ab	abstraction of error message	2012-07-05 14:27:28 +02:00
sixcooler	9b6e4e46ca	fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=4430	2012-07-05 14:06:00 +02:00
Michael Peter Christen	b0c408788b	made class methods static where possible	2012-07-05 12:38:41 +02:00
Michael Peter Christen	5bd3c90907	- removed unnecessary semicolons - added default case for switch	2012-07-05 11:18:31 +02:00
Michael Peter Christen	7c1ba99755	removed more unused method parameters	2012-07-05 10:44:30 +02:00
Michael Peter Christen	0301aba1e9	removed unused method parameters	2012-07-05 10:23:07 +02:00
Michael Peter Christen	241dd8410a	removed snippet pattern filter - it was not used	2012-07-05 09:21:27 +02:00
Michael Peter Christen	d3964253ae	- added @SuppressWarnings to unused servlet method parameters - removed unnecessary casts - removed unnecessary throw statements	2012-07-05 09:14:04 +02:00
Michael Peter Christen	ea10766bfd	cleaned unnecessary nested code	2012-07-05 08:44:39 +02:00
orbiter	78fc3cf8f8	refactoring and new usage of SentenceReader: this class appeared as one of the major CPU users during snippet verification. The class was not efficient for the following reasons: - it used an overly complex input stream, generated from sources and UTF8 byte-conversions. The BufferedReader added a strong overhead. - to feed data into the SentenceReader, multiple toString/getBytes calls were applied until a buffered Reader from an input stream was possible. These superfluous conversions have been removed. - the best source for the SentenceReader is a String. Therefore the production of Strings is now forced inside the Document class.	2012-07-04 21:15:10 +02:00
Michael Peter Christen	276a66a793	Added a limit of 1000 links that a parser shall store during indexing. A limit was necessary because some web pages have such huge numbers of links that the number of links alone can easily cause an OOM. The question of whether 1000 links is sufficient or too weak must be answered by testing this feature.	2012-07-03 17:06:20 +02:00
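
Several of the commit messages above quote test URLs for the embedded solr select servlet (http://127.0.0.1:8090/solr/select) together with the wt codes rss, exml and json. As a rough illustration of how such URLs are put together, here is a minimal Java sketch. It is not code from the YaCy repository: the class name SolrSelectUrl and the method build are hypothetical, and the host and port are simply the ones the commit messages use for a local peer.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of YaCy: assembles query URLs for the
// select servlet mentioned in the commit messages.
public class SolrSelectUrl {

    // Servlet address as quoted in the commit messages (local peer assumed).
    static final String BASE = "http://127.0.0.1:8090/solr/select";

    // query:  the value of the q parameter, e.g. "olympia" or "text_t:Help"
    // writer: the wt code ("rss", "exml", "json"), or null to omit wt
    static String build(String query, String writer) {
        StringBuilder sb = new StringBuilder(BASE);
        // URL-encode the query so characters such as ':' are transmitted safely.
        sb.append("?q=").append(URLEncoder.encode(query, StandardCharsets.UTF_8));
        if (writer != null && !writer.isEmpty()) {
            sb.append("&wt=").append(writer);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("olympia", "rss"));   // rss/opensearch writer
        System.out.println(build("olympia", "exml"));  // enhanced solr xml writer
        System.out.println(build("olympia", "json"));  // JSON writer
        System.out.println(build("text_t:Help", null)); // field query, no wt given
    }
}
```

The sketch percent-encodes the query value, so a field query such as text_t:Help is sent with an encoded colon; the servlet decodes it back to the same query string.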

3911 Commits