yacy_search_server

mirror of https://github.com/yacy/yacy_search_server.git synced 2024-09-21 00:00:13 +02:00

Author	SHA1	Message	Date
Michael Peter Christen	6629e37685	tried to clean up the search process mess	2012-11-01 17:16:43 +01:00
Michael Peter Christen	f8f05ecba7	- added a delete button in host browser to delete a complete subpath - removed storage of default collection name - default is now "user" - made stacking of crawl start points concurrent	2012-10-31 17:44:45 +01:00
Michael Peter Christen	c326aa8f67	disabled writing new entries to crawl stacks to prevent a domain with many documents from blocking the refresh of the crawl queue	2012-10-29 22:26:52 +01:00
Michael Peter Christen	6905182d41	- fix for number of words log message - adding meta:refresh also to crawler stack	2012-10-29 21:42:31 +01:00
Michael Peter Christen	799d71bc67	enhanced solr caching: - increased the cache size, which is needed for longer solr commit times - speed hacks on cache write code	2012-10-28 20:31:29 +01:00
Michael Peter Christen	8e1248ffe3	force a commit before a search by the administrator to get the most recent results even if the commit time is high and indexing is ongoing.	2012-10-26 15:35:42 +02:00
Michael Peter Christen	3b48c78190	added an option to force a commit to solr. This may be used by a search front-end in case the commitWithinMs time is too short to get recently indexed documents.	2012-10-26 07:39:07 +02:00
Michael Peter Christen	ce0e5b1e17	- more refactoring / private methods - fix for usage of custom solr field names	2012-10-18 15:09:04 +02:00
Michael Peter Christen	ccc3760a47	Refactoring and redesign of data architecture to make URIMetadataRow superfluous. The target is to make a solr document the core of YaCy documents, which would allow many conversions to be removed. On the way to this target, the equivalence of URIMetadataRow and URIMetadataNode had to be removed to expose the usage of the old URIMetadataRow data structure. This refactoring already removes unnecessary conversions and should lower memory usage during indexing.	2012-10-18 14:29:11 +02:00
Michael Peter Christen	e5b3c172ff	removed hack which translated Solr documents to virtual RWI entries which were then mixed with remote RWIs. Now these Solr documents are fed into the result set as they appear during local and remote search. That makes the search much faster.	2012-10-17 17:45:41 +02:00
Michael Peter Christen	5d16c23a1f	specified more URIMetadata as URIMetadataNode	2012-10-16 18:26:21 +02:00
Michael Peter Christen	43f3345c90	- removed dependencies on URIMetadataRow and made direct access to URIMetadataNode, which creates the opportunity to access Solr objects directly and use their information richness - lazy initialization of the URIMetadataNode object - should cause less computation and memory usage during search. - removed dead code	2012-10-16 18:11:57 +02:00
Michael Peter Christen	cc98496ff3	enhanced the HostBrowser: - also showing outbound links to other domains if there are any - the outbound links browser also shows the link structure image - showing even inbound links if the web structure graph has information about them - removed the left menu and made the HostBrowser a part of the top menu for search - moved the file search also to the top menu - added hover information in the HostBrowser to explain what the click means - because the HostBrowser also links to the Metadata viewer ViewFile, there should be a button to switch back to the HostBrowser: added that as well.	2012-10-16 17:13:18 +02:00
Michael Peter Christen	21fe8339b4	- enhanced generation of url objects - enhanced computation of link structure graphics - enhanced collection of data for link structures	2012-10-15 13:17:13 +02:00
Michael Peter Christen	1b02408936	use less cache	2012-10-11 14:32:37 +02:00
Michael Peter Christen	5f0ab25382	removed the option to prevent removal of & parts inside of the MultiProtocolURI during normalform computation because that should always be done and also be done during initialization of the MultiProtocolURI Object. The new normalform method takes only one argument which should be 'true' unless you know exactly what you are doing.	2012-10-10 11:46:22 +02:00
Michael Peter Christen	7e3e45fd04	added Open Graph Metadata default fields, see http://ogp.me/ns#	2012-10-09 17:28:48 +02:00
Michael Peter Christen	c3e5f667a7	added schema.org breadcrumb counter to parser and solr schema	2012-10-09 13:02:43 +02:00
Michael Peter Christen	bd769de604	since the solr index is now used for all pages that are indexed locally, there is no need for the RWI index if the index is not transferred to another peer. Therefore the creation of RWI index data is now suppressed if DHT is disabled. This applies to all intranet and portal mode configurations, but not to public robinson modes. A robinson may switch back to public mode and then transmit its data. That means if someone never wants to switch to DHT mode, it would be more appropriate to choose the portal mode.	2012-10-09 11:48:55 +02:00
Michael Peter Christen	f8a3ab2d82	added the usage of synonyms to the GSA search interface	2012-10-02 14:29:45 +02:00
Michael Peter Christen	3d33a5bdf6	turned the synonyms_t Text field into a multi-valued String field synonyms_sxt	2012-10-02 11:13:06 +02:00
Michael Peter Christen	3b959ee002	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2012-10-02 10:14:09 +02:00
orbiter	3190347814	added a synonyms_t field to solr and a process to read synonym files. This can be used to add another stemming method to solr using stemming files that are expressed as synonyms for grammatical alternatives. The synonym/stemming files must have the following form: - each line is a comma-separated list of synonyms - the list of synonyms may be enclosed with {} (like the GSA synonyms file) - the file may contain comments which are lines starting with a '#' The synonym file(s) must be placed in DATA/DICTIONARIES/synonyms/ and are activated by default whenever a synonym file is in place. Then, for each word that is found in a document, all synonyms are added to a long text field which is stored into synonyms_t. Processes using the synonyms must query with that field as an optional matcher.	2012-10-02 00:02:50 +02:00
Michael Peter Christen	411d0e839b	added an underline text field to solr to record all underlined texts	2012-10-01 14:16:49 +02:00
Michael Peter Christen	c4a3d8870f	fixed computation of links in host browser which are not indexed but known by the crawler. Such links are now displayed in grey.	2012-09-29 02:13:11 +02:00
Michael Peter Christen	24d2ee3c52	- better date ranking - more protection against NPE and time travel effects	2012-09-26 18:36:32 +02:00
Michael Peter Christen	ca313e404f	- if a "/date" modifier is used, the solr remote query applies an ordering by date (ascending) - added also some 'anti-timetravel' protection (check if date is in the future within any metadata date field)	2012-09-26 16:56:33 +02:00
Michael Peter Christen	a4214694df	We assert that no metadata storage other than solr is used now. Therefore a property like solrConnected() must be true all the time. Removal of this method causes removal of all write operations to the old metadata index.	2012-09-26 16:05:11 +02:00
Michael Peter Christen	562183932b	- removed ip_s from default profile since that needs a DNS lookup to create a document entry. This makes remote search much slower. - removed synchronization of the add method if ip_s is activated, to prevent a user configuration from causing bad behavior. The disadvantage is that an index dump can cause data loss if indexing is running during the dump - caught more exceptions and more NPEs - better abstraction in MirrorSolrConnector - slight performance enhancement when only the index count is requested (rows=0 is sufficient to get a total count)	2012-09-26 13:38:04 +02:00
Michael Peter Christen	1533bfd63b	refactoring	2012-09-25 21:20:03 +02:00
Michael Peter Christen	872f83ebe0	refactoring	2012-09-25 21:04:58 +02:00
Michael Peter Christen	fb9460f0a8	using the search filter to drill down search to file types. A search like "mp3 filetype:mp3" may now surprise you.	2012-09-25 17:52:33 +02:00
Michael Peter Christen	15ea053c3a	- added xml output in IndexControlURLs to get the storage path of index dump commands - adjusted the apicall.sh script to write the downloaded text to stdout, which is necessary to parse the content out of it - added indexdump.sh script which creates a solr dump and prints out the storage path for the index dump - added synchronization to the Fulltext class to prevent data from being stored to a non-existing solr index while this index is disabled during the storage of the dump	2012-09-25 00:19:52 +02:00
Michael Peter Christen	1b474139dd	used the new zip writer/reader to add a solr dump process: the whole solr index can be written to a zip dump and also restored during runtime	2012-09-24 17:05:28 +02:00
Michael Peter Christen	8219a445f3	refactoring	2012-09-21 16:46:57 +02:00
Michael Peter Christen	00c1c777fa	refactoring	2012-09-21 15:48:16 +02:00
orbiter	563d584420	removed more dependencies in cora from kelondro	2012-09-21 11:02:36 +02:00
Michael Peter Christen	62add1d564	added the protocol and the file name extension to the solr fields since these fields are probably facets in file search	2012-09-11 22:46:39 +02:00
Michael Peter Christen	9db032664e	activated two solr fields which will be used by the administration interface (later)	2012-09-11 20:15:54 +02:00
Michael Peter Christen	4634f0e626	fix for images_withalt	2012-09-10 12:30:03 +02:00
Michael Peter Christen	10b911eed4	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2012-09-07 22:07:02 +02:00
Michael Peter Christen	be67c70a47	added Solr fields: inboundlinks_text_chars_val inboundlinks_text_words_val inboundlinks_alttag_txt outboundlinks_text_chars_val outboundlinks_text_words_val outboundlinks_alttag_txt	2012-09-07 22:06:51 +02:00
orbiter	d73fff0e0e	added solr field images_withalt_i	2012-09-07 21:33:45 +02:00
sixcooler	e78fe3f477	also do a clearcache on the solr-connector-caches	2012-09-06 22:07:07 +02:00
Michael Peter Christen	d8425e6809	added collections to crawl monitor	2012-09-04 14:47:53 +02:00
Michael Peter Christen	ee23fc7a32	added h1..h6 counter fields	2012-09-04 14:11:11 +02:00
Michael Peter Christen	b2b516cc3e	added a collection attribute to crawls and searches: - a solr field collection_sxt can be used to store a set of crawl tags - when this field is activated, a crawl tag can be assigned when crawls are started - the content of the collection field can be comma-separated, all of them are assigned to the documents when they are indexed as result of such a crawl start - a search result can be drilled down to a specific collection; this is currently only available in the solr interface and also in the gsa interface using the 'site' option - this adds a mandatory field for gsa queries (the google api demands that field all the time)	2012-09-03 15:26:08 +02:00
Michael Peter Christen	f75b3f8a47	added more patches to work without RWI data structure	2012-08-31 14:35:56 +02:00
Michael Peter Christen	31d4d38804	- extended the solr interface by a references-by-word-count method - reduced danger that a non-existing RWI database causes NPEs - added Solr queries to did-you-mean: this makes it possible that our did-you-mean algorithm works together with only Solr and without RWIs	2012-08-31 13:03:00 +02:00
Michael Peter Christen	528d6763fa	- added new solr fields: title_count_i, title_chars_val, title_words_val description_count_i, description_chars_val, description_words_val - added many asserts to ensure data type correctness from YaCy to Solr and vice versa - made many fixes according to new findings from these asserts (!)	2012-08-31 10:30:43 +02:00
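The synonym/stemming file format described in commit 3190347814 (orbiter, 2012-10-02) can be illustrated with a small parser. This is a minimal Python sketch under stated assumptions, not YaCy's implementation (YaCy itself is Java): the function names are hypothetical, words are lowercased, and a word's synonyms are taken to be the other members of every group it appears in. The result is returned as a list, in line with the later change (commit 3d33a5bdf6) that turned synonyms_t into the multi-valued field synonyms_sxt.

```python
from typing import Dict, Iterable, List, Set


def parse_synonym_lines(lines: Iterable[str]) -> List[List[str]]:
    """Parse synonym-file lines into groups (hypothetical helper).

    Format per the commit message: one comma-separated list per line,
    optionally enclosed in {}, and lines starting with '#' are comments.
    """
    groups: List[List[str]] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.startswith("{") and line.endswith("}"):
            line = line[1:-1]  # strip GSA-style {a, b, c} braces
        words = [w.strip() for w in line.split(",") if w.strip()]
        if words:
            groups.append(words)
    return groups


def build_synonym_index(groups: Iterable[List[str]]) -> Dict[str, Set[str]]:
    """Map each lowercased word to the other members of its groups (assumption)."""
    index: Dict[str, Set[str]] = {}
    for group in groups:
        lowered = [w.lower() for w in group]
        for word in lowered:
            index.setdefault(word, set()).update(w for w in lowered if w != word)
    return index


def synonyms_for_document(words: Iterable[str], index: Dict[str, Set[str]]) -> List[str]:
    """Collect the synonyms of every document word once, in first-seen order."""
    collected: List[str] = []
    seen: Set[str] = set()
    for word in words:
        for synonym in sorted(index.get(word.lower(), set())):
            if synonym not in seen:
                seen.add(synonym)
                collected.append(synonym)
    return collected
```

With a file containing `{car, automobile, auto}`, a document mentioning "Car" would contribute "auto" and "automobile" to the synonym field; as the commit message notes, queries then have to use that field as an optional matcher.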


146 Commits