yacy_search_server

mirror of https://github.com/yacy/yacy_search_server.git synced 2024-09-19 00:01:41 +02:00

Author	SHA1	Message	Date
luccioman	b3b75b0498	Accessibility : add a customizable alternative text to YaCy logo Applied W3C recommendations : https://www.w3.org/TR/html51/semantics-embedded-content.html#a-link-or-button-containing-nothing-but-an-image and https://www.w3.org/TR/html51/semantics-embedded-content.html#logos-insignia-flags-or-emblems	2016-09-22 16:08:33 +02:00
luccioman	3ee4f56c39	Improved ErrorCache behavior when switching networks Even after network switch, ErrorCache was still holding a reference to the previous Solr cores, thus becoming useless until next YaCy restart. Initial error cache filling with recent errors from the index was also missing after the switch.	2016-09-22 09:07:07 +02:00
luccioman	7d5ba2afa4	Added some JavaDoc and moved crawlStacker close at the right place.	2016-09-22 08:21:14 +02:00
luccioman	8edbcd8ad4	Log eventual Solr instances close errors. We do not want to block on this kind of error, but this should not silently fail as it may have later consequences.	2016-09-22 08:20:01 +02:00
reger	330768c8a2	fix for solr write.lock after mode change http://mantis.tokeek.de/view.php?id=686 The embedded core holds a lock on the index and must be closed. Earlier commit comment states that core should be closed with solr instance instead on close of connector. Adjusted the InstanceMirror.close() to take care of closing the embedded instance to release the lock. In 2 routines of fulltext this was already explicitly implemented (disconnectLocalSolr). Now this disconnect is part of the InstanceMirror.close().	2016-09-22 00:16:22 +02:00
Michael Peter Christen	df51e4ef07	Merge branch 'master' of git@github.com:yacy/yacy_search_server.git	2016-09-19 11:01:58 +02:00
Michael Peter Christen	e063aaf97f	enable fuzzy search, solr style (append a ~ to get a fuzziness on the word)	2016-09-19 11:01:39 +02:00
reger	7f63fc50f3	prepare a IndexSegment test case for RWI index testing + prevent NPE in Segment.clear() on missing embedded solr instance.	2016-09-11 23:25:44 +02:00
luccioman	06d4f93d03	Merged master into postprocessing branch	2016-09-07 09:28:37 +02:00
reger	e310ec5f70	fix posInText ranking calculation to score 0 on no position info + fix Word posInText calc in Tokenizer to start with 1 + test case	2016-09-06 00:05:59 +02:00
reger	51c077f493	adjust the getTopics() and getTopicNavigator() to current usage - move the maxcount limit restriction completely to getTopicNavigator (as there not used in getTopics) - let search servlet use getTopics by default (w/o RWI connected check, as of now, Topics are available w/o any additional index interaction)	2016-09-05 00:07:01 +02:00
reger	cc2d9dd3f1	reactivate the use of included-in-topwords boost in postRanking + changed the postRanking to add one score only if word appears more than one time. + getTopics() unused code block rem'd (save performance)-> routine needs rework !	2016-09-04 00:09:45 +02:00
reger	6801673a07	apply postranking media search boost only on media queries	2016-09-03 03:37:40 +02:00
luccioman	8c49a755da	Postprocessing refactoring Added Javadocs to refactored methods. Added log warnings instead of silently failing some errors. Only fill collection1hosts when required ( shallComputeCR true).	2016-09-01 15:40:28 +02:00
luccioman	42f45760ed	Refactored postprocessing For easier understanding and performances profiling.	2016-08-31 12:16:25 +02:00
Michael Peter Christen	079112358c	Merge branch 'master' of https://github.com/yacy/yacy_search_server.git	2016-08-19 15:31:09 +02:00
Michael Peter Christen	efeb592661	don't do solr optimization, this creates high IO load. We should leave this task to solr to do that on its own instead of forcing it.	2016-08-19 15:30:53 +02:00
reger	4c7a77662a	eliminate dependency on file-extension in storeDocument but use supported mime-type to also support handling of urls w/o corresponding file-extension. For this refactor use of document.getParserObject() to always return a Parser (for clean logic) and define/move the scraperObject as local var of AbstractParser. Adjust related calls to getParserObject (where actually a scraperObject is wanted). Additionally skip appending url token to parsed text for dht metadata entries (by default returned as result by rwi index).	2016-08-14 03:53:16 +02:00
reger	2910fe35c1	add missing scheduler calc of next exec_date (call of calculateAPIScheduler) - after last_exec_date is altered, next_exec_date should be recalculated - makes the recalculation of next_exec in advance (without api call surely made) in Switchboard.schedulerJob() obsolete Slightly modify next_exec calc. on missed event to now+schedule_time (from fix 10min)	2016-08-09 03:03:04 +02:00
reger	70d47ae38a	keep scheduler selection by repeat entry from `07311020d4` to allow exec schedule on actual exec event. Iterate on exec date (of advantage after interruption/shutdown) to schedule older or missed events first.	2016-08-08 02:19:48 +02:00
reger	7c3f932e5d	revert due to conflict with double count recording by scheduler / servlet by the commit under normal operation (no shutdown)	2016-08-08 01:57:31 +02:00
reger	07311020d4	postpone apicall exec date init until actual call fix for http://mantis.tokeek.de/view.php?id=677 The difference is on scheduling a large number of rss feeds and loading is not finished before shutdown of YaCy. The change makes sure not already loaded RSS will be loaded by the scheduler on next startup.	2016-08-07 05:08:55 +02:00
reger	fcad2d0744	add uses of config constant INDEX_RECEIVE_ALLOW	2016-07-27 02:16:20 +02:00
reger	35a7d57260	update lucenematchversion to current (5.2.0 -> 5.5.0) there should be no need for reindex by the update	2016-07-23 18:36:43 +02:00
Michael Peter Christen	7466d390b2	small refactoring + do not accept too old peers during bootstrap	2016-07-04 11:02:15 +02:00
reger	8d58a48029	remove wrong log line in CrawlSwitchboard + don't allow CrawlSwitchboard to exit application making network param unused	2016-07-02 20:33:23 +02:00
reger	b119ff65be	clean out not used Switchboard variables counter indexedPages, const xstackCrawlSlots	2016-06-14 01:50:32 +02:00
reger	bd8f7c11f5	Use transparent addToCrawler in AutoSearch instead of addToIndex This would likely also be of advantage for RSS import/schedule as following bug-reports suggest http://mantis.tokeek.de/view.php?id=569 http://mantis.tokeek.de/view.php?id=655	2016-06-01 01:14:22 +02:00
JeremyRand	433217b33e	Properly support multiple Boost Queries. (Previous code was broken because it concatenated multiple Boost Queries together rather than passing Solr an array.)	2016-05-20 20:17:51 -05:00
reger	d0a571bed2	del cytag trail for own index.html (save resource not used by default)	2016-05-19 01:59:00 +02:00
reger	7097dcbdbd	cleanup hack for partial Solr update on multivalued datefields has been fixed in Solr http://issues.apache.org/jira/browse/SOLR-8050	2016-05-06 02:47:04 +02:00
reger	f10ea3c155	clean-out unused SwitchboardConstants	2016-05-05 00:55:22 +02:00
reger	ef24593347	delete obsolete SEARCHRESULT busythread constants not used since 29.05.2013 18:27:27 `0c1a018bbd`	2016-05-04 01:30:10 +02:00
reger	6ecc180299	fix rwi doubledom return best (highest) ranking	2016-04-12 03:55:43 +02:00
reger	d9adc2c255	load handler for Transparent Proxy on startup only if feature is activated to save the resources and keep handler chain small if the feature is not used. +add a warning message on settingsack_p page to restart on first activation	2016-03-25 05:26:48 +01:00
Michael Peter Christen	b89465d952	0N - basic dump upload servlet infrastructure, to share index dumps within an experimental new sharing model	2016-03-11 18:12:13 +01:00
Michael Peter Christen	849ab671a9	0n: modified the p2p bootstrapping process - rules had been too tight and did not support the re-start of a network with just one principal peer.	2016-03-11 08:54:42 +01:00
Michael Peter Christen	a6bf0b1649	0N - added option to generate index export files for a specific number of minutes in the past and reverted latest change. The export file dump will now contain four data elements: f - first date of index entry write date, l - last date of index write date, n - now-date of index dump time, c - count of numbers inside the dump. '0N' denotes a series of changes which will lead to the opportunity to exchange index data dumps in a way that is needed to integrate ZeroNet index data. This will be based on index dump sharing; that causes this commit.	2016-02-23 18:56:20 +01:00
reger	06d0e2aeb9	result heuristic (also used in greedy learning mode) to use outbound links if result is full index doc. Otherwise use default loader method. - Above brought up that parser start url parameter, declared as AnchorURL uses only methods of parent object DigestURL (changed parameter declaration accordingly).	2016-02-16 02:05:58 +01:00
reger	caf9e98f09	put metadata dc_publisher in corresponding schema field	2016-02-14 21:13:25 +01:00
reger	6f0b073bf3	override detected language (statistic langdetect) only with TLD determined language if langdetect probability is not high. + additionally truncate zh-cn / zh-tw returned by langdetect to 2 char ISO639-1 zh used by YaCy	2016-02-07 21:16:22 +01:00
reger	535d4bf75f	respect hidden attribute for file and smb directory listing (hidden directories are not listed, affects crawling of local file system)	2016-02-04 19:16:00 +01:00
reger	a6617ad887	expand initRemoteCrawler() to terminate worker threads if called to deactivate remote crawl. On startup we save the resources for remote crawler if disabled. Once started threads are running idle after disable remote crawl. Now threads are terminated to save the resources also while disabling during runtime. + remove empty class Channels	2016-01-28 23:14:09 +01:00
reger	ed3e16e092	apply remote result count config value to Bookmark Autosearch + prepare to make the widely unused Bookmark feature optional	2016-01-15 02:10:10 +01:00
Ryszard Goń	a98c395023	Add the Autocrawl thread	2016-01-14 00:50:23 +01:00
Ryszard Goń	1728cd30c6	Create autocrawl profiles	2016-01-12 16:28:34 +01:00
luc	571bc55937	Refactoring : use StandardCharsets constants instead of hard-coded charset names.	2016-01-05 23:37:05 +01:00
reger	1af0e9ef74	remove workaround for Solr bug regarding multivalued date fields fixed in 5.4.0 http://issues.apache.org/jira/browse/SOLR-8050	2016-01-03 01:11:27 +01:00
reger	a58d34a4e8	check error URL cache before adding errorDoc to index - del obsolete related switchboardconstant	2016-01-02 05:03:57 +01:00
reger	cd26717ba2	fix low memory status hint (dht-in disabled) http://mantis.tokeek.de/view.php?id=619	2015-12-29 20:38:45 +01:00

1231 Commits