yacy_search_server

mirror of https://github.com/yacy/yacy_search_server.git synced 2024-09-21 00:00:13 +02:00

Author	SHA1	Message	Date
Michael Peter Christen	1052263af3	- added a new solr field references_i which stores the number of INCOMING links to the corresponding web page. This information is taken from the reverse link index (a 'little sister' of the RWI index). - this field can be of use to enhance the ranking because a web page with more incoming links can be more important than others. But this is not true for typical link pages like menus. Therefore the number of outgoing links is needed. - added a new solr attribute 'bf' to solr queries which is a boost function extension. This field can contain a formula which computes the boost according to given field values. After some experiments the following formula is now default: div(add(1,references_i),pow(add(1,inboundlinkscount_i),1.6))^0.4 This takes the number of references and the inbound links. Further experiments are needed to enhance that formula.	2012-12-18 14:42:35 +01:00
Michael Peter Christen	34f8786508	removed dependency of vocabulary navigation from Jena and its triplestore; the vocabulary search is now done using generic solr fields which are created on-the-fly during runtime.	2012-12-18 02:29:03 +01:00
reger	664499bb10	PerformanceQueues: disable input for hardcoded httpd performance values	2012-12-16 21:01:13 +01:00
Michael Peter Christen	9319b90d8a	- fixes for host navigation - fixes for filetype navigation - removed unused code	2012-12-15 09:14:49 +01:00
Michael Peter Christen	cb5cbec14d	distinguishing modified query string and original query string	2012-12-15 00:05:46 +01:00
Michael Peter Christen	fb0fa9a102	- fixed 'delete from subpath' during crawl start which deleted nothing; now works; - changed some crawl start html design details	2012-12-11 13:38:28 +01:00
orbiter	54e193a2b8	you can now search for '*' to get just ALL entries in the search index as result list. This makes sense if you intend to search just by using the navigation tools to cut the data set into navigation 'slices'.	2012-12-10 21:00:30 +01:00
orbiter	7f5526e6ef	allow larger no-proxy expressions	2012-12-10 20:59:43 +01:00
reger	e80dfeca23	- making blacklist path part case insensitive (solving http://bugs.yacy.net/view.php?id=171 ) - blacklist test adding explicit response text "not blocked" if no blacklist match	2012-12-08 06:34:48 +01:00
Michael Peter Christen	4491072256	- clear the search cache when altering the solr boosts - better positions for submit buttons	2012-12-07 14:56:34 +01:00
Michael Peter Christen	2b7d46bc1f	using a filter query for the site parameter in GSA api	2012-12-07 14:54:49 +01:00
Michael Peter Christen	10527e28ae	fix for wrong display of error urls in HostBrowser	2012-12-07 00:31:10 +01:00
Michael Peter Christen	5f5d66921e	patch for funny symbols in url paths (like tilde)	2012-12-05 22:05:49 +01:00
Michael Peter Christen	8aa08261a7	update to Solr Boost handling	2012-12-05 12:26:42 +01:00
Michael Peter Christen	908ad2f174	Added a new servlet to configure the solr ranking using field boosts	2012-12-03 17:01:19 +01:00
Michael Peter Christen	a598fb6227	renamed Ranking_p.html to RankingRWI_p.html because another Ranking servlet will be added next	2012-12-03 00:01:41 +01:00
Michael Peter Christen	72f165d58b	added a Boost class which stores solr query boost values. The class can be configured using the yacy.init file. The boost information is taken from the configuration each time when a query to solr is done.	2012-12-02 16:54:29 +01:00
reger	bb20691d4f	fix: respect config setting of "show Nav Top-Menu" in HostBrowser.html for public users (as hostbrowser is now available in search results)	2012-12-01 01:14:29 +01:00
Michael Peter Christen	3de784c8dd	replaced more split and replaceAll calls that lacked pattern pre-compilation with pre-compiled patterns	2012-11-26 13:40:53 +01:00
Michael Peter Christen	8fc3679c66	using more pre-compiled patterns for split methods	2012-11-26 13:11:55 +01:00
Michael Peter Christen	d48e9788d2	enhanced search result processing behavior - query less at one time; query more often - in between the small queries, evaluate results - remove fields from search results which are not needed	2012-11-26 12:24:35 +01:00
Michael Peter Christen	eca68fa197	added debug code to crawler monitor	2012-11-25 15:43:42 +01:00
Michael Peter Christen	205f8b222b	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2012-11-25 14:41:49 +01:00
orbiter	c54cb85422	added link to http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html to the /RegexTest.html servlet	2012-11-25 12:20:41 +01:00
Michael Peter Christen	b7004043ea	- added a field cache for solr queries which call only for a single value - fixed a version conflict exception within a solr add request	2012-11-24 22:30:05 +01:00
Michael Peter Christen	bf42179982	introduced more structure in HostBrowser, table view, better counting, distinguishing of error cases (fail/excluded)	2012-11-23 14:09:48 +01:00
Michael Peter Christen	4eab3aae60	removed overhead by preventing generation of full search results when only the url is requested	2012-11-23 01:35:28 +01:00
Michael Peter Christen	a114bb23bb	- using edismax in gsa interface - generating less field data for gsa search results - using a boost query in gsa interface to move double content to the end of the result list	2012-11-22 13:03:33 +01:00
Michael Peter Christen	d6b82840f8	added a feature to find similarities in documents. This uses an enhanced version of the Nutch/Solr TextProfileSignature. As a result, a signature of the document is written to the solr search index. Additionally, each time a signature is written, it is checked whether the signature already exists in the index. If the signature does not exist, the document is marked as unique. The unique attribute can now be used to sort document lists and bring duplicates to the end of a result list. To enable this, a large portion of the search api to Solr had to be changed. This affected mainly caching of 'exists' searches to enhance the check for existing signatures and do this without actually doing a solr query. Because this is the first time a long number is used as a value in the Solr store, the value naming in the YaCySchema also had to be adapted and normalized. As a result, many files had to be changed.	2012-11-21 18:46:49 +01:00
Michael Peter Christen	f5ca5cea44	- added field options to all solr queries. This can be used to restrict the actual data which is fetched from solr. - used the new field options to reduce generic options like getting the load date or the count of search results. should increase overall speed - used the new field options to reduce overhead in the host browser during acquisition of links. - used the field options to make checking of links in crawler faster - if the crawler is paused, the crawl queue is not cleaned	2012-11-19 17:24:34 +01:00
Michael Peter Christen	46be4af5b9	Merge commit '2bb8f045cc92f31fc7e720cc30b38af417563890'	2012-11-18 22:11:04 +01:00
Michael Peter Christen	952e143580	FINALLY YaCy can now search for full strings using double- or singlequoted strings in the search query line!!!	2012-11-18 16:03:34 +01:00
orbiter	5dfd6359cb	redesign of the QueryParams class: introduced QueryGoal which holds the query string parser. This shall be used to create a proper full-string matching which is handled then by QueryGoal.	2012-11-18 01:22:41 +01:00
Michael Peter Christen	5fd3b93661	added deletion of hosts during crawl start if deleteold option was given	2012-11-13 16:54:28 +01:00
Michael Peter Christen	d64445c3cb	because we have the inurl:<term> - searchmodifier, we don't actually need regular expressions as search attributes. They have now been removed from the advanced search page while they are still created internally. The filter is then expressed against solr as regular expression filter query. If the expression selects a specific protocol, host or filetype, this is then translated into a faceted query.	2012-11-13 11:45:56 +01:00
orbiter	b55ea2197f	- redesign of crawl start servlet - for domain-limited crawls, the domain is deleted now by default before the crawl is started	2012-11-13 10:54:21 +01:00
orbiter	1c66de4bd4	- removed scheduled crawling options in crawl start because it is superfluous there; it can be changed in the scheduler servlet. It's also confusing in the presence of the delete-option, which will be implemented next. - removed unused crawl start servlet - some refactoring to make the time parser reusable	2012-11-12 11:19:39 +01:00
Michael Peter Christen	2e7219f9fd	removed highlighting of search results within collections in GSA interface	2012-11-09 16:25:24 +01:00
Michael Peter Christen	074dfd297b	added icons and a selection for hosts with urls pending for crawler or with errors	2012-11-09 16:24:56 +01:00
cominch	21df1ad9e0	update and generalization of the SMW import and content control routines	2012-11-09 13:48:40 +01:00
Michael Peter Christen	4c4e0eece2	added new submenu 'Target Analysis' with three servlets which are useful to analyse the target servers: robots.txt table, mass target analysis and a regex tester	2012-11-07 21:26:01 +01:00
Michael Peter Christen	61995d508e	do the commit anyway before calling a search interface	2012-11-07 17:27:50 +01:00
Michael Peter Christen	86ec199126	using a better file name	2012-11-07 16:39:49 +01:00
Michael Peter Christen	5105256927	update to search result logging (this was a remaining issue from the solr 4.0.0 migration)	2012-11-07 14:15:27 +01:00
Michael Peter Christen	570e42c4e3	fix for filetype navigator	2012-11-07 13:53:29 +01:00
Michael Peter Christen	71ed8e5e07	bugfixes for crawler	2012-11-07 12:52:19 +01:00
Michael Peter Christen	29fbbb49dc	better colors for host browser and corrected document count	2012-11-07 12:23:21 +01:00
Michael Peter Christen	6244b084cd	fixed wrong order of result count values	2012-11-07 02:29:33 +01:00
Michael Peter Christen	631b08e7e2	update to HostBrowser	2012-11-07 02:17:24 +01:00
Michael Peter Christen	51f420e4f5	removed location search because it is only working in special cases	2012-11-07 02:04:41 +01:00
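
The default boost formula quoted in commit 1052263af3 can be evaluated by hand to see how it behaves. The following is only a rough sketch, not YaCy code: the class and method names are hypothetical, and it assumes that the trailing ^0.4 is a Solr weight that multiplies the function value rather than an exponent.

```java
// Hypothetical illustration of div(add(1,references_i),pow(add(1,inboundlinkscount_i),1.6))^0.4.
// Assumption: the trailing ^0.4 is a multiplicative weight on the function result.
final class BoostSketch {

    // Assumed weight taken from the "^0.4" suffix of the formula.
    static final double WEIGHT = 0.4;

    // references: value of references_i; inboundLinks: value of inboundlinkscount_i.
    static double boost(long references, long inboundLinks) {
        double numerator = 1.0 + references;                    // add(1,references_i)
        double denominator = Math.pow(1.0 + inboundLinks, 1.6); // pow(add(1,inboundlinkscount_i),1.6)
        return WEIGHT * (numerator / denominator);              // div(...) scaled by the weight
    }

    public static void main(String[] args) {
        // A page with many references and few counted links scores higher than
        // a link-heavy page with the same number of references.
        System.out.println(boost(20, 2));
        System.out.println(boost(20, 50));
    }
}
```

Under these assumptions the value grows linearly with references_i and shrinks faster than linearly with inboundlinkscount_i, which matches the stated intent of damping link-heavy pages such as menus.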


4135 Commits
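
Commits 3de784c8dd and 8fc3679c66 mention replacing split and replaceAll calls with pre-compiled patterns. The general technique looks roughly like the sketch below; the class, constants, and method names are hypothetical and are not taken from the YaCy sources.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of pattern pre-compilation; not YaCy code.
final class PatternSketch {

    // Compiled once and reused, instead of passing a regex string to
    // String.split or String.replaceAll on every call.
    private static final Pattern COMMA = Pattern.compile(",");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Equivalent in result to line.split(","), but reuses the compiled pattern.
    static String[] splitFields(String line) {
        return COMMA.split(line);
    }

    // Equivalent in result to text.replaceAll("\\s+", " "), but reuses the compiled pattern.
    static String collapseWhitespace(String text) {
        return WHITESPACE.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(splitFields("a,b,c").length);
        System.out.println(collapseWhitespace("a  b\tc"));
    }
}
```

Holding the Pattern in a static final field avoids re-parsing the regular expression on each call, which matters mostly in code paths that run very frequently.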