Commit Graph

11630 Commits

Author SHA1 Message Date
reger
a4629ad83b upd pom 2015-02-28 19:48:29 +01:00
reger
d7259419f3 postpone raw snippet html encoding until use
instead of during snippet init
addressing http://mantis.tokeek.de/view.php?id=551
2015-02-28 19:02:18 +01:00
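The encode-on-use approach above amounts to escaping HTML metacharacters only at display time. A minimal sketch of such an escaping step (the class and method names are hypothetical, not YaCy's actual API):

```java
// Minimal sketch of HTML-escaping a raw snippet at display time.
// Class/method names are illustrative, not YaCy's actual snippet code.
public class SnippetEscape {
    public static String escapeHtml(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        for (char c : raw.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeHtml("<b>bold</b> & more"));
    }
}
```

Deferring this until display keeps the stored snippet raw, so other consumers are free to encode (or not) as they need.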
Michael Peter Christen
c3aadcf899 Fix for Jetty "JetLeak" bug: update to jetty 9.2.9
The bug was inside the jetty library, for details see:
http://blog.gdssecurity.com/labs/2015/2/25/jetleak-vulnerability-remote-leakage-of-shared-buffers-in-je.html
We recommend updating your YaCy peer with this bugfix.
2015-02-28 15:46:46 +01:00
reger
de56d934b2 apply query parameter getQueryFields() to GSA servlet 2015-02-27 00:53:20 +01:00
Marc Nause
d23f7165ab Next try to fix start script for OpenBSD. 2015-02-25 21:11:59 +01:00
reger
2d2299f484 fix mimetype of rss items in rss parser
- remove self reference as anchor for items
2015-02-25 01:58:42 +01:00
Michael Peter Christen
b432049d59 enhanced date parsing time 2015-02-25 01:05:46 +01:00
reger
9b0de2de64 introduce getQueryFields to return default query fields (query parameter QF)
calculated from the boostfields config, making sure title, description, keywords and content are always searched.
- apply change to solrServlet: makes sure every remote query uses at least all locally defined boost fields for search
- apply to local solr search
- simplify select query by using QF defaults
2015-02-23 23:12:07 +01:00
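The QF-default idea above can be sketched as merging the configured boost fields with a set of fields that must always be searched. The config format ("field^boost,field^boost") and the field names used here are illustrative assumptions, not YaCy's exact schema:

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch: derive Solr qf defaults from a boost-field config string,
// guaranteeing that certain fields are always part of the search.
// Config format and field names are assumptions for illustration.
public class QueryFields {
    static final String[] REQUIRED = {"title", "description_txt", "keywords", "text_t"};

    public static String getQueryFields(String boostConfig) {
        Set<String> fields = new LinkedHashSet<>();
        for (String entry : boostConfig.split(",")) {
            String f = entry.trim();
            int caret = f.indexOf('^');
            if (caret >= 0) f = f.substring(0, caret); // strip the boost factor
            if (!f.isEmpty()) fields.add(f);
        }
        for (String r : REQUIRED) fields.add(r); // ensure defaults are present
        return String.join(" ", fields); // Solr qf is space-separated
    }

    public static void main(String[] args) {
        System.out.println(getQueryFields("title^5.0,host_s^2.0"));
    }
}
```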
Marc Nause
53e4ae65d0 Changes to improve compatibility with OpenBSD. (see
http://forum.yacy-websuche.de/viewtopic.php?f=8&t=5503)
2015-02-23 22:54:49 +01:00
reger
ba276d3e64 add description_txt to default query fields,
Dublin Core Metadata field extracted by most parsers.
2015-02-22 05:42:04 +01:00
reger
a0f04db9ea add extracted description/subject to pptParser 2015-02-22 05:31:56 +01:00
reger
8ec1db76ee url unescape add check for inconsistent utf8 multibyte parsing
If the url contains special chars (like the umlauts äöü) it is interpreted as a multibyte char and actually not converted at all (removed).
Added a check: if the multibyte conversion is not complete, the char is added as-is.

This fixes http://mantis.tokeek.de/view.php?id=200
2015-02-20 02:21:04 +01:00
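The fix described above can be sketched as: decode a run of %XX escapes as UTF-8, and if the resulting byte sequence is not valid UTF-8 (e.g. a lone latin1 "%E4" for ä), keep the escaped text as-is instead of silently dropping it. This is a simplified stand-alone sketch, not YaCy's actual unescape code, and it only handles input consisting purely of %XX escapes:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

public class UnescapeSketch {
    // Decode a run of %XX escapes as UTF-8; on an incomplete/invalid
    // multibyte sequence, return the escaped text unchanged.
    public static String decodeRun(String escaped) {
        String[] parts = escaped.split("%");
        byte[] bytes = new byte[parts.length - 1];
        for (int i = 1; i < parts.length; i++) {
            bytes[i - 1] = (byte) Integer.parseInt(parts[i], 16);
        }
        try {
            // the default decoder REPORTs malformed input instead of replacing it
            return StandardCharsets.UTF_8.newDecoder()
                    .decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return escaped; // not valid UTF-8: keep the char(s) as-is
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeRun("%C3%A4")); // valid UTF-8 encoding of ä
        System.out.println(decodeRun("%E4"));    // invalid alone: kept escaped
    }
}
```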
reger
4b97ddb9ec stop sending crawl receipts if the receiver went offline 2015-02-17 03:16:10 +01:00
reger
ad1596f9ac upd lucene api doc link 2015-02-16 01:20:12 +01:00
reger
7e35518787 add extracted description/subject to docParser 2015-02-16 00:50:16 +01:00
reger
f0a5188e11 replace deprecated HTTPClient setStaleConnectionCheckEnabled with setValidateAfterInactivity() 2015-02-15 23:09:01 +01:00
reger
7b569d2dbe replace deprecated HTTPClient ALLOW_ALL_HOSTNAME_VERIFIER with NoopHostnameVerifier() 2015-02-15 21:34:01 +01:00
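For illustration, HttpClient's NoopHostnameVerifier simply accepts every hostname. With only the JDK's javax.net.ssl interface, equivalent behavior looks like this (note that this disables hostname verification entirely, so it is only appropriate where that is an explicit, understood choice):

```java
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.SSLSession;

// JDK-only sketch of what NoopHostnameVerifier does: accept any hostname.
public class NoopVerifierSketch {
    public static final HostnameVerifier ACCEPT_ALL =
            (String hostname, SSLSession session) -> true;

    public static void main(String[] args) {
        System.out.println(ACCEPT_ALL.verify("example.org", null));
    }
}
```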
reger
fba34e12ef fix formatting issue if snippet contains html code
replacement for reverted commit
61f42a7928
2015-02-15 20:39:20 +01:00
reger
e48720a58c fix NPE in snippet computation 2015-02-15 05:30:14 +01:00
reger
49281617d2 upd to commons-codec-1.10.jar, commons-compress-1.9.jar 2015-02-14 23:04:05 +01:00
reger
1196ff01c8 revert: formatting fix also eats up highlighting
need another solution for snippets with unwanted html code
2015-02-14 02:43:05 +01:00
Michael Peter Christen
f989f955dc fixed httpclient lib paths in ant build 2015-02-14 01:38:20 +01:00
reger
6dbc976d8b upd to httpclient-4.4 2015-02-13 00:50:32 +01:00
reger
61f42a7928 fix formatting issue in search result display
if description contains html code
noticed e.g. for id=NmNdJ9uApLaQ  http://hswong3i.net/blog/hswong3i/virtualmin-drupal-7-x-ubuntu-12-04-howto
2015-02-13 00:20:33 +01:00
reger
eda0aeaf26 allow/recognize host in file: protocol crawl target
This is useful in intranet indexing when crawling an intranet file server that is accessed via hostname but, e.g. under Windows, mapped to different drive letters on individual clients.
Here you can crawl e.g. file://fileserver/documents, which is a valid uri in that intranet environment (while e.g. P:/documents might be client-dependent).
2015-02-11 23:26:39 +01:00
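java.net.URI already distinguishes the two cases mentioned above: a file: URL with an authority carries a hostname, while an empty authority (file:///…) refers to the local machine. A quick illustration:

```java
import java.net.URI;
import java.net.URISyntaxException;

// file://fileserver/documents has host "fileserver" and path "/documents";
// file:///documents has no host and refers to the local machine.
public class FileUriHost {
    public static String hostOf(String url) throws URISyntaxException {
        return new URI(url).getHost(); // null when the authority is empty
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(hostOf("file://fileserver/documents"));
        System.out.println(hostOf("file:///documents"));
    }
}
```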
reger
77851fa53c fix parser test cases
(Vocabulary parameter)
2015-02-11 01:43:02 +01:00
reger
df83fcc4fc disable optimistic GC assumption in StandardMemoryStrategy
After several tests it was found that OOM is not prevented. The major reason in testing was the assumption that a future GC will free the average of the last 5 GCs.
Disabling this check reduced OOM exceptions.

Added the simplest testcase used for verification
2015-02-11 01:42:01 +01:00
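A conservative alternative to predicting future GC gains is to use only what the runtime reports right now. A minimal sketch of such a check (this is an illustration, not YaCy's actual MemoryControl API):

```java
// Conservative sketch: report memory available right now, without
// assuming anything about what a future GC might free.
public class AvailableMemory {
    public static long availableBytes() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return rt.maxMemory() - used; // free heap + not-yet-allocated heap
    }

    public static boolean shortMemory(long requiredBytes) {
        return availableBytes() < requiredBytes;
    }

    public static void main(String[] args) {
        System.out.println(availableBytes() > 0);
    }
}
```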
Michael Peter Christen
8ff76f8682 the cleanup process experienced a 100% CPU load situation and the loop
did not terminate:

Occurrences: 100
at java.util.HashMap$KeyIterator.next(HashMap.java:956)
at
net.yacy.cora.protocol.ConnectionInfo.cleanup(ConnectionInfo.java:300)
at
net.yacy.cora.protocol.ConnectionInfo.cleanUp(ConnectionInfo.java:293)
at net.yacy.search.Switchboard.cleanupJob(Switchboard.java:2212)
at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
net.yacy.kelondro.workflow.InstantBusyThread.job(InstantBusyThread.java:105)
at
net.yacy.kelondro.workflow.AbstractBusyThread.run(AbstractBusyThread.java:215)

This tries to fix the problem; the problem should be monitored.
2015-02-10 08:43:45 +01:00
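For illustration: HashMap's iterator is fail-fast under single-threaded structural modification, which can be shown deterministically. Under unsynchronized access from several threads there is no such guarantee; the table can be corrupted so that iteration spins forever, as in the trace above.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

// Fail-fast demo: structurally modifying a HashMap while iterating it
// throws ConcurrentModificationException in a single thread. With
// unsynchronized multi-threaded access even that is not guaranteed.
public class HashMapIterationDemo {
    public static boolean modifyWhileIterating() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        try {
            for (String key : map.keySet()) {
                map.remove(key); // structural modification during iteration
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true; // the fail-fast iterator detected the modification
        }
    }

    public static void main(String[] args) {
        System.out.println(modifyWhileIterating());
    }
}
```

The usual fixes are iterating a snapshot of the keys, using the iterator's own remove(), or a ConcurrentHashMap when several threads are involved.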
Michael Peter Christen
1f5b5c0111 npe fix for latest scraper feature 2015-02-10 08:33:30 +01:00
Michael Peter Christen
ee97302a23 hack to make date detection faster (while it becomes a bit incomplete
regarding language alternatives)
2015-02-09 18:46:06 +01:00
Michael Peter Christen
6578ff3ddb enhanced suggest function 2015-02-09 18:45:07 +01:00
reger
fe6f5a395d fix Umlaut handling in blekko heuristic search term
http://mantis.tokeek.de/view.php?id=169
observation: blekko seems to block xxxbot agents (=0 results)
2015-02-08 23:40:33 +01:00
reger
ab98f69592 fix: searchoption hint for heuristic 2015-02-08 00:15:30 +01:00
reger
23924348e2 handle urls with semicolon or comma in proxy requests
apply the patch supplied with bug report http://mantis.tokeek.de/view.php?id=540
2015-02-07 22:01:54 +01:00
sixcooler
b05a2fca1f small correction for last commit 2015-02-07 13:47:15 +01:00
reger
8fa542a8e1 upd to Jetty 9.2.7 2015-02-07 00:44:09 +01:00
reger
9025fe3518 upd error message for proxy
fix http://mantis.tokeek.de/view.php?id=539
2015-02-07 00:37:43 +01:00
Michael Peter Christen
974d58b01f IPv6 Fix for push interface 2015-02-04 15:03:34 +01:00
Michael Peter Christen
fe50e5aef6 fix for failed selection of terms in faceted search with vocabularies 2015-02-04 11:55:27 +01:00
Michael Peter Christen
1309619a71 remove remote indexing option in crawl start if not in p2p mode 2015-02-04 11:37:07 +01:00
Michael Peter Christen
6324db1213 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git 2015-02-04 11:27:31 +01:00
reger
5cb05c3013 adjust table column width to not line wrap crawler traffic line 2015-02-04 03:51:34 +01:00
Michael Peter Christen
606d00c8f2 cloning a crawl now accepts the class name of vocabulary scrapers 2015-02-04 01:50:35 +01:00
Michael Peter Christen
97ba5ddbb7 configuration option for maxload limit for remote search 2015-02-04 01:12:25 +01:00
reger
c454ef69c6 add shortMemory check to heuristic search
and skip the operation on shortMemory (no requests to remote opensearch systems)
2015-02-03 03:08:34 +01:00
reger
11b21308c0 fix: malformed filename in image search
fix for http://mantis.tokeek.de/view.php?id=533
2015-02-01 05:35:09 +01:00
reger
9e1ec5fec4 refactor: just some more usages of the constant for term ":[* TO *]" 2015-02-01 04:26:33 +01:00
reger
8c491f51a5 remove hardcoded initialization of language nav if not used 2015-02-01 00:29:28 +01:00
Marc Nause
a311c97c9b Added & in start script for *NIX which was lost a few commits ago. 2015-01-30 21:17:23 +01:00
Michael Peter Christen
b5ac29c9a5 added a html field scraper which reads text from html elements of a
given css class and extends a given vocabulary with a term consisting
of the text content of the html class tag. Additionally, the term is
included in the semantic facet of the document. This allows the
creation of faceted search for documents without the pre-creation of
vocabularies; instead, the vocabulary is created on-the-fly, possibly
for use in other crawls. If any of the term scraping for a specific
vocabulary is successful on a document, this vocabulary is excluded from
auto-annotation on the page.

To use this feature, do the following:
- create a vocabulary on /Vocabulary_p.html (if not existent)
- in /CrawlStartExpert.html you will now see the vocabularies as columns
in a table. The second column provides text fields where you can name
the css class of the html elements from which the literal of the
corresponding vocabulary shall be scraped
- when doing a search, you will see the content of the scraped fields in
a navigation facet for the given vocabulary
2015-01-30 13:20:56 +01:00
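The scraping idea above can be illustrated with a greatly simplified stand-alone sketch: collect the text content of elements carrying a given css class as vocabulary terms. A real implementation would use the HTML parser rather than a regex, and the class/method names here are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Greatly simplified sketch: gather the text of html elements with a given
// css class as vocabulary terms. Regex-based for brevity only; real code
// should use an html parser.
public class ClassScraperSketch {
    public static List<String> scrape(String html, String cssClass) {
        List<String> terms = new ArrayList<>();
        Pattern p = Pattern.compile(
                "<(\\w+)[^>]*class=\"" + Pattern.quote(cssClass) + "\"[^>]*>([^<]*)</\\1>");
        Matcher m = p.matcher(html);
        while (m.find()) {
            terms.add(m.group(2).trim()); // element text becomes a term
        }
        return terms;
    }

    public static void main(String[] args) {
        String html = "<p>x</p><span class=\"author\">Jane Doe</span>";
        System.out.println(scrape(html, "author"));
    }
}
```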