yacy_search_server

mirror of https://github.com/yacy/yacy_search_server.git synced 2024-09-21 00:00:13 +02:00

Author	SHA1	Message	Date
orbiter	4de3fefdb5	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2013-12-15 19:13:00 +01:00
orbiter	7e346e1d79	using stringbuilder in query construction	2013-12-15 19:12:49 +01:00
Michael Peter Christen	2702d9e56b	- added a SolrQueryResponse2SolrDocumentList method which is able to work around the unfolding process in Solr's BinaryResponseWriter. This was a huge performance bottleneck in the embedded solr connector and the problem is actually on Solr's side, but we now have a workaround. - This made it possible to abstract a high-performance index access method which is implemented as method getDocumentListByParams. That method is also implemented in the SolrServerConnector and provides very efficient access to a solr index if the index is embedded. - a popular use of the document list retrieval is a result count which can now also make use of the new method, via getDocumentCountByParams. - enhanced the Error cache which now does not store error documents within the ram cache if the document is also written to solr. When documents are retrieved from the cache, they are partly read from the ram cache and, if not existent there, from the Solr index.	2013-12-13 15:56:29 +01:00
Michael Peter Christen	34633044b4	made pattern computation static	2013-12-12 10:55:36 +01:00
Michael Peter Christen	ef7ddbc933	added date parser caches to prevent re-calculation of costly date parsing	2013-12-12 10:55:12 +01:00
Michael Peter Christen	303f5694ba	avoid usage of existsByQuery. If a document can be loaded by the ID before testing other fields from the existsByQuery request, then a document cache fills and queries after that one can be avoided.	2013-12-12 03:36:30 +01:00
Michael Peter Christen	79771c60c0	IPv6 fixes	2013-12-06 14:30:08 +01:00
Michael Peter Christen	0db8e34625	enhanced webgraph processing	2013-12-04 01:54:45 +01:00
sixcooler	2c2ebb0d92	tried some hardening in order not to leave any Solr-Searchers open	2013-11-29 02:40:12 +01:00
Michael Peter Christen	a16534cb0a	tried to fix timeout and connection-lost problems when using an outside solr.	2013-11-28 01:31:53 +01:00
Michael Peter Christen	9932c441c8	fixed a problem with Date fields parsing Solr results if a remote Solr is attached.	2013-11-28 00:54:53 +01:00
sixcooler	94db054aff	memory-leak-fix: the DocListSearcher fires a query in its constructor and it is highly recommended to close every SolrRequest. Every Request which is not closed leaves a Searcher with its caches that cannot be garbage-collected.	2013-11-27 19:07:36 +01:00
Michael Peter Christen	5592ea57f0	hack to remove compiler warnings about deprecated classes. It would be better to remove the deprecated usage but to do this the Solr core must adopt the latest apache http core changes as well .. this is not our fault.	2013-11-25 23:30:35 +01:00
orbiter	037cd0a57c	using the BinaryResponseWriter which is supported within the YaCy solr servlet since YaCy 1.63. This is much more performant for the client than using the XMLResponseWriter because parsing of XML data is very CPU intensive. Older YaCy peers are still requested using the XMLResponseWriter but the majority of YaCy peers already respond with the binary writer. This makes remote searches much faster and less CPU intensive.	2013-11-25 21:31:40 +01:00
reger	8da75a4b0c	fix contentType definition for Solr html responsewriter from xml to html (hint: value is currently not used, but is in SolrServlet)	2013-11-24 04:31:08 +01:00
Michael Peter Christen	1f0bfa8fec	added test to Base64Order (runs successfully!)	2013-11-22 10:38:42 +01:00
Michael Peter Christen	219d5934a4	fixed termination bug in Solr Connector	2013-11-16 08:22:29 +01:00
Michael Peter Christen	9d5895f643	enhanced and fixed postprocessing	2013-11-15 15:41:12 +01:00
Michael Peter Christen	f86fe90eda	enhanced mass storage speed to remote solr servers	2013-11-15 15:40:07 +01:00
Michael Peter Christen	6ed9821209	fixed several problems in solr connectors	2013-11-15 15:39:35 +01:00
Michael Peter Christen	191fd3d7e7	added an optimization option to HandleSet mass data storage structure	2013-11-15 15:38:00 +01:00
Michael Peter Christen	94b565ea0d	fixed keepalive min value	2013-11-15 15:37:01 +01:00
Michael Peter Christen	24a052ecb9	removed debug code for existsByIds	2013-11-13 13:41:18 +01:00
Michael Peter Christen	1a4a69c226	set more logger to 'final static'	2013-11-13 06:18:48 +01:00
orbiter	b085cb522b	replaced old existsByIds for embedded Solr with obviously much faster new selection method (including still existing debug code to test that this is in fact better)	2013-11-11 11:25:01 +01:00
Michael Peter Christen	899e7e92b0	added debug code	2013-11-09 02:37:12 +01:00
Michael Peter Christen	81bb50118e	found and fixed a huge memory leak in solr caching (inside Solr). The not-flushed Solr cache is now handled in this way: - it is smaller by default - a Solr-internal process is started to flush the cache periodically (this does NOT clean the cache, just removes old objects) - a Solr-external process (the standard YaCy cleanup-process) now has direct access to the solr internal cache and flushes it completely. The time frame for such a flush is defined by the cleanup-process frequency, by default 10 minutes.	2013-11-07 10:01:44 +01:00
Michael Peter Christen	b2c329929f	Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git	2013-11-04 10:18:52 +01:00
Michael Peter Christen	60187a4ec2	fix in html parser	2013-11-04 10:16:20 +01:00
Michael Peter Christen	e1c1e57877	less overhead calling exist() with only one hash	2013-11-04 09:37:31 +01:00
reger	3d5d366f1c	fix html header in Solr HTMLResponseWriter - move 1st body content after </head> tag - add closing <span> tag	2013-11-04 03:12:02 +01:00
Michael Peter Christen	5a02d650ee	avoid cloning	2013-11-03 18:31:50 +01:00
Michael Peter Christen	cc39667399	Speed enhancements and less CPU usage during Solr searches when using the embedded Solr (the default). This was obtained by circumventing solrj search encapsulation and the implementation of direct index access methods to Solr. The effect will not only be seen during search, but this has also a strong effect on suggestions (much more) and less CPU power usage during index distribution (which needs many search requests)	2013-11-01 17:24:36 +01:00
Michael Peter Christen	9bb7eab389	hacks to prevent storage of data longer than necessary during search and some speed enhancements. This should reduce the memory usage during heavy-load search a bit.	2013-10-25 15:05:30 +02:00
Michael Peter Christen	1a8783147b	enhanced computation of number of solr documents.	2013-10-24 15:48:05 +02:00
Michael Peter Christen	4948c39e48	added concurrency for mass crawl check	2013-10-23 11:27:19 +02:00
Michael Peter Christen	1b4fa2947d	- fixed a problem which occurred when a document was not recognized with the right content domain (i.e. identifying that it is an image, text etc.) because it used the file extension and not an existing mime type assignment. - fixed the new setting that images shall be loaded for a better image search. - both fixes together now make it possible to crawl commons.wikimedia.org which makes use of 'funny' document names (i.e. ending with .jpg while the document is html)	2013-10-23 00:16:54 +02:00
Michael Peter Christen	74d0256e93	enhanced postprocessing: fixed bugs, enable proper postprocessing also without the harvestingkey, remove crawl profiles after postprocessing, speed-up for clickdepth computation.	2013-10-16 11:27:06 +02:00
sixcooler	d9a02ed277	NPE fix for my last commit	2013-10-11 00:44:04 +02:00
sixcooler	61f627eb85	fix for ssl-connections from proxy-usage staying in close-wait-state + some extra 'close' in HttpClient	2013-10-10 20:57:37 +02:00
Michael Peter Christen	1b61bd40ed	- Added new solr field url_file_name_tokens_t which stores the file name tokens. This can be used to enhance the ranking. - Added also a rating_i field as basis for later usage. - enhanced the tokenization process.	2013-10-08 23:48:13 +02:00
sixcooler	d536092fe4	fix false filling of the NAME_CACHE_MISS-DNS-Cache in case of a timeout, e.g. caused by massive requests when crawling from file	2013-10-08 18:02:42 +02:00
Michael Peter Christen	ef31d0f279	fix for rss reader, see http://bugs.yacy.net/view.php?id=294	2013-10-07 12:59:54 +02:00
Michael Peter Christen	b28d43decc	added two more fields source_cr_host_norm_i,target_cr_host_norm_i in webgraph and an addition to postprocessing to copy all cr ranking attributes to the link edges associated to the postprocessing documents	2013-09-27 16:57:05 +02:00
Michael Peter Christen	4476dea5ba	do not fail if a wrong boost key is used; instead, print only a warning See also: http://bugs.yacy.net/view.php?id=293	2013-09-27 12:28:09 +02:00
Michael Peter Christen	1b3d26dd23	hack to remove most of the warning: deprecated messages (but not all, one is left)	2013-09-25 21:14:52 +02:00
sixcooler	3c48fc65fd	reverted RemoteInstance to deprecated methods of httpClient-4.2; this should work with current remote-Solr-Instances	2013-09-25 18:45:16 +02:00
sixcooler	0cae420d8e	some dns-timing changes: since httpclient uses the domain-cache it is useful not to clean the domain cache until crawling is running (domains are filled into this cache) On huge crawl-starts (eg. from file) my DNS did not follow the high rates - so I reduced the rate and give some more time(-out)	2013-09-25 15:01:28 +02:00
sixcooler	15b1bb2513	bump to httpClient-4.3	2013-09-25 14:48:37 +02:00
orbiter	d86d2be5c3	automatically removed Places autotagging if no location library is wanted	2013-09-24 11:23:45 +02:00

741 Commits