Commit Graph

13674 Commits

Author SHA1 Message Date
luccioman
e97580dfc7 Fixed unsafe concurrent access to generic SimpleDateFormat instances
SimpleDateFormat must not be used by concurrent threads without
synchronization for parsing or formatting dates, as it is not thread-safe
(it internally holds a Calendar instance that is not synchronized).

Now prefer DateTimeFormatter when possible, as it is thread-safe without
any concurrent access performance bottleneck (it does not internally use
synchronization locks).
2018-06-28 14:59:23 +02:00
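A minimal Java sketch of the pattern this commit recommends, assuming a simple ISO-style day format: one shared, static DateTimeFormatter used by several threads at once, with no locking. The class `SharedDateFormat`, the pattern and the helper methods are illustrative assumptions, not code taken from YaCy.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper class (illustration only, not taken from YaCy).
class SharedDateFormat {

    // DateTimeFormatter is immutable and thread-safe: one static instance can be
    // shared by all threads, unlike a static SimpleDateFormat.
    static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("uuuu-MM-dd");

    static String format(LocalDate date) {
        return DAY.format(date);
    }

    static LocalDate parse(String text) {
        return LocalDate.parse(text, DAY);
    }

    /** Parses all texts from a pool of threads that share the same formatter. */
    static List<LocalDate> parseConcurrently(List<String> texts, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<LocalDate>> futures = new ArrayList<>();
            for (String text : texts) {
                futures.add(pool.submit(() -> parse(text)));
            }
            List<LocalDate> results = new ArrayList<>();
            for (Future<LocalDate> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Where a SimpleDateFormat must be kept (for example to work with legacy java.util.Date APIs), a per-thread `ThreadLocal<SimpleDateFormat>` is the usual alternative to synchronizing on a shared instance.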
luccioman
38a3a5e5ad Fixed a NullPointerException case in the suggest api 2018-06-22 10:49:01 +02:00
luccioman
8811700e2e Upgraded Jetty dependency from 9.4.9 to 9.4.11 2018-06-20 09:33:26 +02:00
luccioman
d53c33e4ef Fixed potential infinite loop case (does not occur in current code base) 2018-06-20 07:51:59 +02:00
luccioman
a15ac8e0ca Made CrawlProfile loading tolerant to malformed json string attribute 2018-06-19 12:53:17 +02:00
luccioman
a715bb7876 Fixed rendering of solr mustNoMatch value on CrawlProfileEditor_p.xml 2018-06-19 12:50:28 +02:00
luccioman
0b302c5004 Do not block whole server startup on persisted crawl profile load error 2018-06-19 12:48:17 +02:00
luccioman
b159564c72 Properly render json string attributes in the crawl profile html editor 2018-06-19 12:46:50 +02:00
luccioman
4d9aa4ed1e Fixed default crawl profile solr mustnotmatch query from previous commit 2018-06-19 11:58:47 +02:00
luccioman
cced94298a Added a new crawler document filter type using Solr syntax
This makes it possible to set up much more advanced document crawl filters,
by filtering on one or more indexed document fields before insertion into
the index.
2018-06-19 10:12:20 +02:00
luccioman
2c155ece77 Fixed JUnit test after removal of unused Transformer 2018-06-19 07:07:18 +02:00
Michael Christen
e0dc632020 removed transformer
it was not used anymore
2018-06-19 00:42:23 +02:00
luccioman
495ca57f61 Additional minor fix in Italian translation 2018-06-12 14:29:18 +02:00
luccioman
eb94986f95 Added Italian in available web interface languages list 2018-06-12 14:19:22 +02:00
luccioman
378fe3f079 Fixed various minor mistakes in Italian translation 2018-06-12 14:18:29 +02:00
luccioman
6df1e543f3
Merge pull request #183 from SebastianoPistore/master
Added Italian translation.
2018-06-12 14:15:29 +02:00
luccioman
9bc7b6c39d Allow editing of scheduled next execution dates for finer control
This can be especially useful when scheduling many API calls over a long
period of time, to precisely adjust each scheduled date/time.
2018-06-11 11:38:58 +02:00
Sebastiano Pistore
ecccc44865 Added Italian translation. 2018-06-10 15:11:50 +02:00
luccioman
40e8c7b89b Use the heavy ConcurrentUpdateSolrClient only when necessary
Prefer the lightweight HttpSolrClient when no updates are performed on
the remote Solr instance, as recommended by Solr documentation itself.
2018-06-08 11:18:29 +02:00
luccioman
bd4cfeda3f Add a max acceptable limit to the size of Solr responses on p2p search
Following the activation of gzip compression on responses, this ensures
the uncompressed content can fit in available memory.
2018-06-08 10:33:23 +02:00
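A sketch of one way to bound the size of a gzip-compressed response, assuming the limit is checked while the stream is being decompressed, so that a small compressed payload can never expand past the limit in memory. The class `BoundedGunzip`, its methods and the limits used in the test are hypothetical; the commit does not describe how YaCy actually enforces its maximum.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class (illustration only, not taken from YaCy).
class BoundedGunzip {

    /**
     * Decompresses a gzip stream and fails as soon as the uncompressed size
     * exceeds maxBytes, instead of buffering the whole expanded content first.
     */
    static byte[] gunzip(InputStream compressed, long maxBytes) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(compressed)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            long total = 0;
            int read;
            while ((read = in.read(buffer)) != -1) {
                total += read;
                if (total > maxBytes) {
                    throw new IOException("Uncompressed content exceeds limit of " + maxBytes + " bytes");
                }
                out.write(buffer, 0, read);
            }
            return out.toByteArray();
        }
    }

    /** Test helper: gzip-compresses a byte array in memory. */
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(data);
        }
        return bytes.toByteArray();
    }
}
```

Highly repetitive content compresses extremely well, which is precisely why a limit on the compressed size alone would not protect memory.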
luccioman
de4ea95687 Consistently allow gzip compression of remote Solr responses
It was already enabled when requesting remote Solr with https or with
authentication (as an external Solr index).
2018-06-07 15:20:37 +02:00
luccioman
cea8187161 Reuse the expired connection evictor threads provided by Apache and Solr 2018-06-06 14:24:05 +02:00
luccioman
b5dc1f376f Made outgoing pools max total connections user configurable
For finer control over the maximum number of simultaneously active
outgoing connections.
2018-06-06 09:36:50 +02:00
luccioman
387d646c0e Added gzip compression of responses returned to user-agents accepting it
Enabled by default, but can be disabled using the "Server Access
Settings" admin page.
2018-06-05 13:35:39 +02:00
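Deciding whether a user-agent accepts gzip comes down to reading its Accept-Encoding request header. In a Jetty-based server this negotiation is commonly delegated to Jetty's GzipHandler; the hypothetical helper below only sketches the header check itself, in simplified form: it honours `gzip`, `x-gzip` and the `*` wildcard with their `q` values, and ignores every other coding.

```java
import java.util.Locale;

// Hypothetical helper class (illustration only, not taken from YaCy).
class GzipNegotiation {

    /** Returns true when an Accept-Encoding header value allows a gzip-encoded response. */
    static boolean acceptsGzip(String acceptEncoding) {
        if (acceptEncoding == null) {
            return false;
        }
        Double gzipQ = null;     // quality explicitly given for gzip, if listed
        Double wildcardQ = null; // quality given for "*", if listed
        for (String part : acceptEncoding.split(",")) {
            String[] tokens = part.trim().split(";");
            String coding = tokens[0].trim().toLowerCase(Locale.ROOT);
            if (coding.equals("gzip") || coding.equals("x-gzip")) {
                gzipQ = qualityOf(tokens);
            } else if (coding.equals("*")) {
                wildcardQ = qualityOf(tokens);
            }
        }
        // An explicit gzip entry (including "gzip;q=0") takes precedence over "*".
        if (gzipQ != null) {
            return gzipQ > 0;
        }
        return wildcardQ != null && wildcardQ > 0;
    }

    /** Reads the "q=" parameter of one coding entry; defaults to 1.0 when absent. */
    private static double qualityOf(String[] tokens) {
        for (int i = 1; i < tokens.length; i++) {
            String param = tokens[i].trim().toLowerCase(Locale.ROOT);
            if (param.startsWith("q=")) {
                try {
                    return Double.parseDouble(param.substring(2).trim());
                } catch (NumberFormatException e) {
                    return 0.0; // treat a malformed quality value as a refusal
                }
            }
        }
        return 1.0;
    }
}
```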
luccioman
a7a4ba3287 Apply remote solr configured timeout on getting connection from pool 2018-06-02 17:38:14 +02:00
luccioman
a1990202ab Fixed unresolve-pattern case on old html title 2018-06-02 14:54:05 +02:00
luccioman
ee6670fb8f Use a common pooled http connection manager for remote solr instances
For better control over the maximum number of simultaneous outgoing http
connections, as already done for all other http connections (crawls, rwi
search, p2p protocol) using the net.yacy.cora.protocol.http.HTTPClient.
2018-05-29 09:24:21 +02:00
luccioman
d28f9ba0f6 Removed use of deprecated ConcurrentUpdateSolrClient constructor 2018-05-26 21:00:24 +02:00
luccioman
8a749aa5ad Trace level log message for monitoring remote solr response times 2018-05-26 20:58:05 +02:00
luccioman
35826a3091 Added a search page customization setting to display or not favicons
If you are not interested in displaying favicons in your search results,
notably on a peer with limited resources, disabling them can help save
some CPU and outgoing network connections.
2018-05-25 11:13:43 +02:00
luccioman
0082b5ab2a Added missing default Solr http client connection timeout initialization
Consistent with the custom Solr http client used for https connections
to remote Solr peers or to YaCy external Solr storage.

This prevents remote Solr request threads from waiting longer than the
configured timeout to establish a connection to a remote peer.
2018-05-24 09:24:52 +02:00
luccioman
fa4399d5d2 Small perf improvement: initialize thread names early when possible
Initializing Thread names using the Thread constructor parameter is
faster, as the constructor already sets a thread name even if no custom
one is given, while an additional call to the Thread.setName() function
internally performs synchronized access, possibly runs an access check
with the security manager, and performs a native call.

Profiling a running YaCy server revealed that the total processing time
spent on Thread.setName() for a typical p2p search was in the range of
seconds.
2018-05-23 14:45:35 +02:00
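A short Java illustration of the two ways of naming a thread that this commit compares, plus a ThreadFactory that applies the constructor approach to executor threads. `NamedThreads` and its methods are hypothetical names; the cost of setName() is the commit's own profiling observation, which this sketch does not measure.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helpers (illustration only, not taken from YaCy).
class NamedThreads {

    /** Preferred: the name is passed to the constructor, no later setName() call. */
    static Thread startNamed(Runnable task, String name) {
        Thread thread = new Thread(task, name);
        thread.start();
        return thread;
    }

    /** Slower pattern described in the commit: default name first, then renamed. */
    static Thread startRenamed(Runnable task, String name) {
        Thread thread = new Thread(task);
        // Per the commit message: extra synchronized access, possible security check, native call.
        thread.setName(name);
        thread.start();
        return thread;
    }

    /** Thread factory for executors that also names threads through the constructor. */
    static ThreadFactory namingFactory(String prefix) {
        AtomicInteger counter = new AtomicInteger();
        return task -> new Thread(task, prefix + counter.incrementAndGet());
    }
}
```

Passing `namingFactory("solr-")` to, for example, `Executors.newFixedThreadPool(n, factory)` gives every pooled thread a descriptive name without any setName() call.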
luccioman
79bd9f623a Updated YaCy home page embedded links from http to https scheme 2018-05-22 17:46:12 +02:00
luccioman
1dfd3e9dde Limit the rate of calls to the suggest API when typing in search field 2018-05-22 07:55:09 +02:00
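The commit does not say how the rate is limited, and the search field logic presumably runs as JavaScript in the browser. As a language-neutral illustration in Java, here is a minimal throttle that lets a suggest request through only when a minimum interval has elapsed since the last accepted one. `SuggestThrottle` and the 300 ms interval used in the test are hypothetical; timestamps are passed in explicitly to keep the logic deterministic.

```java
// Hypothetical throttle (illustration only, not taken from YaCy).
class SuggestThrottle {

    private final long minIntervalMillis;
    private long lastCallMillis;
    private boolean called;

    SuggestThrottle(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    /** Returns true if a suggest request may be sent at nowMillis, and records it if so. */
    synchronized boolean tryAcquire(long nowMillis) {
        if (called && nowMillis - lastCallMillis < minIntervalMillis) {
            return false; // too soon after the previous request: skip this keystroke
        }
        called = true;
        lastCallMillis = nowMillis;
        return true;
    }
}
```

A pure throttle can drop the request for the last keystroke; user interfaces often combine it with a trailing, delayed call (debouncing) so that the final input still gets suggestions.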
luccioman
84d82bfdd7 Adjusted suggestions timeout management
* less CPU usage using the Solr 'timeAllowed' parameter
* increased chances to get some results even when a first operation step
times out, by leaving some time for final snippet results processing
2018-05-21 14:49:43 +02:00
reger
d5af160e60 upd to slf4j-1.7.25 2018-05-20 21:51:41 +02:00
luccioman
65854bcb22 Fixed NullPointerException when omitHeader=true on external Solr server 2018-05-18 11:30:14 +02:00
luccioman
c4d984cec8 Fixed Solr response header duplication when requesting external Solr 2018-05-18 11:28:30 +02:00
luccioman
124cc24aa3 Properly handle embedded Solr partial results
Solr can provide partial results, for example when a processing time
limit (specified with the parameter `timeAllowed`) is exceeded.

Before this fix, getting partial results from an embedded Solr index
resulted in a ClassCastException:
"org.apache.solr.common.SolrDocumentList cannot be cast to
org.apache.solr.response.ResultContext".
2018-05-18 10:14:54 +02:00
luccioman
3ce44cf250 Fixed largest snippet retrieval: don't reject snippets starting with a space character 2018-05-14 18:26:25 +02:00
luccioman
f511e16d50 Prevent duplication of Solr query highlight fields parameters
This was caused by concurrent modifications (with the addHighlightField()
function) to the same SolrQuery instance when requesting Solr on remote
peers in p2p search.
2018-05-14 15:26:44 +02:00
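The bug mechanism can be sketched without Solr: if each remote request thread adds its highlight field to one shared query object, the fields accumulate across requests. A common remedy is to give each request its own copy of a shared template. `QueryParams` below is a hypothetical, simplified stand-in for a mutable query class such as SolrQuery, and the commit does not say that YaCy's fix used copying.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a mutable query object (illustration only).
class QueryParams {

    private final List<String> highlightFields = new ArrayList<>();

    QueryParams() {
    }

    /** Copy constructor: the copy shares no mutable state with the original. */
    QueryParams(QueryParams other) {
        highlightFields.addAll(other.highlightFields);
    }

    void addHighlightField(String field) {
        highlightFields.add(field);
    }

    List<String> getHighlightFields() {
        return Collections.unmodifiableList(highlightFields);
    }

    /**
     * Builds the query for one remote peer from a shared, never-modified template.
     * Concurrent callers only read the template, so they cannot duplicate fields in it.
     */
    static QueryParams forRemotePeer(QueryParams template, String highlightField) {
        QueryParams copy = new QueryParams(template);
        copy.addHighlightField(highlightField);
        return copy;
    }
}
```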
luccioman
4f0ab318ef Fixed snippets statistics displayed "provided by Solr" count 2018-05-14 15:21:21 +02:00
luccioman
e357ade47d Reduced memory footprint of text snippet extraction
By not parsing and storing all sentences of a document up front, but
only parsing on the fly the ones necessary to compute the snippet.
2018-05-13 10:29:52 +02:00
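A Java sketch of the on-the-fly idea, assuming a snippet is simply the first sentence containing a query word: java.text.BreakIterator walks sentence boundaries lazily, so sentences after the match are never extracted or stored. `LazySnippet` and this matching rule are hypothetical simplifications; YaCy's real snippet computation is more involved.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper class (illustration only, not taken from YaCy).
class LazySnippet {

    /**
     * Scans the text sentence by sentence and stops at the first sentence
     * containing the word (case-insensitive). Returns null when none matches.
     */
    static String firstMatchingSentence(String text, String word) {
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        sentences.setText(text);
        String needle = word.toLowerCase(Locale.ROOT);
        int start = sentences.first();
        for (int end = sentences.next(); end != BreakIterator.DONE; start = end, end = sentences.next()) {
            String sentence = text.substring(start, end).trim();
            if (sentence.toLowerCase(Locale.ROOT).contains(needle)) {
                return sentence; // remaining sentences are never extracted
            }
        }
        return null;
    }
}
```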
luccioman
e115e57cc7 Reduced text snippet extraction processing time.
By not generating MD5 hashes on all words of indexed texts, processing
time is reduced by 30 to 50% on indexed documents with more than 1 MB
of plain text.
2018-05-11 15:42:53 +02:00
reger
7525594315 upd to jwat-warc-1.1.1 2018-05-06 00:49:30 +02:00
luccioman
1e2f094b9e Removed unnecessary html end line tag with invalid syntax 2018-05-03 09:00:09 +02:00
luccioman
ce289ebaf7 Upgraded ConfigNetwork_p html doctype and added language attribute 2018-05-03 08:53:07 +02:00
luccioman
16254fac1e Removed unpaired select closing tag 2018-05-03 08:37:38 +02:00
luccioman
f1061e0897 Merge branch 'master' of https://github.com/yacy/yacy_search_server.git 2018-05-02 08:40:19 +02:00
luccioman
692c1cfdde Added a UI section to configure encryption of peers communications 2018-05-02 08:38:58 +02:00