Commit Graph

1395 Commits

Author SHA1 Message Date
luccioman
c35d0568b6 Support for preferred https in peers communication on more operations 2018-04-24 08:08:24 +02:00
luccioman
a3ec7a7a5f Added analysis optional setting to compute statistics on text snippets
Thus producing some basic stats on processing times for snippets
generation and counts on snippets per source type.
2018-04-15 09:55:08 +02:00
luccioman
0d34034f17 Ensure an embedded Solr is available for Solr dump/restore operations
Otherwise, these operations triggered NullPointerException when only an
external Solr index is attached.
2018-04-07 13:42:06 +02:00
luccioman
d92b191942 Ensure no remote Solr is attached before "Shut Down and Re-Start Solr"
Otherwise, once this operation is applied, the remote Solr instance(s)
are disconnected and the embedded Solr is connected even if disabled by
setting "core.service.fulltext".

Also use constants for related default setting values.
2018-04-06 20:34:54 +02:00
luccioman
26d8ad591c Adjusted Solr select servlet output when using an external Solr only
- Use the EnhancedXMLResponseWriter only when requested output is "exml"
- Use the Standard Solr writers when possible, for example for json, xml
or javabin output formats
- Return an error when the requested format cannot be rendered with
an external Solr server only

Important : this modification is necessary for peers using exclusively
an external Solr server to be reachable as robinson targets in p2p
search, as the binary format ("javabin") is the default Solr exchange
format for peers.
Before this, when a peer requested a remote one attached only to an
external Solr (no embedded one), it ended with "Invalid type" error, as
the remote peer answered with xml although binary format was requested.
2018-04-06 15:16:54 +02:00
luccioman
69690c13a0 Optionally allow external Solr server with self-signed certificate
This is necessary when you want to attach to a dedicated external Solr
server protected with basic http authentication and requested over https
but having only a self-signed certificate.
2018-04-04 18:16:26 +02:00
luccioman
2fd4d05e2f Added a shared Java constant for setting key server.servlets.called 2018-04-02 15:16:10 +02:00
luccioman
ba9cd14516 Removed hard-coded patch for Solr 5.0 on ranking boost function
The current default boost function
(`recip(ms(NOW,last_modified),3.16e-11,1,1)`) for the Date ranking
profile is indeed working fine.
What can trigger the error `unexpected docvalues type NUMERIC for field
'last_modified'` is the previous default boost function (quite old now)
or any custom one using the Solr `ord` or `rord` functions on the
last_modified field.
Then the problem was that the migration code in the Switchboard supposed
to detect the old date boost function was incorrect (one trailing right
parenthesis in excess), so the deprecated function remained.

This fixes issue #169.
2018-03-26 16:24:27 +02:00
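The boost function quoted in the commit above can be checked by hand. This is a minimal sketch, assuming Solr's documented definitions `recip(x,m,a,b) = a / (m*x + b)` and `ms(a,b) = a - b` in milliseconds. The class and method names are hypothetical and are not taken from the YaCy code base.

```java
// Hypothetical sketch: evaluates recip(ms(NOW,last_modified),3.16e-11,1,1) outside Solr.
final class DateBoostSketch {

    /** Slope from the commit message; 3.16e-11 is roughly 1 / (milliseconds per year). */
    static final double M = 3.16e-11;

    /** Same shape as Solr's recip function query: a / (m * x + b). */
    static double recip(final double x, final double m, final double a, final double b) {
        return a / (m * x + b);
    }

    /** Boost for a document, given the current time and its last_modified time in epoch ms. */
    static double dateBoost(final long nowMs, final long lastModifiedMs) {
        final double ageMs = (double) (nowMs - lastModifiedMs);
        return recip(ageMs, M, 1.0, 1.0);
    }

    public static void main(final String[] args) {
        final long yearMs = 31_557_600_000L; // 365.25 days in milliseconds
        final long now = System.currentTimeMillis();
        System.out.println("fresh document:  " + dateBoost(now, now));
        System.out.println("one year old:    " + dateBoost(now, now - yearMs));
        System.out.println("ten years old:   " + dateBoost(now, now - 10 * yearMs));
    }
}
```

Under these assumptions a document modified just now gets a boost of 1, and the boost drops to about one half after a year, so it decays smoothly with age.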
luccioman
fb3032c530 Added a crawl filtering possibility on documents Media Type (MIME) 2018-03-23 10:28:19 +01:00
luccioman
c3ff50c17a Updated the list of audio file formats supported by the audioTagParser
Follows upgrade to Jaudiotagger dependency to version 2.2.5.
2018-02-27 18:04:12 +01:00
luccioman
1b90479a76 Added missing vocabulary navigator increment on results from RWI 2018-02-23 11:36:03 +01:00
luccioman
3a973dbb23 Removed unused import 2018-02-14 09:27:17 +01:00
reger
87077b8fb6 Adjust and move Language Navigator to be member of the navigator plugin
list.
2018-02-12 00:16:34 +01:00
luccioman
0cdee4e26a Fixed loss of "meanCount" search param when using facets or page buttons
Then on new search queries, no suggestions at all could be displayed.
2018-02-08 08:07:30 +01:00
luccioman
117a859879 Do not clear all search modifiers when unselecting one modifier.
Previously, when clicking a selected facet in the search results page to
unselect it, all other selected modifiers/facets were also
removed.
2018-02-07 15:54:46 +01:00
luccioman
33593c22e9 Fixed loss of other modifiers on keywords/tags search navigation links 2018-02-06 17:17:13 +01:00
luccioman
a9dc0874c0 Remove old query terms from search results suggestions links.
Especially when old terms were misspelled, suggestions links then
provided most of the time empty results.
2018-02-06 15:14:14 +01:00
luccioman
9412881230 Added basic support for autotagging microdata annotated item types.
With the appropriate vocabulary settings in Vocabulary_p.html page, this
can produce Vocabulary search facets displaying item types referenced in
html documents by microdata annotation.
Tested notably, but not limited to, vocabulary classes/types defined by
Schema.org and Dublin Core.
2018-02-06 10:25:38 +01:00
luccioman
929e0d6eae Replaced improper ByteBuffer.equals() implementation by Arrays.equals()
Renamed also ByteBuffer.equals() to startsWith() as this is the
appropriate function implementation semantics.
2018-01-29 13:38:25 +01:00
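The difference between the two semantics named in the commit above can be sketched as follows. This is an illustration only: `BytePrefixSketch` and its `startsWith` helper are hypothetical stand-ins working on plain byte arrays, not the YaCy `ByteBuffer` class itself.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: prefix comparison versus full content equality on byte arrays.
final class BytePrefixSketch {

    /** True when the first prefix.length bytes of buffer are identical to prefix. */
    static boolean startsWith(final byte[] buffer, final byte[] prefix) {
        if (prefix.length > buffer.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (buffer[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(final String[] args) {
        final byte[] buffer = "yacy-peer".getBytes(StandardCharsets.UTF_8);
        final byte[] prefix = "yacy".getBytes(StandardCharsets.UTF_8);
        // A prefix match succeeds even though the arrays differ in length and content.
        System.out.println("startsWith:    " + startsWith(buffer, prefix));
        // Arrays.equals requires the same length and the same bytes at every index.
        System.out.println("Arrays.equals: " + Arrays.equals(buffer, prefix));
    }
}
```

A method that only compares a leading run of bytes answers "starts with", which is why naming it `equals()` is misleading.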
luccioman
9ddf92d143 Removed unnecessary reflection usage for workflow tasks.
This improves code readability and maintainability (call hierarchies are
easier to read) and possibly performance.
2018-01-15 10:05:49 +01:00
luccioman
9624516bf8 Refresh recrawl job profile threshold date like other default profiles 2018-01-15 08:06:28 +01:00
luccioman
d47afe6fab Use a constant for crawler reject reason prefix with specific processing 2018-01-13 10:45:00 +01:00
luccioman
8a4ea1c11e Added UI switch to control content domain constraint per search request 2018-01-02 08:13:14 +01:00
reger
f8071ac8ae Make TokenizedStringNavigator (used for keyword search facet) active
check case insensitive.
As keywords are compared lower case, make sure user input keyword:Key
or keyword:key will be shown as active in facet entry key.
2017-12-28 02:51:52 +01:00
luccioman
e6907fdab3 Added optional search parameter/setting to control content domain filter
Thus allowing to choose, at configuration or per search request, whether
or not to extend results beyond the strict content domain filter (image,
video, audio or application).

Related graphical controls to be added to user interface.
2017-12-23 18:56:17 +01:00
luccioman
09c4ee56a7 Added optional https support for remote crawl and profile operations 2017-12-21 18:41:32 +01:00
luccioman
5db1c9155a Do locale independent case conversion on hosts, schemes, and file exts.
Required for proper operation when the default system locale is Turkish,
as dotless and dotted i characters have specific case conversion rules
in this language.
2017-12-19 13:52:05 +01:00
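The Turkish case conversion issue described in the commit above can be reproduced with the standard `String.toLowerCase(Locale)` method. This is a minimal sketch; the class name and the `normalize` helper are hypothetical and only illustrate the general technique of passing `Locale.ROOT` explicitly.

```java
import java.util.Locale;

// Hypothetical sketch: why protocol-level tokens should not use the default locale.
final class LocaleSafeCaseSketch {

    static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    /** Lower-cases a host, scheme or file extension without consulting the default locale. */
    static String normalize(final String token) {
        return token.toLowerCase(Locale.ROOT);
    }

    public static void main(final String[] args) {
        final String extension = "GIF";
        // With Turkish rules, 'I' is lower-cased to the dotless 'ı' (U+0131).
        System.out.println("Turkish rules: " + extension.toLowerCase(TURKISH));
        // Locale.ROOT gives the same result on every system.
        System.out.println("Locale.ROOT:   " + normalize(extension));
    }
}
```

The no-argument `toLowerCase()` uses the JVM default locale, so the same comparison can succeed on one machine and fail on another.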
luccioman
1c4803e40a Enable optional https support for /yacy/transferURL API calls.
Also updated some Javadoc and consistently use Switchboard instance as a
constructor parameter where relevant.
2017-12-19 12:30:49 +01:00
luccioman
17e004599d Started implementing optional https preference for protocol operations
Introduced through the new configurable setting
network.unit.protocol.https.preferred, defaulting to false for now.

Allows choosing to prefer https when available on remote peers to
perform YaCy protocol operations including notably hello or transferRWI.

Not yet implemented for every YaCy protocol operation.
2017-12-15 11:28:46 +01:00
Michael Peter Christen
25573bd5ab added a crawl filter based on <div> tag class names
When a crawl is started, a new field to exclude content from scraping is
available. The field can be identified with the class name of div tags.
All text contained in such a div tag where the configured class name(s)
match is not indexed, while the rest of the page is indexed.
2017-12-09 22:29:35 +01:00
luccioman
a4494d6e01 Improved support for internationalized domain names on "site:" modifier
Allow typing directly internationalized domain names including non ASCII
characters in the search field. 
Search is done using the ASCII Compatible Encoding (ACE) representation.
2017-12-04 18:23:26 +01:00
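The conversion to the ASCII Compatible Encoding mentioned in the commit above can be done with the standard `java.net.IDN` class. This is a sketch under that assumption; the class and the `toAceHost` helper are hypothetical, and the actual YaCy parsing code may differ.

```java
import java.net.IDN;
import java.util.Locale;

// Hypothetical sketch: normalizing a host typed after "site:" to its ACE form.
final class SiteModifierIdnSketch {

    /** Converts a possibly internationalized host name to lower-cased ACE (Punycode) form. */
    static String toAceHost(final String host) {
        // IDN.toASCII leaves pure ASCII labels untouched, so lower-case them afterwards.
        return IDN.toASCII(host.trim()).toLowerCase(Locale.ROOT);
    }

    public static void main(final String[] args) {
        System.out.println(toAceHost("bücher.example"));
        System.out.println(toAceHost("Example.ORG"));
    }
}
```

Note that `IDN.toASCII` throws `IllegalArgumentException` for input that cannot be converted, so a caller handling free-form user input would need to catch it.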
luccioman
d07006bac4 Do locale independent case conversion on "filetype:" query modifier. 2017-12-04 14:11:29 +01:00
luccioman
8fbf25d1ed Made "site:" query modifier case insensitive. 2017-12-04 14:08:34 +01:00
luccioman
867388e05b Refactored 'site:' query modifier parsing into a dedicated function. 2017-12-04 13:58:15 +01:00
luccioman
c9d80b5b77 Prefer fine URL match over approximate URL mask regex on final filtering
Also prevent adding a redundant and CPU costly Solr url mask filter
query when possible
2017-12-01 11:52:52 +01:00
luccioman
0a120787e3 Improved accuracy of URLs search filters : protocol, tld, host, file ext 2017-12-01 11:19:31 +01:00
luccioman
e07ef1b610 Apply tld query modifier on Solr host_s mandatory field.
The filter thus has a much better chance of being effective than when applied
on the optional field host_dnc_s.
2017-12-01 08:46:46 +01:00
luccioman
478e92deff Fixed url mask filter generated when protocol modifier is not null 2017-11-30 20:21:45 +01:00
luccioman
29de4a65d7 Refactored url mask filter build from query modifiers
For better readability and easier unit testing.
2017-11-30 09:20:32 +01:00
luccioman
f01aac31fd Made it possible to use https for remote search on peers with SSL enabled.
Default is still http to prevent any regressions, but a new setting is
available to choose https as the preferred protocol to perform remote
searches. 
New configuration setting 'remotesearch.https.preferred' is manually
editable in yacy.conf file or in Advanced Properties page
(/ConfigProperties_p.html).
Should be enabled as default in the future for improved privacy. 
Https could also eventually be used for other peers communications.
2017-11-24 14:10:41 +01:00
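The opt-in behaviour described in the commit above (http by default, https only when preferred) could look roughly like this. The setting key 'remotesearch.https.preferred' comes from the commit message. Everything else, including the class, the `chooseScheme` helper and the `peerHasSsl` flag, is a hypothetical illustration rather than the actual YaCy implementation.

```java
import java.util.Properties;

// Hypothetical sketch: choosing the scheme for a remote search request.
final class RemoteSearchProtocolSketch {

    /** Setting key named in the commit message. */
    static final String HTTPS_PREFERRED_KEY = "remotesearch.https.preferred";

    /**
     * Returns "https" only when the setting is enabled AND the target peer has SSL enabled;
     * otherwise falls back to "http", which remains the default.
     */
    static String chooseScheme(final Properties config, final boolean peerHasSsl) {
        final boolean httpsPreferred =
                Boolean.parseBoolean(config.getProperty(HTTPS_PREFERRED_KEY, "false"));
        return (httpsPreferred && peerHasSsl) ? "https" : "http";
    }

    public static void main(final String[] args) {
        final Properties config = new Properties();
        System.out.println("default setting: " + chooseScheme(config, true));
        config.setProperty(HTTPS_PREFERRED_KEY, "true");
        System.out.println("https preferred: " + chooseScheme(config, true));
        System.out.println("peer lacks SSL:  " + chooseScheme(config, false));
    }
}
```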
luccioman
46f37e38dc Customized Threads with generic name for easier monitoring. 2017-10-31 08:53:17 +01:00
luccioman
8e732d437c Enable HTTP Digest authentication for non admin users.
Also ensure authentication is not lost by Digest timeout when navigating
between index.html and search results page.

This way, running searches with extended features on a remote peer or a
password protected peer works with a regular user (with "Extended
search" rights). 
When authenticating on the search page with a user without "Extended
search" rights, it appears as authenticated, but has just its usual
access to the public search features.
2017-10-26 07:51:18 +02:00
luccioman
af198b990b Added an optional login link/status to the search public top nav bar.
Thus allowing a more convenient way (without the need to go to the admin
section) to login when searching on your remote or password protected
peer and benefit from extended search features such as Heuristics,
Bookmarking or JavaScript resorting.

Can be disabled using the ConfigSearchPage_p.html.
2017-10-21 10:57:36 +02:00
luccioman
8303e15419 Reduced number of search navigators refresh requests in JS resort mode
The SearchEvent listens to changes on each of its navigators, and the
information about their overall state is sent with each fetched search
item (as a "data-nav-generation" attribute). Then the browser can
regularly fetch a fresh version of yacysearchtrailer.html only if
necessary (when that nav-generation value changes).
2017-10-12 07:16:19 +02:00
luccioman
dbff7b14fc Add a configurable limit to tags initially displayed in search results
When the limit is reached, a button allows expanding/collapsing remaining
tags.

When this feature is activated without a limit to the number of
displayed tags, search results with a very large number of keywords
can make the results page almost unusable (very long vertical
scrollbar).
2017-10-09 14:13:46 +02:00
reger
c31d94664a Update deprecated SolrInputDocument.addField() with boost value
remove unused SchemaConfiguration.getDate (as it is designed to return
only past dates which might be unexpected for general configuration schema)
2017-10-06 20:32:28 +02:00
luccioman
27ab733685 Ensure private search features are not lost on Digest auth timeout
This is a fix for mantis 766 ( http://mantis.tokeek.de/view.php?id=766 )

Since the upgrade to Digest authentication, access to protected search
features was indeed disabled once the Digest nonce timed out.

After Digest auth timeout the browser no longer sent authentication
information and, as the search results page is not private, protected
features were simply hidden without asking the browser again for
authentication.

Adding a supplementary parameter when accessing the search results as
authenticated fixes this.
2017-09-29 19:18:12 +02:00
reger
ba60f65040 Adjust filetype: query modifier parameter to lower case
to prevent mismatch on user input with mixed case
Internally, file extensions are always compared in lower case.
2017-09-29 00:26:30 +02:00
luccioman
ef8aea7f8d Made the dates navigator max elements number user configurable.
Also used object properties on QueryParams instances, rather than using
mutable class (static) properties.
2017-09-25 09:19:08 +02:00
luccioman
9e86d183b8 Disable manual search results resorting when resorting is done with JS
Also added a constant for the js resorting setting key.
2017-09-13 07:58:05 +02:00