Commit Graph

592 Commits

Author SHA1 Message Date
luccioman
9782a98a9c Added the possibility to customize facets sort type and direction
Previously search navigators/facets elements were sorted only by counts.
Now from the ConfigSearchPage_p.html admin page, sort direction
(ascending/descending) and type (on counts or labels) can be customized
independently for each navigator.
2019-01-24 18:43:06 +01:00
luccioman
b726b2b532 Removed unnecessary '+' character URL decoding from search query
Manually replacing '+' character or "%20" by a space character in the
search query parameter was necessary in YaCy a long time ago to properly
decode application/x-www-form-urlencoded format (commit
9842fab6e4 in 2010).
Since the introduction of Jetty as the embedded HTTP server (commit
4b77733e59 in 2013), this is no more
necessary as Jetty internals already do this for us in
org.eclipse.jetty.util.UrlEncoded.decodeUtf8To().

So we can remove now this duplicated decoding as it prevents a proper
use of the '+' character in search requests, as reported in issue #216.
2018-08-20 08:10:39 +02:00
luccioman
07e8628853 Added HTML5 embedded audio for results playing on supporting browsers
Restricted to authenticated or localhost users only to prevent
redistribution license issues.
2018-02-23 11:41:50 +01:00
luccioman
0cdee4e26a Fixed loss of "meanCount" search param when using facets or page buttons
Then on new search queries, no suggestions at all could be displayed.
2018-02-08 08:07:30 +01:00
luccioman
a9dc0874c0 Remove old query terms from search results suggestions links.
Especially when old terms were misspelled, suggestions links then
provided most of the time empty results.
2018-02-06 15:14:14 +01:00
luccioman
c71b545235 Enable results suggestions (Did you Mean) even when RWI is not enabled.
RWI is no more necessary for suggestions processing since commit
c40ba51ca6.
Revealed by a question about spell check from ouahpiti on YaCy forum
(http://forum.yacy-websuche.de/viewtopic.php?f=23&t=6084 ).
2018-02-06 12:33:44 +01:00
luccioman
8a4ea1c11e Added UI switch to control content domain constraint per search request 2018-01-02 08:13:14 +01:00
luccioman
e6907fdab3 Added optional search parameter/setting to control content domain filter
Thus allowing to choose at configuration or per search request, whether
extending or not results beyond strict content domain filter (image,
video, audio or application).

Related graphical controls to be added to user interface.
2017-12-23 18:56:17 +01:00
luccioman
f9cba827c0 Made "tld:" modifier case insensitive and IDN complient.
Thus allowing typing internationalized top-level domains with non ASCII
characters as tld: modifier.
2017-12-04 19:13:16 +01:00
luccioman
8e732d437c Enable HTTP Digest authentication for non admin users.
Also ensure authentication is not lost by Digest timeout when navigating
between index.html and search results page.

This way, running searches with extended features on a remote peer or a
password protected peer works with a regular user (with "Extended
search" rights). 
When authenticating on the search page with a user without "Extended
search" rights, it appears as authenticated, but has just its usual
access to the public search features.
2017-10-26 07:51:18 +02:00
luccioman
d0bed78d02 Use the same top nav bar on index.html and search results.
Thus eventually including the same optional login link/status in the
search start page than in the results page, for the same convenient
login without the need to use the Administration section.
2017-10-24 09:34:03 +02:00
luccioman
af198b990b Added an optional login link/status to the search public top nav bar.
Thus allowing a more convenient way (wihout the need to go to the admin
section) to login when searching on your remote or password protected
peer and benefit from extended search features such as Heuristics,
Bookmarking or JavasScript resorting.

Can be disabled using the ConfigSearchPage_p.html.
2017-10-21 10:57:36 +02:00
luccioman
27ab733685 Ensure private search features are not lost on Digest auth timeout
This is a fix for mantis 766 ( http://mantis.tokeek.de/view.php?id=766 )

Since the upgrade to Digest authentication, access to protected search
features was indeed disabled once the Digest nonce timed out.

After Digest auth timeout the browser no more sent authentication
information and as the search results page is not private, protected
features were simply be hidden without asking browser again for
authentication.

Adding a supplementary parameter when accessing the search results as
authenticated fixes this.
2017-09-29 19:18:12 +02:00
luccioman
ef8aea7f8d Made the dates navigator max elements number user configurable.
Also used object properties on QueryParams instances, rather than using
mutable class (static) properties.
2017-09-25 09:19:08 +02:00
luccioman
9049a926a5 Restrict JS results resorting to authenticated users.
Until a more efficient DOM refresh model needing less XHR requests per
search is implemented.
2017-09-16 09:26:08 +02:00
luccioman
d00a35576c Apply JS resort only when currently relevant : p2p text search 2017-09-15 09:51:34 +02:00
luccioman
9e86d183b8 Disable manual search results resorting when resorting is done with JS
Also added a constant for the js resorting setting key.
2017-09-13 07:58:05 +02:00
JeremyRand
d37df75afa
(WIP) Optionally sort HTML search items via Javascript.
TODO: Expose a GUI setting for this.
2017-09-03 17:50:08 +00:00
luccioman
a28428047a Fixed count of filtered results from local solr.
Was inadequately modified in my previous related commits (making next
pages buttons unavailable in Search portal mode), as
SearchEvent.local_solr_available did not count the total filtered
results but only the ones within the currently fetched result page(s).
2017-08-31 11:24:59 +02:00
luccioman
30c2f50e0b Use final results counts in progress bar detailed statistics.
Using unfiltered detailed counts (local and remote entries found before
doubles detection and before applying query modifiers) was confusing and
inconsistent with the total count. It could let think more results are
to come in the next pages, without understanding why they are not
displayed.
2017-08-31 07:37:24 +02:00
luccioman
a1a0515312 Added a button to manually refresh sorting of p2p search results.
As a server-side oriented alternative to the JavaScript realtime
resorting feature proposed in PR #104.
The goal is the same as in this PR : having the possibility compensate
the network latency of various peers results fetching and obtain once
possible a consistently ranked result set.
2017-08-28 19:03:51 +02:00
luccioman
870a5eae26 Removed temporary test main method commited by mistake. 2017-01-22 12:19:43 +01:00
reger
396ed3c769 On negative result vote also delete document from fulltext index
(not only from dht)
2017-01-01 23:58:38 +01:00
luccioman
c25e48e969 Enabled displaying results after 14th page for local search queries.
Fixes issue #90 for local queries only: Stealth mode, Portal mode or
Intranet mode. 
For P2p mode, the issue would probably be difficult to solve with
reasonable performance. This is still to dig.

Also switched some InterreputedException catch log messages to warn
level as this is normal behavior when shutting down a peer.

Fixed yacysearch buttons navbar behavior to deal correctly with total
results count or offset over 1000. Also improved the buttons navbar to
be able to navigate over 10th page for local queries.
2016-12-20 14:52:33 +01:00
reger
395f2e8946 Make ServletRequest implement the standardized HttpServletRequest interface,
to make all readily available information from the original ServletRequest
available to YaCy servlets (without converting data to internal structures).
The implementation of the common interface allows easier integration of
YaCy servlets with the servlet standard (e.g. shared login service with
the servlet container etc.)
2016-11-14 01:37:16 +01:00
luccioman
6a0b218ae5 Absolute URLs in yacysearch.* : ensure no downgrade from https to http
Also removed unnecessary use of deprecated Seed.getIP().
2016-11-12 23:29:25 +01:00
luccioman
60df09fff9 Fixed some HTML validation errors : Illegal character in query
Now encode space characters in URLs query part.
2016-09-30 10:54:53 +02:00
reger
7bac756720 prevent dealing with -UNRESOLVED_PATTERN- eventID parameter in html includes
on first landing on search page
2016-07-01 00:02:10 +02:00
reger
8c9684cc45 optimize surftip data load,
double load (index, loader) not neccessary, getMetadata already suficient
+ lng file adjustments
2016-05-08 05:27:19 +02:00
reger
4765e374e6 altered clac. of search result items per page to display
taking the existing limits into account but make it consistent with search option screen for admin and public user
changes:
  - configured default number of items per page (ConfigPortal_p.html) is used as is (no hardcoded limit)
  - otherwise requests are limited to 100 results per page ( = search option, index.html)
      (this basically is the major change, inc. limit from 20 to 100 for public user)
P.S. - the older grant of more (1000), if no online snippet calculation, is kept (for the time being)

see http://mantis.tokeek.de/view.php?id=627
2016-01-13 01:30:49 +01:00
reger
abd8ecb503 remove contendom depending override of search result items per page
initially introduced e4570bffaf (diff-ae6c130fc11088c830b00ed9256ab56b)
(as one part of unexpected difference in actual vs requested results, partial bugfix for http://mantis.tokeek.de/view.php?id=627 )
2016-01-12 01:04:10 +01:00
reger
e8256bb3b1 remove blekko from opensearch config (not available)
see https://blekko.com/
http://searchengineland.com/goodbye-blekko-search-engine-joins-ibms-watson-team-217633
2016-01-04 04:49:10 +01:00
reger
28b8bc290a fix use of NETWORK_SEARCHVERIFY for rwi verification
was not used to set the searchevent parameter (done in SearchEventCache.getEvent)
- remove unused corresponding QueryParams.filterfailurls param.
2015-12-13 20:01:49 +01:00
reger
020630efd8 remove unused network scanner parameter from queryparameter
Search event is not using networkscanner 
(removed filterscannerfail param always init to false)
2015-12-13 02:50:08 +01:00
reger
a60b1fb6c2 differentiate api call getLocalPort() from getConfigInt() 2015-10-31 23:09:03 +01:00
Michael Peter Christen
df3314ac1a added a new facet type based on a probabilistic classifier using
bayesian filters. This can be used to classify documents during
indexing-time using a pre-definied bayesian filter.

New wordings:
- a context is a class where different categories are possible. The
context name is equal to a facet name.
- a category is a facet type within a facet navigation. Each context
must have several categories, at least one custom name (things you want
to discover) and one with the exact name "negative".

To use this, you must do:
- for each context, you must create a directory within
DATA/CLASSIFICATION with the name of the context (the facet name)
- within each context directory, you must create text files with one
document each per line for every categroy. One of these categories MUST
have the name 'negative.txt'.

Then, each new document is classified to match within one of the given
categories for each context.
2015-08-10 14:27:44 +02:00
Michael Peter Christen
dbbad23e12 removed warnings 2015-08-03 05:37:34 +02:00
Michael Peter Christen
1fec7fb3c1 suppress access to solr when doing search suggestions in case that the
index has more than two million documents. This protects the index from
beeing flooded with search requests that cannot be resolved before the
real search query has to be computet.
2015-06-24 13:02:12 +02:00
reger
b47267b79c precaution against NPE on createorgetBookmark on search result 2015-05-07 03:25:19 +02:00
reger
8a5b8f8789 on bookmaring of search result, remember orig. query in separate bookmark property
(instead of using the description field)
- adjust display and autosearch
- don't overwrite existing bookmark but combine info
2015-05-03 02:31:50 +02:00
Michael Peter Christen
fed26f33a8 enhanced timezone managament for indexed data:
to support the new time parser and search functions in YaCy a high
precision detection of date and time on the day is necessary. That
requires that the time zone of the document content and the time zone of
the user, doing a search, is detected. The time zone of the search
request is done automatically using the browsers time zone offset which
is delivered to the search request automatically and invisible to the
user. The time zone for the content of web pages cannot be detected
automatically and must be an attribute of crawl starts. The advanced
crawl start now provides an input field to set the time zone in minutes
as an offset number. All parsers must get a time zone offset passed, so
this required the change of the parser java api. A lot of other changes
had been made which corrects the wrong handling of dates in YaCy which
was to add a correction based on the time zone of the server. Now no
correction is added and all dates in YaCy are UTC/GMT time zone, a
normalized time zone for all peers.
2015-04-15 13:17:23 +02:00
Michael Peter Christen
6578ff3ddb enhanced suggest function 2015-02-09 18:45:07 +01:00
Michael Peter Christen
efbc9a3561 introducting a new getConfig method which parses comma-separated llists
from setting fields; refactoring for all places where such lists are
parsed
2015-01-29 01:53:36 +01:00
Michael Peter Christen
69eacdf4eb applying precompiled CommonPattern.COMMA.split to all places where
split(",") was used
2015-01-29 01:46:22 +01:00
Michael Peter Christen
3d717b749a fix for urlmaskfilter 2015-01-28 13:40:41 +01:00
reger
24f68a4eb7 refactor opensearch heuristic
introduce FederateSearchManager handling search heuristic to external systems via specific FederateSearchConnectors,
which provide the query() functionallity, the translation to YaCy schema .toYaCySchema() and the search() routine to deliver results to searchevents, which is generally implemented in Abstract connector.
The manager enforces now a min 15s delay between calls to external systems.
Besides the OpensearchConnector a SolrFederateSearchConnector is available. It uses a additional config file for fieldname translation.

default heuristicopensearch.conf: 
- openbdb.com removed - seems not longer to deliver results
- config via solrconnector to  datacite.org added (large technical library archive)
2015-01-19 03:30:35 +01:00
reger
bb37cb32e4 Add title import for bookmark icon
if avail in index
2015-01-09 01:33:45 +01:00
reger
ebe5faeb01 added url to bookmark icon link
url is anyway needed, saves index lookup and works w/o commited url.
Removed unused order parameter
2015-01-05 06:55:53 +01:00
reger
4ff018c9e4 fix ConfigPortal jumps to iframe focus
add focus parameter to yacysearch.html too
2015-01-04 06:57:13 +01:00
reger
0dfeee154a adjustments for Bookmark icon to act on BookmarkDB,
it acts on YMarks but YMark interface seems not maintained,
for future features (e.g. query memory) BookmarkDB is the likely choice to expand, besides the crawlstart bookmark also the result bookmark icon now adds to BookmarkDB.
The YMark related code is (for now) left untouched so both tables are updated.
2015-01-01 02:41:20 +01:00