Commit Graph

708 Commits

Author SHA1 Message Date
luccioman
4129d712a7 Added details to the keystore configuration properties documentation 2018-11-13 07:50:27 +01:00
reger
6b7883900c update bootstrap hosts 2018-07-02 00:00:04 +02:00
luccioman
b5dc1f376f Made outgoing pools max total connections user configurable
For a finer control over the maximum simultaneously active outgoing
connections.
2018-06-06 09:36:50 +02:00
luccioman
387d646c0e Added gzip compression of responses returned to user-agents accepting it
Enabled as default, but can be disabled using the "Server Access
Settings" admin page.
2018-06-05 13:35:39 +02:00
luccioman
35826a3091 Added a search page customization setting to display or not favicons
If you are not interested in displaying favicons in your search results, notably
on a peer with limited resources, disabling them can help save some CPU and
outgoing network connections.
2018-05-25 11:13:43 +02:00
luccioman
79bd9f623a Updated YaCy home page embedded links from http to https scheme 2018-05-22 17:46:12 +02:00
luccioman
a3ec7a7a5f Added analysis optional setting to compute statistics on text snippets
Thus producing some basic stats on processing times for snippets
generation and counts on snippets per source type.
2018-04-15 09:55:08 +02:00
luccioman
69690c13a0 Optionally allow external Solr server with self-signed certificate
This is necessary when you want to attach to a dedicated external Solr
server protected with basic http authentication and requested over https
but having only a self-signed certificate.
2018-04-04 18:16:26 +02:00
Marc Nause
1e4ceaac3f Removed seed URLs pointing to server low.audioattack.de since it will not be updated anymore. 2018-04-03 23:19:05 +02:00
luccioman
6784c9be68 Updated external Solr setup basic instructions 2018-04-03 15:34:44 +02:00
luccioman
c3ff50c17a Updated the list of audio file formats supported by the audioTagParser
Follows upgrade to Jaudiotagger dependency to version 2.2.5.
2018-02-27 18:04:12 +01:00
luccioman
9412881230 Added basic support for autotagging microdata annotated item types.
With the appropriate vocabulary settings in Vocabulary_p.html page, this
can produce Vocabulary search facets displaying item types referenced in
html documents by microdata annotation.
Tested notably, but not limited to, vocabulary classes/types defined by
Schema.org and Dublin Core.
2018-02-06 10:25:38 +01:00
luccioman
e6907fdab3 Added optional search parameter/setting to control content domain filter
Thus allowing to choose at configuration or per search request, whether
extending or not results beyond strict content domain filter (image,
video, audio or application).

Related graphical controls to be added to user interface.
2017-12-23 18:56:17 +01:00
luccioman
17e004599d Started implementing optional https preference for protocol operations
Introduced through the new configurable setting
network.unit.protocol.https.preferred, defaulting to false for now.

Lets you choose to prefer https when available on remote peers to
perform YaCy protocol operations, notably hello or transferRWI.

Not yet implemented for every YaCy protocol operation.
2017-12-15 11:28:46 +01:00
luccioman
d95b288f19 Removed use of deprecated Jetty IPAccessHandler for client filtering.
Upgraded to InetAccessHandler.
Added InetPathAccessHandler extension to InetAccessHandler to maintain
path patterns capability previously available in IPAccessHandler but
lost in InetAccessHandler.

Filtering on IPv6 addresses is now supported.

Support for deprecated pattern formats such as "192.168." and
"192.168.1.1/path" has been removed, but startup automated migration
should convert such patterns eventually present in serverClient.
2017-12-08 15:12:08 +01:00
luccioman
f01aac31fd Made it possible to use https for remote search on peers with SSL enabled.
Default is still http to prevent any regressions, but a new setting is
available to choose https as the preferred protocol to perform remote
searches. 
New configuration setting 'remotesearch.https.preferred' is manually
editable in yacy.conf file or in Advanced Properties page
(/ConfigProperties_p.html).
Should be enabled as default in the future for improved privacy. 
Https could also eventually be used for other peers communications.
2017-11-24 14:10:41 +01:00
luccioman
bab5f0485f Added signing key to developer releases location. 2017-11-17 11:09:55 +01:00
luccioman
af198b990b Added an optional login link/status to the search public top nav bar.
Thus allowing a more convenient way (without the need to go to the admin
section) to login when searching on your remote or password protected
peer and benefit from extended search features such as Heuristics,
Bookmarking or JavaScript resorting.

Can be disabled using the ConfigSearchPage_p.html.
2017-10-21 10:57:36 +02:00
luccioman
dbff7b14fc Add a configurable limit to tags initially displayed in search results
When the limit is reached, a button allows expanding/collapsing remaining
tags.

When this feature is activated without a limit to the number of
displayed tags, when encountering search results with a very large
number of keywords, the results page can become almost unusable (very
long vertical scrollbar)
2017-10-09 14:13:46 +02:00
luccioman
ef8aea7f8d Made the dates navigator max elements number user configurable.
Also used object properties on QueryParams instances, rather than using
mutable class (static) properties.
2017-09-25 09:19:08 +02:00
JeremyRand
d37df75afa (WIP) Optionally sort HTML search items via Javascript.
TODO: Expose a GUI setting for this.
2017-09-03 17:50:08 +00:00
reger
b6a41df4f7 Remove deprecated YaCyProxyServlet
was replaced by UrlProxyServlet
2017-08-12 21:53:04 +02:00
reger
41616de0b8 Add SolrConfig ClassicIndexSchemaFactory to prevent Solr startup warning.
This overrides Solr default to use managed schema. As we don't use
programmatic schema changes, this directs Solr to use schema.xml, eliminating
the warning.
2017-07-23 03:55:56 +02:00
reger
9220ccbec7 remove reference to velocityresponsewriter in solrconfig.xml
it is no longer part of solr-core api
http://lucene.apache.org/solr/6_6_0/index.html
2017-06-16 00:12:09 +02:00
reger
4be4bfbba6 remove sample path setting in solrconfig.xml not valid in Yacy
resulting in startup stop exception after fresh switch to 1.921
2017-06-15 21:02:18 +02:00
luccioman
f6e8d71718 Prevent high CPU load at startup, caused by the Solr suggester build.
Reported by Collision on mantis 758 (
http://mantis.tokeek.de/view.php?id=758 ).
Introduced by the new YaCy Solr configuration for Solr 6.6.0 (see commit
6fe735945d), including now Suggester
configuration.
2017-06-15 14:13:46 +02:00
luccioman
28b451a0b3 Made Cache compression level and lock timeout user configurable 2017-06-14 19:02:08 +02:00
luccioman
73ab4a7b3a Prevent log pollution from unwanted Solr warnings.
Many non-blocking "java.nio.file.NoSuchFileException" traces with
warning log level can be logged by Solr, especially when heavily
crawling. This issue is known from Solr 5.x but still unresolved with
Solr 6.x ( https://issues.apache.org/jira/browse/SOLR-9120 )

Consequently upgraded to "SEVERE" the default log level of the related
internal Solr class.

See also mantis 727 ( http://mantis.tokeek.de/view.php?id=727 )
2017-06-14 08:56:11 +02:00
Michael Peter Christen
6fe735945d migrated Solr 5.5 -> Solr 6.6 and from Java 1.7 -> 1.8
Also: now Version 1.921
2017-06-09 12:25:23 +02:00
reger
a814f3d885 Introduce keyword query parameter
This enables keyword navigator to filter on keywords. Added search page
output and layout config for keywords, allowing e.g. in Intranet use
to display the keywords. No styling or links applied to the keyword
text (but is desirable possibly in combination with bootstrap-tagsinput
for future/intranet).
2017-06-02 01:00:21 +02:00
luccioman
d90b001e1b Improved previous merge "Show ranking in HTML UI".
- added the new setting as configurable in the "Debug/Analysis" settings
page. Debug/analysis is its main purpose for now as there is currently
no nice and "understandable" ranking score info servlet (see forum
discussion http://forum.yacy-websuche.de/viewtopic.php?f=8&t=5884 ) 
- render in the "Search Page Layout" page preview when enabled
- added constants
2017-05-11 18:02:33 +02:00
luccioman
efe1232d90 Merge branch 'html-show-ranking' of
https://github.com/JeremyRand/yacy_search_server

Conflicts:
	defaults/yacy.init
2017-05-11 14:53:57 +02:00
luccioman
09e72eb0a4 Set Config Portal as a private administration page.
Consistently with its required action from submission credentials, and
because external unauthenticated users do not need to access these
settings.
2017-04-03 11:34:49 +02:00
reger
1ccc44e681 fix default/httpd.mime Z file extension to lower case
+ test case
2017-03-26 23:52:31 +02:00
reger
44a9a580e3 remove seedlist bootstrap target (not working for some longer time) 2017-03-26 23:26:40 +02:00
reger
3dd23c178b Introduce the option to configure a shutdown port.
A port value of -1 will disable this option.

If set to a value greater than 0, YaCy listens on this port on the local loopback
address (127.0.0.1) for a shutdown or restart signal.
E.g. connecting to http://localhost:8005/shutdown will stop the YaCy server.
http://localhost:8005/restart will restart it.
This option allows stopping YaCy locally, independently of the web
frontend (which might be configured for password protected remote access).
2017-03-19 02:30:08 +01:00
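
A minimal client-side sketch of the shutdown port described above, in Python. Only the /shutdown and /restart paths, the loopback address and the example port 8005 come from the commit message. The function names are hypothetical, and treating every port of 0 or below as "disabled" is an assumption (the message only states that -1 disables the option).

```python
import urllib.request


def control_url(action: str, port: int, host: str = "localhost") -> str:
    """Build the URL for a shutdown or restart signal (hypothetical helper)."""
    if action not in ("shutdown", "restart"):
        raise ValueError("action must be 'shutdown' or 'restart'")
    if port <= 0:
        # Assumption: -1 disables the feature; 0 is treated the same way here.
        raise ValueError("shutdown port is disabled")
    return f"http://{host}:{port}/{action}"


def send_signal(action: str, port: int, timeout: float = 5.0) -> int:
    """Send the signal to a peer on the local machine and return the HTTP status."""
    with urllib.request.urlopen(control_url(action, port), timeout=timeout) as response:
        return response.status
```

Calling send_signal("shutdown", 8005) would only work from the machine running the peer, since the listener is bound to the loopback address. The peer may also close the connection before a full response arrives, so a caller should be prepared for an exception here.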
reger
f7fce1baad make digest default authentication in defaults/web.xml 2017-03-15 01:39:15 +01:00
luccioman
9d9f86dcdd Updated Archive-It heuristics URL.
The archive-it OpenSearch URL requested without restriction on
collections ("i" parameter) almost always ends up with timeout or fails.
2017-03-01 09:43:00 +01:00
luccioman
cdcd923375 Privacy enhancement : added settings to control referrer policy.
HTTP "Referer" header sent by the browser when using YaCy can now be
controlled either with the referrer meta tag as a global policy, or only
for search result links by adding the attribute rel="noreferrer".

To improve privacy with the fewest possible regressions, the default is
set as meta tag with value "origin-when-cross-origin" : internal YaCy
links behavior is not affected, but when visiting external websites
referrer url is not empty but stripped from query parameters and path.

Older browsers, Safari, MS IE and Edge do not support the referrer meta
tag, so the standard but less flexible noreferrer link type can also be
enabled as an alternative.

User-friendly settings page to be implemented.
2017-02-28 18:11:54 +01:00
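
To illustrate what the "origin-when-cross-origin" default means for the Referer header, here is a small Python sketch of the rule as the commit message describes it: same-origin links keep the full URL, external links only receive the origin. The function names and example URLs are illustrative assumptions. The sketch ignores details a browser also handles, such as credentials embedded in the URL.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def _origin(url: str):
    """Return the (scheme, host, port) triple used to compare origins."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def referrer_origin_when_cross_origin(source: str, target: str) -> str:
    """Referrer value sent when navigating from source to target."""
    src = urlsplit(source)
    if _origin(source) == _origin(target):
        # Same origin: full URL, with the fragment removed.
        return urlunsplit((src.scheme, src.netloc, src.path, src.query, ""))
    # Cross origin: path and query parameters are stripped.
    return f"{src.scheme}://{src.netloc}/"
```

With a search page at http://localhost:8090/yacysearch.html?query=test, a click on an external result would expose only http://localhost:8090/ to the visited site, so the search terms stay private.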
luccioman
13c5c09518 Fixed datacite.org heuristics base url.
The datacite Solr search http URL was returning http status 301 in order
to redirect to its https version, thus making that YaCy heuristic always
fail.
2017-02-26 11:03:15 +01:00
luccioman
ac766327d3 Switched a few more Solr fields from strictly mandatory to optional 2017-02-24 11:08:18 +01:00
luccioman
cdc7f3e431 Switched some Solr fields from mandatory to optional
These fields are default enabled but with no doubt not strictly
mandatory with the current code base.

As reported by @reger24, splitting between essential mandatory and
optional fields is still to be improved to reflect the current YaCy
needs.
2017-02-21 22:59:11 +01:00
luccioman
c68a8be2d9 Refactored and enforced Solr mandatory fields for proper operation
- Added a new method to check activation of mandatory fields on
Collection Configuration commit, consistently with checks previously
performed in Switchboard startup and with mandatory fields in the
default schema.
- Reorganized default schema and CollectionConfiguration enumeration :
moved no more mandatory fields in a specific section, and moved fields
enabled at startup to the mandatory section. 
- Marked mandatory fields as required and with stronger font in the
IndexSchema_p.html page
2017-02-20 10:48:07 +01:00
reger
6ec6ab55ba removed faroo news from default opensearch config
As @luccioman informed, it's only useable with a free api key
http://www.faroo.com/hp/api/api.html
http://blog.faroo.com/2013/06/30/faroo-introduces-an-api-key/
2017-02-15 23:26:54 +01:00
reger
f85aaa7c76 update opensearch conf - remove suche.sueddeutsche.de
apparently they've revoked the participation in opensearch initiative.
2017-02-14 00:31:32 +01:00
luccioman
bf16de29c1 Added support for HTML OpenSearch results.
Many OpenSearch systems do not provide results as standard RSS/Atom
feeds but only as HTML. 

This modification add some support for custom OpenSearch HTML results
through the use of mapping files (as already done for federated Solr
search) relying on CSS-like selectors to retrieve information from HTML
content.

An example mapping file is provided to map results from the
www.npmjs.com OpenSearch URL.
2017-02-13 19:11:17 +01:00
luccioman
1857651988 Added a new Debug/Analysis advanced settings subsection.
As discussed in PR #93 with @JeremyRand and @reger24 this new advanced
settings page includes:
 - a new setting to control remote Solr responses encoding
 - some existing debug settings which could not be set through the admin
user interface
2017-02-09 11:05:06 +01:00
luccioman
826e5bbadd Documented /HostBrowser.html related configuration settings 2017-01-23 16:05:51 +01:00
Michael Peter Christen
dbd34befc0 added luccioman development release builds as discussed in
http://forum.yacy-websuche.de/viewtopic.php?f=8&t=5906
2016-12-26 13:50:35 +01:00
Michael Peter Christen
204e507b2b updated seed-list bootstrap locations 2016-12-26 13:32:36 +01:00
luccioman
aa9ddf3c23 Added control over Robots.txt active threads maximum number.
When starting a crawl from a file containing thousands of links,
configuration setting "crawler.MaxActiveThreads" is effective to prevent
saturating the system with too many outgoing HTTP connections threads
launched by the crawler.
But robots.txt was not affected by this setting and was indefinitely
increasing the number of concurrently loading threads until most of the
connections timed out.

To improve performance control, added a pool of threads for Robots.txt,
consistently used in its ensureExist() and massCrawlCheck() methods.
The Robots.txt threads pool max size can now be configured in the
/PerformanceQueues_p.html page, or with the new
"robots.txt.MaxActiveThreads" setting, initialized with the same default
value as the crawler.
2016-11-23 18:13:05 +01:00
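
The idea of bounding concurrent robots.txt loads with a thread pool can be sketched as follows. This is a generic Python illustration of a fixed-size pool, not the YaCy implementation (which is Java). The names load_all, fetch and max_active_threads are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def load_all(hosts: Iterable[str],
             fetch: Callable[[str], str],
             max_active_threads: int) -> List[str]:
    """Run fetch(host) for every host with at most max_active_threads in flight."""
    with ThreadPoolExecutor(max_workers=max_active_threads) as pool:
        # map() queues all hosts; the pool never runs more than
        # max_active_threads of them at the same time.
        return list(pool.map(fetch, hosts))
```

Tasks beyond the pool size wait in a queue instead of each opening a new connection, which is what keeps thousands of start URLs from saturating the system.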
reger
08a0acc35d make a YearNavigator available, usable as SearchEvent.navigator plugin.
It can take any Date field of the index and displays a list of year strings
in reverse order by the year (not the score/count).
To allow defining the index field to use, the fieldname (and title) can be
appended to the navi's name "year", e.g. year:load_date_dt:LoadDate
It works also with dates_in_content_dts field (from the graphical date
navigator). Here the query parameter from: to: are used on selection as
Query modifier (for other dates currently no query parameter available, so
selection won't work to filter search results).
Not included in the UI Searchpage layout config so far (for experiment with
it manual change to conf needed).
2016-11-21 16:52:53 +01:00
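
The name:fieldname:title form mentioned above (e.g. year:load_date_dt:LoadDate) can be read with a few lines of Python. This is only a sketch of the string format. The NavigatorSpec type and parse_navigator_spec function are hypothetical, and what YaCy does when the field or title is omitted is not stated in the commit message, so missing parts are simply returned as None.

```python
from typing import NamedTuple, Optional


class NavigatorSpec(NamedTuple):
    name: str
    field: Optional[str]
    title: Optional[str]


def parse_navigator_spec(spec: str) -> NavigatorSpec:
    """Split 'name[:field[:title]]' into its parts."""
    parts = spec.split(":", 2)
    name = parts[0]
    field = parts[1] if len(parts) > 1 and parts[1] else None
    title = parts[2] if len(parts) > 2 and parts[2] else None
    return NavigatorSpec(name, field, title)
```

For example, parse_navigator_spec("year:load_date_dt:LoadDate") yields the navigator name "year", the index field "load_date_dt" and the display title "LoadDate".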
reger
bad8f87998 remove old/obsolete clear text "adminAccount" credential entry from init
and setConfig (.,empty) from servlets/code
2016-11-20 00:20:47 +01:00
luccioman
7296e3884f Switched even more URLs to pure relative ones.
Thus a YaCy peer can run behind a reverse proxy subfolder without need
for the reverse proxy to rewrite HTML links (a CPU costly operation).

Tested on Debian Jessie with an apache2 reverse proxy.

See related mantis issues http://mantis.tokeek.de/view.php?id=106 and
http://mantis.tokeek.de/view.php?id=701
2016-11-09 02:40:33 +01:00
luccioman
84b81c1af0 Switched more URLs to relative ones when possible.
This permits an easier and more flexible reverse proxy configuration.
Some related mantis issues : http://mantis.tokeek.de/view.php?id=106 and
http://mantis.tokeek.de/view.php?id=701
2016-11-08 03:05:51 +01:00
reger
af39a76bf6 Reduce number of default max. search navigator lines (from 10000)
to 100 + make it configurable
2016-10-29 04:19:46 +02:00
luccioman
6e1959f469 Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
Conflicts:
	htroot/yacysearchitem.java
	source/net/yacy/cora/federate/solr/responsewriter/YJsonResponseWriter.java
	source/net/yacy/search/schema/CollectionConfiguration.java
	source/net/yacy/server/serverObjects.java
2016-10-14 11:29:55 +02:00
JeremyRand
4963ecb0a0 Add preference (disabled by default) to show the ranking for each result on the HTML UI. 2016-10-04 11:49:16 +00:00
luccioman
b3b75b0498 Accessibility : add a customizable alternative text to YaCy logo
Applied W3C recommendations :
https://www.w3.org/TR/html51/semantics-embedded-content.html#a-link-or-button-containing-nothing-but-an-image
and
https://www.w3.org/TR/html51/semantics-embedded-content.html#logos-insignia-flags-or-emblems
2016-09-22 16:08:33 +02:00
reger
35a7d57260 update lucenematchversion to current (5.2.0 -> 5.5.0)
there should be no need for reindex by the update
2016-07-23 18:36:43 +02:00
Marc Nause
1f7013a1e3 removed unused properties in default config (CGI capabilities of YaCy's
HTTPd have been removed many moons ago)
2016-07-21 21:36:00 +02:00
luccioman
893a40995a Merge branch 'master' of https://github.com/yacy/yacy_search_server.git 2016-07-04 21:24:40 +02:00
Michael Peter Christen
634e48309b another peer list update 2016-07-04 11:02:36 +02:00
Michael Peter Christen
16420e5507 added another principal peer 2016-07-03 22:50:50 +02:00
luccioman
6e96c7341a Merge remote-tracking branch 'origin/master'
Conflicts:
	htroot/Load_MediawikiWiki.java
	htroot/Load_PHPBB3.java
	htroot/ViewImage.java
2016-07-03 18:59:00 +02:00
JeremyRand
433217b33e Properly support multiple Boost Queries. (Previous code was broken because it concatenated multiple Boost Queries together rather than passing Solr an array.) 2016-05-20 20:17:51 -05:00
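
The difference between concatenating boost queries and passing them as separate values can be shown at the HTTP level, where Solr accepts the bq parameter more than once. The Python sketch below builds such a query string. The function name and the field names in the usage line are hypothetical and not taken from the YaCy code.

```python
from urllib.parse import parse_qs, urlencode


def build_solr_query(query: str, boost_queries: list) -> str:
    """Encode one 'bq' parameter per boost query instead of joining them."""
    params = [("q", query)] + [("bq", bq) for bq in boost_queries]
    return urlencode(params)


# Usage with two illustrative boost queries (field names are assumptions).
encoded = build_solr_query("yacy", ["title:yacy^10", "host_s:example.org^2"])
boosts = parse_qs(encoded)["bq"]  # both boost queries come back as separate values
```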
reger
ef24593347 delete obsolete SEARCHRESULT busythread constants
not used since 29.05.2013 18:27:27
0c1a018bbd
2016-05-04 01:30:10 +02:00
reger
8410536f75 keep svnRevision in .init for convert of .conf until release >1.83 2016-03-20 18:12:55 +01:00
reger
726ebee65a include Version config string in yacy.init (replacing svnRevision) 2016-03-20 03:42:33 +01:00
Michael Peter Christen
f4591b1b51 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2016-03-11 18:12:38 +01:00
Michael Peter Christen
1ce38fdaed 0n - added experimental zeronet network which supports intranet peers
(still needs work)
2016-03-11 08:55:51 +01:00
Michael Peter Christen
d05ffa1c51 update to seed list 2016-03-11 07:20:38 +01:00
reger
16724c1283 remove unused proxyCookieWhiteList from yacy.init 2016-03-11 01:14:54 +01:00
luc
3cc5619d93 Improved HTML icons indexing and rendering in search results.
See http://mantis.tokeek.de/view.php?id=629
2016-02-02 09:57:54 +01:00
Michael Peter Christen
5d635879f8 Merge pull request #40 from Scarfmonster/autocrawl
Automatic crawling
2016-01-14 22:19:55 +01:00
Ryszard Goń
7d6e0d8470 Add missing settings to autocrawl settings page 2016-01-14 03:27:33 +01:00
Ryszard Goń
a98c395023 Add the Autocrawl thread 2016-01-14 00:50:23 +01:00
reger
4765e374e6 altered calc. of search result items per page to display
taking the existing limits into account but make it consistent with search option screen for admin and public user
changes:
  - configured default number of items per page (ConfigPortal_p.html) is used as is (no hardcoded limit)
  - otherwise requests are limited to 100 results per page ( = search option, index.html)
      (this basically is the major change, inc. limit from 20 to 100 for public user)
P.S. - the older grant of more (1000), if no online snippet calculation, is kept (for the time being)

see http://mantis.tokeek.de/view.php?id=627
2016-01-13 01:30:49 +01:00
Ryszard Goń
1728cd30c6 Create autocrawl profiles 2016-01-12 16:28:34 +01:00
reger
e8256bb3b1 remove blekko from opensearch config (not available)
see https://blekko.com/
http://searchengineland.com/goodbye-blekko-search-engine-joins-ibms-watson-team-217633
2016-01-04 04:49:10 +01:00
reger
a5faf73afa remove obsolete yacy.init entries interaction.*
(related to removed triplestore)
2015-12-29 15:41:19 +01:00
sixcooler
dce1cb65c4 Merge remote-tracking branch 'choose_remote_name/master' 2015-12-28 23:20:42 +01:00
reger
e84d94f8ca fix mime table for ms office / open office documents
(causing wrong parser detect in intranet mode)
2015-12-22 17:48:24 +01:00
reger
15e46b2bad exclude in/outboundlinksnofollowcount_i from default schema fields
(not used in any function)
2015-12-19 21:25:08 +01:00
luc
8c4ab9c76b Added an option to eventually limit size of remote solr documents put to
local index. See mantis #626.
2015-12-16 02:20:03 +01:00
luc
55a4d15775 Added a note on deprecated default search field and operator. 2015-12-14 23:55:12 +01:00
reger
b2c8bc0ae6 remove md5_s from default index fields
it is not assigned a value / not used
Due to above also excluded from transfer protocol.
2015-11-27 02:41:02 +01:00
sixcooler
f5a9948860 do not store subfield *_coordinate 2015-11-10 20:32:42 +01:00
sixcooler
fca353e5eb set startuptype of most solr handlers to lazy 2015-11-10 20:32:05 +01:00
reger
c720b4c249 remove override of dynamicField coordinate_p in solr schema
(coordinate_p is not a mandatory field as such doesn't need to be declared as schema.field)
2015-10-24 22:44:28 +02:00
reger
f0b5bc93a3 remove obsolete yacy.init entry "secureHttps"
not used anywhere
2015-10-19 03:47:28 +02:00
reger
5e45f1a460 enable Solr schema dynamicField _p (type=location) for YaCy coordinate_p field 2015-09-01 21:47:25 +02:00
sixcooler
87e4abe393 fight the fieldcache by using DocValues: in Solr-5.x the fieldcache has
moved and was not cleared anymore. This results in a huge fieldcache.
(http://lucene.apache.org/#highlights-of-the-lucene-release-include
https://issues.apache.org/jira/browse/LUCENE-5666)
Here I try to use DocValues where it is possible.
For this I used the Api-Scheme as new basis for the Solr-Schema.
This needs at least a complete optimization of the Solr-Index to get a
smaller FieldCache.
Everything that is indexed with these setting will not use the
Fieldcache at all.
2015-08-31 20:24:41 +02:00
reger
250f6457f0 remove expired domain titan.deep-one.in from bootstrap.seedlist 2015-08-26 23:58:08 +02:00
Michael Peter Christen
df3314ac1a added a new facet type based on a probabilistic classifier using
bayesian filters. This can be used to classify documents during
indexing-time using a predefined bayesian filter.

New wordings:
- a context is a class where different categories are possible. The
context name is equal to a facet name.
- a category is a facet type within a facet navigation. Each context
must have several categories, at least one custom name (things you want
to discover) and one with the exact name "negative".

To use this, you must do:
- for each context, you must create a directory within
DATA/CLASSIFICATION with the name of the context (the facet name)
- within each context directory, you must create text files with one
document each per line for every category. One of these categories MUST
have the name 'negative.txt'.

Then, each new document is classified to match within one of the given
categories for each context.
2015-08-10 14:27:44 +02:00
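
The directory layout required above (one directory per context under DATA/CLASSIFICATION, one text file per category, one of them named negative.txt) can be checked with a short Python sketch. The function name and the wording of the reported problems are hypothetical. Only the layout rules come from the commit message.

```python
from pathlib import Path


def check_classification_dir(root: Path) -> dict:
    """Return {context name: [problems]} for contexts that break the layout rules."""
    problems = {}
    for context_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Each *.txt file is one category; its stem is the category name.
        categories = sorted(f.stem for f in context_dir.glob("*.txt"))
        issues = []
        if "negative" not in categories:
            issues.append("missing negative.txt")
        if not any(name != "negative" for name in categories):
            issues.append("no custom category file")
        if issues:
            problems[context_dir.name] = issues
    return problems
```

Running check_classification_dir(Path("DATA/CLASSIFICATION")) on a peer's data directory would return an empty dictionary when every context has its negative.txt and at least one custom category.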
Michael Peter Christen
e1cd9c0dba added another default network / commented out 2015-07-09 16:25:11 +02:00
reger
00d2062813 Remove deprecated AdminHandlers in solrconfig.xml
avoid warning log
W  org.apache.solr.handler.admin.AdminHandlers <requestHandler name="/admin/"  class="solr.admin.AdminHandlers" /> is deprecated . It is not required anymore
2015-07-01 00:58:23 +02:00
Michael Peter Christen
694b22f165 migration to Solr 5.2: huge benefits - this is a lot faster!
This is a very complex migration: many classes had been renamed or
removed, dependencies changed and the solr index type is now aligned to
be a solr cloud repository.
Together with the Solr 5.2 library update, one other dependent library
had been updated as well: httpclient 4.4->4.4.1

Older indexes are migrated from 4_10 to 5_2. However, the new index
structure is more efficient and we recommend to re-index everything.
Please use the index export before you do the update to a large
surrogate xml file. After the update, start with an empty index and then
initialize this with your dump.
2015-06-24 01:55:51 +02:00
Michael Peter Christen
9c12555be5 added link to Snapshots in search results if the snapshot exists and
option is set in ConfigSearchPage_p
(this is a stub: we also need a visualization of pdf files!)
2015-06-07 20:37:37 +02:00
reger
6bc8a9b11e make Quality of Service Servlet available to prioritize requests from local host
This assigns priorities to incoming requests. Higher priority numbers are served before lower.
(disabled by default in defaults/web.xml, 
uncomment or copy entry to DATA/Settings/web.xml)
2015-04-26 04:29:32 +02:00