Commit Graph

692 Commits

Author SHA1 Message Date
reger24
417899dda2 Correction of last commit 9dace71aea
accidentally commented out kaskelix
2022-02-10 05:55:47 +01:00
reger24
9dace71aea Update yacy.net release download location
to //download.yacy.net/ in network.*.unit files

@Orbiter: for the latest available releases (v1.924 ...tar.gz) the *.tar.gz.sig file is missing,
so the download fails with the error "Download of releas .... failed"
2022-02-10 05:51:11 +01:00
Michael Peter Christen
96e44e11bb added more bootstrap addresses 2022-01-28 13:26:51 +01:00
lifeofguenter
870319e769 Fix typo + remove dead seeds 2021-12-27 14:12:17 +01:00
sgaebel
1cdc55a425 lets SOLR merge bigger segments (up to 50GB)
+ some setting to reduce caches
2021-10-31 11:33:42 +01:00
Michael Peter Christen
49cae8ca62 network bootstrapping addresses update 2021-10-25 18:32:57 +02:00
Michael Peter Christen
be0aebad84 fixes https://github.com/yacy/yacy_search_server/issues/424 2021-10-04 14:38:49 +02:00
Michael Peter Christen
8084960392 disabled citation index
that was created but never used
2021-09-15 18:46:37 +02:00
admin
9b7668fa58 reduced memory footprint during indexing/crawling 2021-08-24 12:24:52 +02:00
Michael Peter Christen
e6a87e0426 enhanced crawler
A main problem when crawling is the long waiting time caused by crawl-delay
values from robots.txt entries. That attribute is not supported by
Google and is interpreted by Yandex and Bing in different ways. In large
crawls there is always one host which blocks the whole crawl with
extremely large values. YaCy still obeys crawl-delay but now limits it
to 10 seconds.
Additionally the blocking logic when loading new robots.txt was analyzed
and a deadlock was removed. Furthermore the construction of new queue
lists was redesigned so that the loader is always provided with a large
list of different hosts for host-balancing.
2021-08-17 15:23:21 +02:00
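The crawl-delay cap described in the commit above can be illustrated with a minimal sketch. It is not taken from the YaCy code base: the function name, the constant name and the handling of missing or negative values are assumptions made for illustration; only the 10 second limit comes from the commit message.

```python
# Hypothetical sketch; names are illustrative and not taken from the YaCy sources.
MAX_CRAWL_DELAY_SECONDS = 10.0  # the limit named in the commit message


def effective_crawl_delay(robots_delay_seconds):
    """Return the delay to wait between two requests to the same host.

    The crawl-delay value from robots.txt is still obeyed, but never
    beyond the cap. A missing or negative value is treated as
    "no delay requested" (an assumption of this sketch).
    """
    if robots_delay_seconds is None or robots_delay_seconds < 0:
        return 0.0
    return min(float(robots_delay_seconds), MAX_CRAWL_DELAY_SECONDS)
```

With this sketch, a host that asks for a delay of 3600 seconds is crawled with a 10 second delay, while a host that asks for 3 seconds keeps its 3 second delay.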
Michael Peter Christen
15b7461bc7 removed Xms java memory startup parameter
We will use the default value from now on.
This is much better for resource economy and fits better into a
container/docker/kubernetes strategy.
Furthermore, a small memory footprint is essential for the usage on
small devices like RaspberryPi.
2021-07-19 20:04:11 +02:00
jfhs
2135d259e3 Replace hardcoded html/xml entities with a file, support decoding all defined HTML entities 2021-03-30 22:24:54 +02:00
Michael Peter Christen
8b4394a6c5 fixes for solr 8.8.1 migration
- replace new guava 30 with older 25 because that is the correct
dependency for solr 8.8.1. The newer one did actually not work!
- index will be created in a DATA/INDEX/freeworld/SEGMENTS/solr_8_8_1
subfolder. The older solr_6_6 index is not touched but also not
migrated. The index starts with fresh (empty) content.
- Older indexes must be migrated by hand (export/import) so far until a
better solution is found.
- Large schema adaptations for lucene 8.8.1
2021-03-08 13:39:27 +01:00
Michael Peter Christen
96592a10cf added option to set yacy configuration values using environment
variables
To use that feature, set an environment variable with prefix "yacy." and
suffix identical to the yacy configuration attribute name.
Additionally we implemented a way to set a peer name using the setting
"network.unit.agent". This can therefore now be used to set a peer name
with the java call parameter
-Dyacy.network.unit.agent=anonymous
The purpose of this feature is the ability to set peer names in
mass-deployed kubernetes clusters to the same name, so that we do not
flood peer name statistics with auto-deployment-generated names.
2021-01-24 22:50:37 +01:00
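The override rule described in the commit above (prefix "yacy." followed by the configuration attribute name) can be sketched as follows. This is an illustration under stated assumptions, not YaCy's implementation: the function name, the dictionary-based configuration and the `environ` parameter are hypothetical.

```python
import os

ENV_PREFIX = "yacy."  # prefix named in the commit message


def apply_env_overrides(config, environ=None):
    """Return a copy of config with values overridden from the environment.

    Every environment variable whose name starts with "yacy." sets the
    configuration attribute named by the rest of the variable name.
    Variables without the prefix are ignored.
    """
    environ = os.environ if environ is None else environ
    merged = dict(config)
    for name, value in environ.items():
        if name.startswith(ENV_PREFIX) and len(name) > len(ENV_PREFIX):
            merged[name[len(ENV_PREFIX):]] = value
    return merged
```

In this sketch an environment entry `yacy.network.unit.agent=anonymous` replaces the value of `network.unit.agent`, while all other configuration attributes keep their values.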
Michael Peter Christen
ca10f0afca fixed optional default PW 2020-12-29 20:19:07 +01:00
Michael Peter Christen
baad56d83d beautified default peer names 2020-12-14 02:08:49 +01:00
Michael Peter Christen
43a9f4f574 updated solr 6.6.6 -> 7.7.3
dropped GSA support (GSA API is still in YaCy Grid)
The 6.6.6 solr index works without migration also with 7.7.3
2020-12-12 02:06:43 +01:00
Michael Peter Christen
c0d9a3e9a7 turned HostBrowser into a admin-only page, now called IndexBrowser
This was required because spiders and bots crawled through this page and
created load on the peer without use for the user or the YaCy network.
2020-12-11 00:50:52 +01:00
Michael Peter Christen
39f87f7f28 added a hint to the default settings how to set a default password 2020-12-09 02:42:05 +01:00
parnikkapore
a251727b96 Typo fix 2020-01-20 20:11:03 +07:00
Michael Christen
cb20aa7e54 removed donation message in search result column 2019-10-17 01:35:44 +02:00
Michael Christen
ab467b1764 fixed css profile name 2019-10-14 01:53:09 +02:00
Michael Peter Christen
dddf5930fa more space for sponsoring 2019-09-29 00:26:48 +02:00
Michael Peter Christen
897582d23b updated seedlist bootstrap locations 2019-09-25 22:51:25 +02:00
luccioman
5a3d5cb92c Upgraded Solr config files with the ones provided by Solr release
Fixes #292
2019-04-16 10:25:48 +02:00
luccioman
a5771b1f14 Made SNI extension user configurable without the need for server restart
TLS Server Name Indication (SNI) extension activation can now be
configured with the new Settings_p.html?page=httpClient administration
page.
SNI extension is also now enabled by default, as in 2019 the
unrecognized_name(112) alert is more properly handled by major web
servers' TLS implementations, following the RFC 6066 standard.

Related YaCy issues : #153 #189 and #272
JDK 1.7 bug :
https://bugs.java.com/bugdatabase/view_bug.do?bug_id=7127374
Apache httpd issue :
https://bz.apache.org/bugzilla/show_bug.cgi?id=56241
RFC 6066 : https://tools.ietf.org/html/rfc6066#section-3
2019-04-14 15:41:13 +02:00
luccioman
a8316c79da Allow JS resorting of search results by unauthenticated users
Access rate limitations to this search mode by unauthenticated users are
set low by default to prevent unwanted server overload but can be
customized through the SearchAccessRate_p.html configuration page

Fixes #291
2019-04-03 14:21:53 +02:00
luccioman
0ab2b49c31 Made /yacysearch access rate limitations user configurable
With a new admin page at /SearchAccessRate_p.html in menu Network Access
> Local Search > Access Rate Limitations
2019-04-02 17:42:50 +02:00
luccioman
36c4083f54 Removed OpenSearch URL example that is no longer available 2019-02-02 00:42:37 +01:00
luccioman
0dc5cfe58c Updated federated search html results mapping example 2019-02-02 00:41:49 +01:00
luccioman
9782a98a9c Added the possibility to customize facets sort type and direction
Previously search navigators/facets elements were sorted only by counts.
Now from the ConfigSearchPage_p.html admin page, sort direction
(ascending/descending) and type (on counts or labels) can be customized
independently for each navigator.
2019-01-24 18:43:06 +01:00
sgaebel
8f58c1dcfa extend the SolrServlet to be usable as remote solr (incl. update)
this feature needs to be enabled by uncommenting the url-pattern
2019-01-04 18:27:44 +01:00
luccioman
08ea0b0397 Added a configurable timeout to wkhtmltopdf calls for pdf snapshots
Necessary to prevent blocking the indexing workflow when some
wkhtmltopdf renderings fail without terminating
2018-12-11 22:31:31 +01:00
luccioman
4196101379 Enable soft autocommit in default Solr config
Since upgrade from Solr 5.5 to Solr 6.6 (commit 6fe7359), hard
autocommits were still enabled to regularly persist the Solr index to
the file system, but new index entries were no longer automatically made
available for use by the application (soft autocommit).
Therefore, YaCy features such as index statistics, that do not perform
an explicit commit (as recommended by Solr documentation) were no longer
accurate.
Soft autocommit is now restored as a default, with a time period
expected to be sufficient for accuracy while adding only a reasonable
system load overhead.

Fixes issue #251
2018-11-19 08:49:13 +01:00
luccioman
4129d712a7 Added details to the keystore configuration properties documentation 2018-11-13 07:50:27 +01:00
reger
6b7883900c update bootstrap hosts 2018-07-02 00:00:04 +02:00
luccioman
b5dc1f376f Made outgoing pools max total connections user configurable
For a finer control over the maximum simultaneously active outgoing
connections.
2018-06-06 09:36:50 +02:00
luccioman
387d646c0e Added gzip compression of responses returned to user-agents accepting it
Enabled as default, but can be disabled using the "Server Access
Settings" admin page.
2018-06-05 13:35:39 +02:00
luccioman
35826a3091 Added a search page customization setting to display favicons or not
If you are not interested in displaying them on your search results, notably
on a peer with limited resources, this can help save some CPU and
outgoing network connections.
2018-05-25 11:13:43 +02:00
luccioman
79bd9f623a Updated YaCy home page embedded links from http to https scheme 2018-05-22 17:46:12 +02:00
luccioman
a3ec7a7a5f Added optional analysis setting to compute statistics on text snippets
Thus producing some basic stats on processing times for snippets
generation and counts on snippets per source type.
2018-04-15 09:55:08 +02:00
luccioman
69690c13a0 Optionally allow external Solr server with self-signed certificate
This is necessary when you want to attach to a dedicated external Solr
server protected with basic http authentication and requested over https
but having only a self-signed certificate.
2018-04-04 18:16:26 +02:00
Marc Nause
1e4ceaac3f Removed seed URLs pointing to server low.audioattack.de since it will not be updated anymore. 2018-04-03 23:19:05 +02:00
luccioman
6784c9be68 Updated external Solr setup basic instructions 2018-04-03 15:34:44 +02:00
luccioman
c3ff50c17a Updated the list of audio file formats supported by the audioTagParser
Follows upgrade to Jaudiotagger dependency to version 2.2.5.
2018-02-27 18:04:12 +01:00
luccioman
9412881230 Added basic support for autotagging microdata annotated item types.
With the appropriate vocabulary settings in Vocabulary_p.html page, this
can produce Vocabulary search facets displaying item types referenced in
html documents by microdata annotation.
Tested notably, but not limited to, vocabulary classes/types defined by
Schema.org and Dublin Core.
2018-02-06 10:25:38 +01:00
luccioman
e6907fdab3 Added optional search parameter/setting to control content domain filter
This allows choosing, in the configuration or per search request, whether
or not to extend results beyond the strict content domain filter (image,
video, audio or application).

Related graphical controls to be added to user interface.
2017-12-23 18:56:17 +01:00
luccioman
17e004599d Started implementing optional https preference for protocol operations
Introduced through the new configurable setting
network.unit.protocol.https.preferred, defaulting to false for now.

Allows choosing to prefer https, when available on remote peers, to
perform YaCy protocol operations including notably hello or transferRWI.

Not yet implemented for every YaCy protocol operation.
2017-12-15 11:28:46 +01:00
luccioman
d95b288f19 Removed use of deprecated Jetty IPAccessHandler for client filtering.
Upgraded to InetAccessHandler.
Added InetPathAccessHandler extension to InetAccessHandler to maintain
path patterns capability previously available in IPAccessHandler but
lost in InetAccessHandler.

Filtering on IPv6 addresses is now supported.

Support for deprecated pattern formats such as "192.168." and
"192.168.1.1/path" has been removed, but startup automated migration
should convert any such patterns still present in serverClient.
2017-12-08 15:12:08 +01:00
luccioman
f01aac31fd Made it possible to use https for remote search on peers with SSL enabled.
Default is still http to prevent any regressions, but a new setting is
available to choose https as the preferred protocol to perform remote
searches.
New configuration setting 'remotesearch.https.preferred' is manually
editable in yacy.conf file or in Advanced Properties page
(/ConfigProperties_p.html).
Should be enabled as default in the future for improved privacy.
Https could also eventually be used for other peer communications.
2017-11-24 14:10:41 +01:00