Commit Graph

8802 Commits

Author SHA1 Message Date
Michael Peter Christen
9ef4503672 fixed some newInstance() warnings
.. by adding .getDeclaredConstructor()
2021-08-07 18:46:53 +02:00
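The change described in this commit can be illustrated with a minimal sketch. `Class.newInstance()` is deprecated since Java 9, and the usual replacement is to go through `getDeclaredConstructor()`. The class and method names below are hypothetical and not taken from the YaCy code base.

```java
public class NewInstanceSketch {

    /**
     * Hypothetical helper: creates an object through its no-argument constructor.
     * The deprecated form would have been type.newInstance().
     */
    public static <T> T create(Class<T> type) {
        try {
            // Look up the no-arg constructor explicitly, then invoke it.
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // Covers a missing constructor, access problems and constructor failures.
            throw new IllegalStateException("cannot instantiate " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = create(StringBuilder.class);
        sb.append("created via getDeclaredConstructor()");
        System.out.println(sb);
    }
}
```

Unlike the deprecated call, this form wraps exceptions thrown by the constructor in an `InvocationTargetException` instead of propagating them unchecked.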
Michael Peter Christen
1d41380f0a better support for mac-specific tray functions in java 9 2021-07-12 17:27:59 +02:00
Michael Peter Christen
e81b770f79 enabled crawl starts with very large sets of start urls
e.g. a 10 MB URL list with approx. 0.5 million start points
2021-06-30 10:45:58 +02:00
Michael Peter Christen
c623a3252e fix for jdk 14 bug 2021-04-23 09:11:03 +02:00
Michael Peter Christen
dbd211a1ad removed/replaced reflection in memory tool 2021-04-22 20:24:13 +02:00
Michael Peter Christen
1cdb21592b added hazelcast and some modifications to align legacy YaCy with
YaCyGrid
2021-04-15 20:39:22 +02:00
Michael Christen
42ea2a1c6f
Merge pull request #405 from jfhs/jfhs/support-all-html-entities
Improve HTML entities support
2021-03-31 01:44:54 +02:00
Michael Christen
b2af745dd6
Merge pull request #404 from lnceballosz/master
NGI0 - Updating licensing aspects according REUSE
2021-03-30 23:48:21 +02:00
jfhs
10bddc2c2d Decode HTML entities in all property values by default 2021-03-30 22:24:55 +02:00
jfhs
2135d259e3 Replace hardcoded html/xml entities with a file, support decoding all defined HTML entities 2021-03-30 22:24:54 +02:00
Michael Peter Christen
8f876a8c72 added concurrency to enhance indexing speed during json surrogate import 2021-03-30 12:07:36 +02:00
Michael Peter Christen
f8cbaeef93 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2021-03-29 18:46:53 +02:00
Michael Peter Christen
a857e3d3d5 fix for json importer 2021-03-29 18:46:42 +02:00
sgaebel
1546232c94 adds ranking for multi document queries only 2021-03-20 17:48:35 +01:00
sgaebel
93b353d22d does not boost or add fields for zero-row-queries (exists()) 2021-03-20 17:48:26 +01:00
sgaebel
f16cd154f7 removes unused imports and variables 2021-03-20 15:14:09 +01:00
sgaebel
c69c462a15 replaces an expensive getLoadTimeURL() by exists()
refactors urlExists to getHarvestProcess as that is what it does
2021-03-20 15:01:31 +01:00
sgaebel
a5488ac8f5 uses edismax queries on query counts > 1 only 2021-03-20 01:06:09 +01:00
sgaebel
26223dc25a replaces getLoadTime() by exists() with a simpler query
since solr-8.8.1 getLoadTime() causes a high cpu usage
2021-03-20 01:06:02 +01:00
sgaebel
8e4d014c06 removes useless SolrRequestInfo.clearRequestInfo(), avoids spamming the
log
2021-03-18 22:33:39 +01:00
Lina Ceballos
a96752f5ab adding SPDX license and copyright headers 2021-03-11 12:17:11 +01:00
Michael Peter Christen
e18d0ef544 trying to set a higher priority to the process that is involved in index
export
2021-03-09 00:04:05 +01:00
Michael Peter Christen
8b4394a6c5 fixes for solr 8.8.1 migration
- replace new guava 30 with older 25 because that is the correct
dependency for solr 8.8.1. The newer one did actually not work!
- index will be created in a DATA/INDEX/freeworld/SEGMENTS/solr_8_8_1
subfolder. The older solr_6_6 index is not touched but also not
migrated. The index starts with fresh (empty) content.
- Older indexes must be migrated by hand (export/import) so far until a
better solution is found.
- Large schema adaptations for lucene 8.8.1
2021-03-08 13:39:27 +01:00
Michael Peter Christen
ed9789214e fixed seed initialization problem 2021-03-06 13:35:46 +01:00
Al Sutton
8ade8b8775 Remove forced clear to match new behaviour in 2da71c2a40 2021-03-04 16:37:56 +00:00
Al Sutton
09695fc6d3 Update exceptions to match updated API 2021-03-04 16:34:02 +00:00
Al Sutton
69014a701e Update API Usage 2021-03-04 16:14:56 +00:00
Michael Peter Christen
3da7628117 use environment variables to overwrite configuration variables
for example, you can do:
export YACY_PORT=8092 && ./startYACY.sh
Just prepend "YACY_" to the uppercase version of the configuration variable
name and replace all "." with "_".
2021-02-09 20:26:49 +01:00
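The naming rule from this commit message can be sketched as follows. This is only an illustration of the mapping as described (uppercase name, "YACY_" prefix, "." replaced by "_"); the class, the method names and the way overrides are applied are assumptions, not the actual YaCy implementation.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class EnvOverrideSketch {

    /** Maps a configuration key such as "port" to the environment name "YACY_PORT". */
    public static String toEnvName(String configKey) {
        return "YACY_" + configKey.toUpperCase(Locale.ROOT).replace('.', '_');
    }

    /**
     * Hypothetical helper: returns a copy of the configuration in which every key
     * that has a matching environment variable takes the value from the environment.
     */
    public static Map<String, String> applyOverrides(Map<String, String> config,
                                                     Map<String, String> env) {
        Map<String, String> result = new HashMap<>(config);
        for (String key : config.keySet()) {
            String value = env.get(toEnvName(key));
            if (value != null) {
                result.put(key, value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("port", "8090");
        // System.getenv() supplies the real process environment.
        System.out.println(applyOverrides(config, System.getenv()));
    }
}
```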
Michael Peter Christen
13a2e6dc6e Merge branch 'master' of https://github.com/yacy/yacy_search_server.git 2021-01-25 11:49:32 +01:00
Michael Peter Christen
0ae8ccf657 Make it possible to set an empty password disabling the authentication
protocol completely
If you now set an empty password, then the http server will not ask for
authentication. This is required for environments where we attach an outside
authentication service like keycloak or similar using authentication
in an ingress proxy.
This change is part of the approach to run YaCy inside of a kubernetes
cluster where we do not want individual authentication of peers and want
to apply an ingress authentication.
2021-01-25 11:49:21 +01:00
Michael Peter Christen
96592a10cf added option to set yacy configuration values using environment
variables
To use that feature, set an environment variable with prefix "yacy." and
suffix identical to the yacy configuration attribute name.
Additionally we implemented a way to set a peer name using the setting
"network.unit.agent". This can therefore now be used to set a peer name
with the java call parameter
-Dyacy.network.unit.agent=anonymous
The purpose of this feature is the ability to set peer names in
mass-deployed kubernetes clusters to the same name so that we do not
flood peer name statistics with auto-deployment-generated names.
2021-01-24 22:50:37 +01:00
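The `-Dyacy.network.unit.agent=anonymous` call parameter sets a Java system property, so the lookup described here can be sketched with `System.getProperty`. The class and method names are hypothetical, and the fallback to a default value is an assumption about how such a lookup would typically behave.

```java
public class SystemPropertyOverrideSketch {

    /**
     * Hypothetical helper: returns the value of the system property
     * "yacy.<configKey>" if it is set, otherwise the given default.
     */
    public static String resolve(String configKey, String defaultValue) {
        return System.getProperty("yacy." + configKey, defaultValue);
    }

    public static void main(String[] args) {
        // Started with: java -Dyacy.network.unit.agent=anonymous SystemPropertyOverrideSketch
        System.out.println(resolve("network.unit.agent", "generated-peer-name"));
    }
}
```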
Michael Peter Christen
198826c362 added network scanner process to discover all YaCy peers in the intranet
this will be used to wire YaCy peers in a kubernetes cluster
2021-01-23 15:14:49 +01:00
Michael Peter Christen
d9602e8325 Implemented a new syntax in the template engine to simplify json APIs
Added also an example for one of the existing APIs. The problem is the
comma separator between objects which must not be there for the last
entry in a sequence. The new syntax adds the separator symbol
automatically.
2021-01-18 00:01:08 +01:00
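The separator problem mentioned in this commit is independent of the template syntax itself, which is not shown here. A minimal sketch of the underlying idea, with hypothetical names, writes the comma only between entries so that the last entry is never followed by one:

```java
import java.util.Arrays;
import java.util.List;

public class SeparatorSketch {

    /**
     * Hypothetical helper: renders already serialized JSON objects as a JSON array,
     * inserting the separator before every entry except the first.
     */
    public static String renderArray(List<String> jsonObjects) {
        StringBuilder out = new StringBuilder("[");
        for (int i = 0; i < jsonObjects.size(); i++) {
            if (i > 0) {
                out.append(',');
            }
            out.append(jsonObjects.get(i));
        }
        return out.append(']').toString();
    }

    public static void main(String[] args) {
        System.out.println(renderArray(Arrays.asList("{\"a\":1}", "{\"b\":2}")));
    }
}
```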
Michael Peter Christen
5a7f12a9c1 allow network scans for non-standard http/https ports 2021-01-11 00:28:24 +01:00
sgaebel
b8d264f7ec fixes logging 2021-01-04 20:53:40 +01:00
Michael Peter Christen
4c920d05b5 removed superfluous lines 2020-12-29 20:19:58 +01:00
Michael Peter Christen
907f121d0c do not overwrite PW with random PW 2020-12-29 20:18:25 +01:00
Michael Peter Christen
3e6a1e0a49 fixed surrogate process counter 2020-12-28 18:26:22 +01:00
Michael Peter Christen
d3526c52af fixed a problem in warc importer: do not fail if single WARC entries are
faulty
2020-12-28 17:05:06 +01:00
Michael Peter Christen
3078b74e1d Merge branch 'master' of https://github.com/yacy/yacy_search_server.git 2020-12-22 00:46:56 +01:00
Michael Peter Christen
01cc32217f fixed apicall call method parameters
and verification in transaction manager
which did not have an exception for localhost/basic authentication
2020-12-22 00:46:47 +01:00
Michael Peter Christen
63f58e4785 enhanced strategy in host browser
limit number of fresh hosts in round robin hashes
2020-12-20 23:15:55 +01:00
Michael Peter Christen
9be36800a4 increased redirect depth by one
this makes sense if one redirect replaces http with https and another
removes the www subdomain (or vice versa)
2020-12-20 19:44:16 +01:00
Michael Peter Christen
d0abb0cedb enabling all crawl profiles in all network modes
also: increased default internet crawl speed to
4 urls/s/host
2020-12-19 01:00:51 +01:00
Michael Peter Christen
baad56d83d beautified default peer names 2020-12-14 02:08:49 +01:00
Michael Peter Christen
43a9f4f574 updated solr 6.6.6 -> 7.7.3
dropped GSA support (GSA API is still in YaCy Grid)
The 6.6.6 solr index works without migration also with 7.7.3
2020-12-12 02:06:43 +01:00
Michael Peter Christen
c0d9a3e9a7 turned HostBrowser into an admin-only page, now called IndexBrowser
This was required because spiders and bots crawled through this page and
created load on the peer without use for the user or the YaCy network.
2020-12-11 00:50:52 +01:00
Michael Peter Christen
d359d521a1 fixed warc importer
The importer tried to import gzipped files as plain warc.
It will now check the file extension and unzip automatically
on-the-fly.
2020-12-10 11:19:25 +01:00
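The extension check described in this commit can be sketched with the standard `java.util.zip.GZIPInputStream`. The class and method names are hypothetical, and treating a ".gz" suffix as the only indicator is an assumption; the actual importer may check differently.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.zip.GZIPInputStream;

public class WarcOpenSketch {

    /** Returns true if the file name suggests gzip compression, e.g. "crawl.warc.gz". */
    public static boolean isGzipped(String fileName) {
        return fileName.toLowerCase(Locale.ROOT).endsWith(".gz");
    }

    /**
     * Hypothetical helper: opens a WARC file and decompresses it on the fly
     * when the extension indicates gzip; otherwise returns the plain stream.
     */
    public static InputStream open(Path file) throws IOException {
        InputStream in = new BufferedInputStream(Files.newInputStream(file));
        return isGzipped(file.getFileName().toString()) ? new GZIPInputStream(in) : in;
    }
}
```

A more defensive variant could also inspect the first two bytes for the gzip magic number instead of relying on the file name alone.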
Michael Peter Christen
e54ab39958 Going back to basic authentication for console/shell commands
This does not affect security because:
- it is going to localhost only
- only users who already have access to the pw hash can do this
- no clear text pw is transmitted because that is not stored anywhere
The switch to basic is required because these commands are required
in the context of hosting on root servers and docker containers
where a password change must be done. But the password shell command
was not working without password which made the concept unusable.
This deficit made it virtually impossible for root server operators
to use YaCy because they had been unable to set up a proper password.
2020-12-09 02:36:55 +01:00
Michael Peter Christen
6271e9122c javadoc fix 2020-12-09 02:22:47 +01:00