Commit Graph

14098 Commits

Author SHA1 Message Date
Michael Peter Christen
e6a87e0426 enhanced crawler
A major problem when crawling is the long waiting time caused by crawl-delay
values from robots.txt entries. That attribute is not supported by
Google and is interpreted differently by Yandex and Bing. In large
crawls there is always one host that blocks the whole crawl with
extremely large values. YaCy still obeys crawl-delay but now limits it
to 10 seconds.
Additionally, the blocking logic used when loading new robots.txt files
was analyzed and a deadlock was removed. Furthermore, the construction
of new queue lists was redesigned to ensure that the loader is always
supplied with a large list of different hosts for host balancing.
2021-08-17 15:23:21 +02:00
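The capping rule described in e6a87e0426 can be illustrated with a small sketch. It is not YaCy's code: the class CrawlDelayPolicy, the method effectiveDelayMillis and the "never go below the default delay" rule are hypothetical assumptions; only the 10-second limit comes from the commit message.

```java
// Hypothetical sketch; class, method and parameter names are illustrative only.
public final class CrawlDelayPolicy {

    /** Upper bound (seconds) for a robots.txt crawl-delay, taken from the commit message. */
    static final double MAX_CRAWL_DELAY_SECONDS = 10.0;

    /**
     * Returns the wait time in milliseconds between two requests to the same host.
     * A zero, negative or NaN crawl-delay (e.g. absent) falls back to the default delay;
     * larger values are honored but capped, so one host cannot stall the crawl.
     */
    static long effectiveDelayMillis(double crawlDelaySeconds, long defaultDelayMillis) {
        if (!(crawlDelaySeconds > 0)) {
            return defaultDelayMillis;
        }
        double capped = Math.min(crawlDelaySeconds, MAX_CRAWL_DELAY_SECONDS);
        // Assumption: never go below the crawler's own politeness default.
        return Math.max(Math.round(capped * 1000.0), defaultDelayMillis);
    }

    public static void main(String[] args) {
        System.out.println(effectiveDelayMillis(3600, 500)); // 10000
        System.out.println(effectiveDelayMillis(2.5, 500));  // 2500
        System.out.println(effectiveDelayMillis(0, 500));    // 500
    }
}
```

A host announcing a one-hour crawl-delay is therefore visited at most every 10 seconds, while small crawl-delay values are still respected.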
Michael Peter Christen
e9c5e78868 replaced new Number(Number) with Number.valueOf(Number)
to remove deprecation warnings for Java 9
2021-08-08 00:39:03 +02:00
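The Java 9 deprecation referred to in e9c5e78868 concerns the boxing constructors such as new Integer(int). The sketch below uses Integer, Long and Double as concrete stand-ins for "Number"; the class name BoxingExample is hypothetical and the code does not reproduce the actual YaCy change.

```java
public final class BoxingExample {
    public static void main(String[] args) {
        // Deprecated since Java 9: the constructor always allocates a new object.
        // Integer a = new Integer(42);

        // Preferred: the static factory, which may return a cached instance.
        Integer a = Integer.valueOf(42);
        Long b = Long.valueOf(42L);
        Double c = Double.valueOf(4.2);

        // Integer.valueOf is specified to cache values in the range -128..127.
        System.out.println(a == Integer.valueOf(42)); // true
        System.out.println(a + " " + b + " " + c);    // 42 42 4.2
    }
}
```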
Michael Peter Christen
9e13d77de4 removed call to class.finalize() because of its deprecation in Java 9
next step: removal of the finalize() implementations
after testing with assert false
2021-08-07 18:57:49 +02:00
Michael Peter Christen
9ef4503672 fixed some newInstance() deprecation warnings
by adding .getDeclaredConstructor()
2021-08-07 18:46:53 +02:00
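The pattern named in 9ef4503672 replaces the deprecated Class.newInstance() with a call through the no-argument constructor. A minimal sketch follows; the class ReflectiveInstantiation and its create helper are hypothetical names used only for illustration.

```java
public final class ReflectiveInstantiation {

    // Deprecated since Java 9: clazz.newInstance() rethrows checked exceptions
    // from the constructor without declaring them.
    // Replacement: look up the no-arg constructor explicitly, then invoke it.
    static <T> T create(Class<T> clazz) throws ReflectiveOperationException {
        return clazz.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        StringBuilder sb = create(StringBuilder.class);
        sb.append("ok");
        System.out.println(sb); // ok
    }
}
```

With this form, constructor failures surface as InvocationTargetException, which is covered by ReflectiveOperationException together with NoSuchMethodException and IllegalAccessException.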
Michael Peter Christen
82df012442 removed old lib 2021-08-07 18:23:22 +02:00
Michael Peter Christen
8a2adb2b15 upgraded commons-compress lib
reason: Dependabot alert at
https://github.com/yacy/yacy_search_server/security/dependabot/pom.xml/org.apache.commons:commons-compress/open
2021-08-07 18:21:54 +02:00
Michael Peter Christen
9182b3dfca enhanced default value 2021-08-05 09:18:05 +02:00
Michael Peter Christen
294d56d4a2 better GC behavior after removing Xms, using an earlier heap-increase strategy 2021-08-05 09:16:52 +02:00
Michael Peter Christen
3959d43a5c fixed documentation link 2021-08-03 16:57:24 +02:00
Michael Peter Christen
c4659f0fb0 removed Debian and Red Hat build process
as announced in
https://twitter.com/yacy_search/status/1414608643241152516
because of a lack of community support for these kinds of
distributions. We will still support
tarball, Windows, Mac and Docker releases.
2021-07-19 20:33:52 +02:00
Michael Peter Christen
73360ed52b add gradle to gitignore 2021-07-19 20:12:03 +02:00
Michael Peter Christen
15b7461bc7 removed Xms java memory startup parameter
We will use the default value from now on.
This is much better for resource economy and fits better into a
container/Docker/Kubernetes strategy.
Furthermore, a small memory footprint is essential for use on
small devices like the Raspberry Pi.
2021-07-19 20:04:11 +02:00
admin
c3b3087077 gradle cleanup 2021-07-14 14:07:49 +02:00
admin
a13986d659 replaced maven with gradle 2021-07-14 13:58:30 +02:00
Michael Peter Christen
1d41380f0a better support for Mac-specific tray functions in Java 9 2021-07-12 17:27:59 +02:00
Michael Peter Christen
4377bd2b70 fix for wrong crawlName construction 2021-06-30 18:03:54 +02:00
Michael Peter Christen
e81b770f79 enabled crawl starts with very large sets of start urls
e.g. a 10 MB URL list with approximately 0.5 million start points
2021-06-30 10:45:58 +02:00
Michael Peter Christen
c623a3252e fix for JDK 14 bug 2021-04-23 09:11:03 +02:00
Michael Peter Christen
dbd211a1ad removed/replaced reflection in memory tool 2021-04-22 20:24:13 +02:00
Michael Peter Christen
160f00e59e removed the reconfigure script, which is seven years old and may not be
up to the standards of the current password implementation.
See https://github.com/yacy/yacy_search_server/issues/409 for background
2021-04-15 20:41:01 +02:00
Michael Peter Christen
1cdb21592b added Hazelcast and made some modifications to align legacy YaCy with
YaCyGrid
2021-04-15 20:39:22 +02:00
Michael Christen
42ea2a1c6f
Merge pull request #405 from jfhs/jfhs/support-all-html-entities
Improve HTML entities support
2021-03-31 01:44:54 +02:00
Michael Christen
b2af745dd6
Merge pull request #404 from lnceballosz/master
NGI0 - Updating licensing aspects according to REUSE
2021-03-30 23:48:21 +02:00
jfhs
10bddc2c2d Decode HTML entities in all property values by default 2021-03-30 22:24:55 +02:00
jfhs
2135d259e3 Replace hardcoded html/xml entities with a file, support decoding all defined HTML entities 2021-03-30 22:24:54 +02:00
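Commit 2135d259e3 moves the entity table into a file and decodes all defined HTML entities. The sketch below shows one way such a file-driven decoder can work; the "name=codepoint" line format, the class EntityDecoder and its methods are hypothetical assumptions, not the format or API used by YaCy. Numeric entities (decimal and hexadecimal) are handled directly, and unknown entities are left untouched.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class EntityDecoder {
    // Matches &#65; (decimal), &#x41; (hex) and &name; (named) entities.
    private static final Pattern ENTITY =
            Pattern.compile("&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);");

    private final Map<String, String> named;

    EntityDecoder(Map<String, String> named) {
        this.named = named;
    }

    /** Parses hypothetical "name=codepoint" lines; blank lines and "#" comments are skipped. */
    static EntityDecoder fromLines(Iterable<String> lines) {
        Map<String, String> map = new HashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) continue;
            int eq = line.indexOf('=');
            if (eq < 0) continue;
            int cp = Integer.parseInt(line.substring(eq + 1).trim());
            map.put(line.substring(0, eq).trim(), new String(Character.toChars(cp)));
        }
        return new EntityDecoder(map);
    }

    String decode(String text) {
        Matcher m = ENTITY.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String body = m.group(1);
            String replacement;
            if (body.startsWith("#x") || body.startsWith("#X")) {
                replacement = codePoint(body.substring(2), 16);
            } else if (body.startsWith("#")) {
                replacement = codePoint(body.substring(1), 10);
            } else {
                replacement = named.get(body);
            }
            // Unknown or invalid entities are kept verbatim.
            String value = replacement != null ? replacement : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    private static String codePoint(String digits, int radix) {
        try {
            int cp = Integer.parseInt(digits, radix);
            return Character.isValidCodePoint(cp) ? new String(Character.toChars(cp)) : null;
        } catch (NumberFormatException e) {
            return null; // too many digits for an int
        }
    }

    public static void main(String[] args) {
        EntityDecoder decoder =
                fromLines(Arrays.asList("# name=codepoint", "amp=38", "eacute=233"));
        System.out.println(decoder.decode("caf&eacute; &amp; &#65;&#x42; &unknown;"));
    }
}
```

Keeping the table in data rather than code means a complete entity list can be loaded without touching the decoder itself.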
Michael Peter Christen
8f876a8c72 added concurrency to enhance indexing speed during json surrogate import 2021-03-30 12:07:36 +02:00
Michael Peter Christen
f8cbaeef93 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2021-03-29 18:46:53 +02:00
Michael Peter Christen
a857e3d3d5 fix for json importer 2021-03-29 18:46:42 +02:00
sgaebel
7fecd859e5 fixes showing metadata from Searchresult by removing defType=edismax;
also removes defType=edismax from IndexBrowser, which however still does not show
dates
2021-03-21 00:06:26 +01:00
sgaebel
1546232c94 adds ranking for multi document queries only 2021-03-20 17:48:35 +01:00
sgaebel
93b353d22d does not boost or add fields for zero-row-queries (exists()) 2021-03-20 17:48:26 +01:00
sgaebel
f16cd154f7 removes unused imports and variables 2021-03-20 15:14:09 +01:00
sgaebel
c69c462a15 replaces an expensive getLoadTimeURL() by exists()
renames urlExists to getHarvestProcess, since that is what it actually does
2021-03-20 15:01:31 +01:00
sgaebel
a5488ac8f5 uses edismax queries on query counts > 1 only 2021-03-20 01:06:09 +01:00
sgaebel
26223dc25a replaces getLoadTime() by exists() with a simpler query
since Solr 8.8.1, getLoadTime() causes high CPU usage
2021-03-20 01:06:02 +01:00
sgaebel
8e4d014c06 removes the unnecessary SolrRequestInfo.clearRequestInfo() call, which
spammed the log
2021-03-18 22:33:39 +01:00
sgaebel
88c6bc8cd7 adds missing solr lib: opentracing 0.33.0 2021-03-18 21:42:58 +01:00
Lina Ceballos
139b5a4033 improving license info in README 2021-03-11 12:23:53 +01:00
Lina Ceballos
a96752f5ab adding SPDX license and copyright headers 2021-03-11 12:17:11 +01:00
Lina Ceballos
221038f16d creating LICENSES directory 2021-03-11 12:16:37 +01:00
Michael Peter Christen
e18d0ef544 trying to set a higher priority for the process involved in index
export
2021-03-09 00:04:05 +01:00
Michael Peter Christen
c552a2845f added new commons library (missed in latest commit) 2021-03-08 13:39:48 +01:00
Michael Peter Christen
8b4394a6c5 fixes for solr 8.8.1 migration
- replace new Guava 30 with older 25 because that is the correct
dependency for Solr 8.8.1. The newer one actually did not work!
- the index will be created in a DATA/INDEX/freeworld/SEGMENTS/solr_8_8_1
subfolder. The older solr_6_6 index is not touched but also not
migrated. The index starts with fresh (empty) content.
- Older indexes must be migrated by hand (export/import) until a
better solution is found.
- Large schema adaptations for Lucene 8.8.1
2021-03-08 13:39:27 +01:00
Michael Peter Christen
3befaaf4f1 reformatting pom.xml to make it easier to update it with recent library versions 2021-03-08 00:41:41 +01:00
Michael Christen
dffe9e1c23
Merge pull request #402 from SebastianoPistore/junitUpdate
Workaround for CVE-2020-15250
2021-03-06 13:45:11 +01:00
Michael Peter Christen
7c86826db3 new version for solr 8
ATTENTION: old indexes from solr 6 CANNOT be migrated to solr 8
DO NOT use this version if you still have a solr 6 index.
2021-03-06 13:37:06 +01:00
Michael Peter Christen
ed9789214e fixed seed initialization problem 2021-03-06 13:35:46 +01:00
Michael Peter Christen
f4f3808d43 added missing new dependencies for migration to Solr 8
after pulling https://github.com/yacy/yacy_search_server/pull/403
2021-03-06 13:35:32 +01:00
Michael Christen
ffe8786d69
Merge pull request #403 from alsutton/address_security_issues
Update dependencies to address vulnerabilities.
2021-03-06 12:58:56 +01:00
Al Sutton
f4dd6e6d41 Update Lucene to 8.8.1 2021-03-04 17:44:01 +00:00