14187 Commits

Author SHA1 Message Date
Michael Peter Christen
9c38b1254e proper deletion of loadtime index 2021-12-22 01:22:46 +01:00
Michael Peter Christen
bd3f2483a1 replaced url and date retrieval with url-only retrieval
This should prevent the search index from being used to check the
freshness of the index entry.
2021-12-20 16:23:05 +01:00
Michael Peter Christen
163ba26d90 replaced the load time check method
Instead of loading the solr document, a separate index holding only the
last loading time was created. This means solr no longer has to fetch
from its index while the index is being created. Excessive re-loading of
documents while indexing has been shown to produce deadlocks, so this
should now be prevented.
2021-12-20 03:47:56 +01:00
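The idea described in 163ba26d90 can be sketched roughly as follows. This is a minimal in-memory illustration, not YaCy's implementation: the class name LoadTimeIndex, its methods, and the use of a ConcurrentHashMap are all assumptions made for the example. The point is only that a freshness check reads a small dedicated index rather than fetching the full solr document.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch: keeps only the last load time per URL key so that
 * freshness checks never have to touch the full-text index.
 */
final class LoadTimeIndex {
    // Assumption: an in-memory map stands in for whatever storage is really used.
    private final Map<String, Long> lastLoadMillis = new ConcurrentHashMap<>();

    /** Remembers when the given URL key was last loaded. */
    void recordLoad(String urlKey, long loadTimeMillis) {
        lastLoadMillis.put(urlKey, loadTimeMillis);
    }

    /** Returns true if the URL key was loaded at or after the given cutoff. */
    boolean isFresh(String urlKey, long notBeforeMillis) {
        Long loaded = lastLoadMillis.get(urlKey);
        return loaded != null && loaded >= notBeforeMillis;
    }

    /** Drops the entry, e.g. when the document is deleted from the index. */
    void remove(String urlKey) {
        lastLoadMillis.remove(urlKey);
    }

    public static void main(String[] args) {
        LoadTimeIndex index = new LoadTimeIndex();
        index.recordLoad("example-url-key", System.currentTimeMillis());
        System.out.println(index.isFresh("example-url-key", 0L));
    }
}
```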
Michael Peter Christen
1ead7b85b5 remove compiler warning
"warning: [try] explicit call to close() on an auto-closeable resource"
2021-12-13 12:28:34 +01:00
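For context, javac reports this warning (under -Xlint:try) when close() is called by hand on a resource that a try-with-resources statement already manages. A minimal sketch of the warning-free form follows; the class and method names are illustrative and not taken from the YaCy sources.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Illustrative example only; not code from the YaCy repository. */
final class TryWithResourcesExample {

    /** Returns the first line of the text, or null if the text is empty. */
    static String firstLine(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            // No explicit reader.close() here: the try-with-resources statement
            // closes the reader when the block exits, normally or by exception.
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstLine("first\nsecond"));
    }
}
```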
Michael Peter Christen
3dc6613096 updating slf4j 1.7.25 -> 1.7.32 2021-12-13 12:26:49 +01:00
Michael Christen
cd0ff48e99 there is no (more) log4j in YaCy 2021-12-12 13:53:19 +01:00
Michael Peter Christen
59777010dc Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2021-11-18 00:49:56 +01:00
Michael Peter Christen
7898815c41 disabling concurrent logging
(maybe temporary)
2021-11-18 00:49:46 +01:00
sgaebel
4bf6954474 uses clientBuilder not HttpClients.custom() to have these inside the
Pool too
2021-10-31 23:06:33 +01:00
sgaebel
cdf901270c always use HTTPClient by 'try with resources' pattern to free up
resources
2021-10-31 23:06:23 +01:00
sgaebel
69adaa9f55 makes our HTTPClient closable 2021-10-31 23:06:02 +01:00
sgaebel
fc4275f901 handle all references for client, response, request to be able to close
them
2021-10-31 23:05:50 +01:00
sgaebel
1cdc55a425 lets SOLR merge bigger segments (up to 50GB)
+ some setting to reduce caches
2021-10-31 11:33:42 +01:00
sgaebel
e7d3a363f2 refactor to use finish() 2021-10-31 11:22:35 +01:00
sgaebel
4fc876f4a3 revert back to use EntityUtils.consumeQuietly - as it simply closes the
underlying stream
2021-10-31 11:22:28 +01:00
sgaebel
4f0392e93e refactor use of AuthSchemeProvider 2021-10-31 11:21:59 +01:00
sgaebel
b74f337859 removes double setting of UserAgent 2021-10-31 11:21:06 +01:00
sgaebel
965748fefb some refactoring using try with resources 2021-10-31 11:20:28 +01:00
Michael Christen
f4834e8e31 link fix 2021-10-29 11:10:23 +02:00
Michael Christen
7f5d3e3a12 fixed name 2021-10-29 11:07:34 +02:00
Michael Peter Christen
552ab7051b fix for warc importer 2021-10-25 19:35:15 +02:00
Michael Peter Christen
3c86b7b780 attempt to make a Mac Release using gradle
This is almost working with many workarounds:
- run rm lib/yacycore.jar
- run ./gradlew clean build bundleNative
- run ant clean all
- run again rm lib/yacycore.jar
- run ./fixMacBuild.sh

The build is then inside build/mac/YaCy.app

Right now this works, but the build does not contain the correct release
number.

The goal is to make this work for Windows releases and to embed the JRE
entirely.
2021-10-25 18:37:39 +02:00
Michael Peter Christen
49cae8ca62 network bootstrapping addresses update 2021-10-25 18:32:57 +02:00
Michael Peter Christen
8e4383c49e downgrading gradle to 6.9
to be able to support org.mini2Dx.parcl
2021-10-25 18:32:34 +02:00
Michael Peter Christen
999c819e3e Merge branch 'master' of https://github.com/yacy/yacy_search_server.git 2021-10-24 20:50:14 +02:00
Michael Peter Christen
fd770e90e2 spike to identify paths for YaCy within mac application bundles 2021-10-24 20:49:59 +02:00
Michael Peter Christen
d19872fd26 making sure that crawl queues are closed correctly to prevent data loss 2021-10-14 00:30:04 +02:00
sgaebel
90507c0fdc comments out printing query params to std.out 2021-10-04 18:03:06 +02:00
Michael Peter Christen
be0aebad84 fixes https://github.com/yacy/yacy_search_server/issues/424 2021-10-04 14:38:49 +02:00
Michael Peter Christen
63ad8ce6b2 removed ymarks
had not been used for a long time
2021-09-16 22:23:51 +02:00
Michael Peter Christen
ef5a71a592 enhanced crawl start response time
for very large crawl start lists
2021-09-16 21:01:01 +02:00
Michael Peter Christen
1bab4ffe20 calculating the correct size of an export.
This can be seen as a fix for
https://github.com/yacy/yacy_search_server/issues/343
However, the export was not flawed; it only gave the impression that
something was wrong. The export size must be smaller than the index
size because the index also contains error documents.
Now an information line is presented that shows, e.g.:
"The local index currently contains 181,319 documents, only 106,887
exportable with status code 200 - the remaining are error documents."
2021-09-16 01:05:09 +02:00
Michael Peter Christen
4cadd557dc removed synchronization in table creation
to avoid possible deadlocks when handling OnDemandOpenFileIndex
which happens quite often during wide crawling
2021-09-15 19:34:49 +02:00
Michael Peter Christen
8084960392 disabled citation index
that was created but never used
2021-09-15 18:46:37 +02:00
admin
9b7668fa58 reduced memory footprint during indexing/crawling 2021-08-24 12:24:52 +02:00
admin
fbf8ddd32d upgrade of jsoup 1.12.1 -> 1.14.2 2021-08-24 12:23:57 +02:00
Michael Peter Christen
4c889b7ff9 fixed build paths 2021-08-18 19:05:44 +02:00
Michael Peter Christen
683cac125f updated bouncy castle 1.60 -> 1.69 2021-08-17 15:48:54 +02:00
Michael Peter Christen
e6a87e0426 enhanced crawler
A main problem when crawling is the long waiting time caused by
crawl-delay values from robots.txt entries. That attribute is not
supported by Google and is interpreted by Yandex and Bing in different
ways. In large crawls there is always one host which blocks the whole
crawl with extremely large values. YaCy still obeys crawl-delay but now
limits it to 10 seconds.
Additionally, the blocking logic when loading new robots.txt was analyzed
and a deadlock was removed. Furthermore, the construction of new queue
lists was redesigned to ensure that a large list of different hosts is
always provided to the loader for host-balancing.
2021-08-17 15:23:21 +02:00
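The crawl-delay limit described in e6a87e0426 amounts to clamping the requested delay. The sketch below is a hedged illustration: the class name CrawlDelayCap, the method name, and the handling of non-positive values are assumptions for the example, and only the 10-second ceiling comes from the commit message.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, not YaCy's actual class: caps a robots.txt crawl-delay. */
final class CrawlDelayCap {

    /** Upper bound named in the commit message: 10 seconds. */
    static final long MAX_CRAWL_DELAY_MILLIS = TimeUnit.SECONDS.toMillis(10);

    /**
     * Returns the delay to actually wait between requests to one host.
     * The requested value is honoured up to the cap; non-positive values
     * are treated as "no delay" (an assumption made for this sketch).
     */
    static long effectiveDelayMillis(long requestedMillis) {
        if (requestedMillis <= 0L) {
            return 0L;
        }
        return Math.min(requestedMillis, MAX_CRAWL_DELAY_MILLIS);
    }

    public static void main(String[] args) {
        // A host asking for one hour between requests is limited to the cap.
        System.out.println(effectiveDelayMillis(TimeUnit.HOURS.toMillis(1)));
    }
}
```

With a clamp like this, a single host with an extreme crawl-delay can slow its own queue by at most the cap instead of stalling the whole crawl.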
Michael Peter Christen
e9c5e78868 replaced new Number(Number) with Number.valueOf
to remove deprecation warnings for Java 9
2021-08-08 00:39:03 +02:00
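The wrapper-class constructors such as new Integer(int) were deprecated in Java 9 in favour of the static valueOf factories. Assuming that is the replacement e9c5e78868 refers to, the change looks roughly like this; the class BoxingExample is illustrative only.

```java
/** Illustrative example only; not code from the YaCy repository. */
final class BoxingExample {

    static Integer boxed(int value) {
        // Deprecated since Java 9: new Integer(value)
        return Integer.valueOf(value);
    }

    static Long boxed(long value) {
        // Deprecated since Java 9: new Long(value)
        return Long.valueOf(value);
    }

    static Double boxed(double value) {
        // Deprecated since Java 9: new Double(value)
        return Double.valueOf(value);
    }

    public static void main(String[] args) {
        System.out.println(boxed(42) + " " + boxed(42L) + " " + boxed(4.2d));
    }
}
```

Besides silencing the warning, valueOf may return cached instances for small integer values, which avoids needless allocations.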
Michael Peter Christen
9e13d77de4 removed call to class.finalize() because of deprecation in Java 9
next: removal of finalize() implementation
after testing with assert false
2021-08-07 18:57:49 +02:00
Michael Peter Christen
9ef4503672 fixed some newInstance() warnings
.. by adding .getDeclaredConstructor()
2021-08-07 18:46:53 +02:00
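Class.newInstance() has been deprecated since Java 9; the documented replacement is to go through getDeclaredConstructor(). A minimal sketch of that pattern follows, with a hypothetical helper class ReflectiveFactory that is not part of YaCy.

```java
/** Hypothetical helper, not YaCy's actual code. */
final class ReflectiveFactory {

    /** Creates an instance through the type's no-argument constructor. */
    static <T> T create(Class<T> type) {
        try {
            // Deprecated since Java 9: type.newInstance()
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // Covers NoSuchMethodException, InstantiationException,
            // IllegalAccessException and InvocationTargetException.
            throw new IllegalStateException("cannot instantiate " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        StringBuilder builder = create(StringBuilder.class);
        System.out.println(builder.append("created reflectively"));
    }
}
```

Unlike the deprecated call, this form wraps exceptions thrown by the constructor in InvocationTargetException instead of propagating checked exceptions undeclared.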
Michael Peter Christen
82df012442 removed old lib 2021-08-07 18:23:22 +02:00
Michael Peter Christen
8a2adb2b15 upgraded commons-compress lib
cause: alert in
https://github.com/yacy/yacy_search_server/security/dependabot/pom.xml/org.apache.commons:commons-compress/open
2021-08-07 18:21:54 +02:00
Michael Peter Christen
9182b3dfca enhanced default value 2021-08-05 09:18:05 +02:00
Michael Peter Christen
294d56d4a2 addressing better GC behavior after removing Xms with earlier heap increase strategy 2021-08-05 09:16:52 +02:00
Michael Peter Christen
3959d43a5c fixed documentation link 2021-08-03 16:57:24 +02:00
Michael Peter Christen
c4659f0fb0 removed Debian and Red Hat build process
as announced in
https://twitter.com/yacy_search/status/1414608643241152516
because of lack of community support for these kinds of
distributions. We will still support
tarball, Windows, Mac and Docker releases.
2021-07-19 20:33:52 +02:00
Michael Peter Christen
73360ed52b add gradle to gitignore 2021-07-19 20:12:03 +02:00
Michael Peter Christen
15b7461bc7 removed Xms java memory startup parameter
We will use the default value from now on.
This is much better for resource economy and fits better into a
container/docker/kubernetes strategy.
Furthermore, a small memory footprint is essential for the usage on
small devices like RaspberryPi.
2021-07-19 20:04:11 +02:00