Commit Graph

6023 Commits

Author SHA1 Message Date
reger24
8d0f3d4208 Removed the Blacklist_p default=shared for newly created blacklists, issue https://github.com/yacy/yacy_search_server/issues/374
It is nice to be able to import blacklists from other peers,
but sharing by default is likely not intentionally chosen by the user
2022-01-26 14:31:52 +01:00
Andreas
590f39b403
Add Sorting functionality to Crawler Queue Table
Allow to sort for count and host
2022-01-09 16:06:14 +01:00
ZeroCool940711
7e765b8483 Improved the Image search page to have bigger thumbnails, use a bigger area for results and a smaller left sidebar. 2021-12-26 23:41:04 -07:00
Michael Peter Christen
6fe905bb82 feature https://github.com/yacy/yacy_search_server/issues/434 2021-12-26 23:33:31 +01:00
Michael Peter Christen
9c38b1254e proper deletion of loadtime index 2021-12-22 01:22:46 +01:00
Michael Peter Christen
bd3f2483a1 replaced url and date retrieval by only url retrieval
This should prevent the search index from being used to check the freshness of the
index entry.
2021-12-20 16:23:05 +01:00
Michael Peter Christen
163ba26d90 replaced check for load time method
instead of loading the solr document, an index holding only the last loading
time was created. This prevents solr from having to fetch from its index
while the index is being created. Excessive re-loading of documents while
indexing has been shown to produce deadlocks, so this should now be
prevented.
2021-12-20 03:47:56 +01:00
sgaebel
cdf901270c always use HTTPClient by 'try with resources' pattern to free up
resources
2021-10-31 23:06:23 +01:00
Michael Christen
f4834e8e31
link fix 2021-10-29 11:10:23 +02:00
Michael Peter Christen
3c86b7b780 attempt to make a Mac Release using gradle
This is almost working with many workarounds:
- run rm lib/yacycore.jar
- run ./gradlew clean build bundleNative
- run ant clean all
- run again rm lib/yacycore.jar
- run ./fixMacBuild.sh

The build is then inside build/mac/YaCy.app

Right now this works, but the build does not have the correct release
number inside.

The target is to make this work for Windows releases and to embed the JRE
entirely.
2021-10-25 18:37:39 +02:00
Michael Peter Christen
63ad8ce6b2 removed ymarks
had not been used for a long time
2021-09-16 22:23:51 +02:00
Michael Peter Christen
ef5a71a592 enhanced crawl start response time
for very very large crawl start lists
2021-09-16 21:01:01 +02:00
Michael Peter Christen
1bab4ffe20 calculating the correct size of an export.
This can be seen as a fix for
https://github.com/yacy/yacy_search_server/issues/343
however, the export was not flawed; it only gives the impression that
something is wrong. The export size must be smaller than the index
size because the index also contains error documents.
Now an information line is presented that shows e.g.:
"The local index currently contains 181,319 documents, only 106,887
exportable with status code 200 - the remaining are error documents."
2021-09-16 01:05:09 +02:00
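The size calculation described in 1bab4ffe20 can be illustrated with a minimal sketch. It is written in Python rather than YaCy's Java, the function name is hypothetical, and it assumes each indexed document carries an HTTP status code; only the rule "exportable means status code 200" comes from the commit message.

```python
def count_exportable(status_codes):
    """Return (total, exportable, error) counts for a list of HTTP status codes.

    Hypothetical helper: a document counts as exportable only if its
    status code is 200; everything else is treated as an error document.
    """
    total = len(status_codes)
    exportable = sum(1 for code in status_codes if code == 200)
    return total, exportable, total - exportable
```

With the figures quoted in the commit message, 181,319 indexed documents and 106,887 exportable ones leave 74,432 error documents.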
admin
9b7668fa58 reduced memory footprint during indexing/crawling 2021-08-24 12:24:52 +02:00
Michael Peter Christen
e6a87e0426 enhanced crawler
a main problem when crawling is the long waiting time caused by crawl-delay
values from robots.txt entries. That attribute is not supported by
Google and is interpreted by Yandex and Bing in different ways. In large
crawls there is always one host which blocks the whole crawl with
extremely large values. YaCy still obeys crawl-delay but now limits it
to 10 seconds.
Additionally, the blocking logic when loading a new robots.txt was analyzed
and a deadlock was removed. Furthermore, the construction of new queue
lists was redesigned to ensure that a large list of
different hosts is always provided to the loader for host-balancing.
2021-08-17 15:23:21 +02:00
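The crawl-delay limit described in e6a87e0426 can be sketched as follows. This is a Python illustration, not the YaCy implementation; the function and constant names are hypothetical, and the handling of missing or negative values is an assumption. Only the 10-second cap comes from the commit message.

```python
# Hypothetical names; only the 10-second limit is taken from the commit message.
MAX_CRAWL_DELAY_SECONDS = 10.0


def effective_crawl_delay(robots_delay_seconds):
    """Return the delay to obey: the robots.txt value, capped at 10 seconds."""
    if robots_delay_seconds is None or robots_delay_seconds < 0:
        # Assumption: a missing or negative crawl-delay means no extra delay.
        return 0.0
    return min(float(robots_delay_seconds), MAX_CRAWL_DELAY_SECONDS)
```

A host that announces a crawl-delay of 3600 seconds would then be waited on for 10 seconds, while a host asking for 2 seconds still gets 2 seconds.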
Michael Peter Christen
9182b3dfca enhanced default value 2021-08-05 09:18:05 +02:00
Michael Peter Christen
3959d43a5c fixed documentation link 2021-08-03 16:57:24 +02:00
Michael Peter Christen
15b7461bc7 removed Xms java memory startup parameter
We will use the default value from now on.
This is much better for resource economy and fits better into a
container/docker/kubernetes strategy.
Furthermore, a small memory footprint is essential for the usage on
small devices like RaspberryPi.
2021-07-19 20:04:11 +02:00
Michael Peter Christen
4377bd2b70 fix for wrong crawlName construction 2021-06-30 18:03:54 +02:00
Michael Peter Christen
e81b770f79 enabled crawl starts with very large sets of start urls
e.g. a 10 MB URL list with approx. 0.5 million start points
2021-06-30 10:45:58 +02:00
Michael Peter Christen
dbd211a1ad removed/replaced reflection in memory tool 2021-04-22 20:24:13 +02:00
Michael Peter Christen
1cdb21592b added hazelcast and some modifications to align legacy YaCy with
YaCyGrid
2021-04-15 20:39:22 +02:00
sgaebel
7fecd859e5 fixes showing metadata from Searchresult, by removing defType=edismax
also removes defType=edismax from IndexBrowser, but still does not show
dates
2021-03-21 00:06:26 +01:00
sgaebel
f16cd154f7 removes unused imports and variables 2021-03-20 15:14:09 +01:00
sgaebel
c69c462a15 replaces an expensive getLoadTimeURL() by exists()
refactors urlExists to getHarvestProcess as that is what it does
2021-03-20 15:01:31 +01:00
sgaebel
26223dc25a replaces getLoadTime() by exists() with a simpler query
since solr-8.8.1 getLoadTime() causes a high cpu usage
2021-03-20 01:06:02 +01:00
Michael Peter Christen
b46513f4a1 added stub of rc3assembly style
a little bit late but whatever
2021-02-09 20:30:10 +01:00
Michael Peter Christen
3da7628117 use environment variables to overwrite configuration variables
for example, you can do:
export YACY_PORT=8092 && ./startYACY.sh
Just prepend "YACY_" to the uppercase version of the configuration variable name and
replace all "." with "_".
2021-02-09 20:26:49 +01:00
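The naming rule from 3da7628117 can be sketched as below, following the YACY_PORT example in the commit message. This is a Python illustration rather than YaCy's Java code; the function names, and the idea of applying overrides to a plain dictionary of configuration values, are assumptions made for the example.

```python
import os


def env_name_for(config_key):
    """Map a configuration key to its environment variable name.

    Rule from the commit message: uppercase the key, replace "." with "_",
    and put "YACY_" in front (as in YACY_PORT for the key "port").
    """
    return "YACY_" + config_key.upper().replace(".", "_")


def apply_env_overrides(config, environ=None):
    """Return a copy of config where matching environment variables win.

    Hypothetical helper: config is a dict of configuration values;
    environ defaults to the real process environment.
    """
    environ = os.environ if environ is None else environ
    result = dict(config)
    for key in config:
        name = env_name_for(key)
        if name in environ:
            result[key] = environ[name]
    return result
```

For instance, apply_env_overrides({"port": "8090"}, {"YACY_PORT": "8092"}) yields {"port": "8092"}; the value "8090" is only an illustrative starting value.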
Michael Peter Christen
13a2e6dc6e Merge branch 'master' of https://github.com/yacy/yacy_search_server.git 2021-01-25 11:49:32 +01:00
Michael Peter Christen
0ae8ccf657 Make it possible to set an empty password disabling the authentication
protocol completely
If you now set an empty password, the http server will not ask for
authentication. This is required for environments where we attach an outside
authentication service like keycloak or similar, using authentication
in an ingress proxy.
This change is part of the approach to run YaCy inside a kubernetes
cluster where we do not want individual authentication of peers and want
to apply an ingress authentication.
2021-01-25 11:49:21 +01:00
Michael Peter Christen
96592a10cf added option to set yacy configuration values using environment
variables
To use that feature, set an environment variable with prefix "yacy." and
suffix identical to the yacy configuration attribute name.
Additionally we implemented a way to set a peer name using the setting
"network.unit.agent". This can therefore now be used to set a peer name
with the java call parameter
-Dyacy.network.unit.agent=anonymous
The purpose of this feature is the ability to set peer names in
mass-deployed kubernetes clusters to the same name, to avoid
flooding peer name statistics with auto-deployment-generated names.
2021-01-24 22:50:37 +01:00
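The prefix convention from 96592a10cf (a "yacy." prefix followed by the configuration attribute name) can be sketched like this. It is a Python illustration, not the YaCy code; the function name is hypothetical, and the input is assumed to be a plain mapping of property names to values.

```python
PREFIX = "yacy."


def overrides_from_properties(properties):
    """Extract configuration overrides from a mapping of property names.

    Hypothetical helper: every name starting with "yacy." contributes one
    override whose key is the rest of the name; other names are ignored.
    """
    return {
        name[len(PREFIX):]: value
        for name, value in properties.items()
        if name.startswith(PREFIX) and len(name) > len(PREFIX)
    }
```

Given {"yacy.network.unit.agent": "anonymous"}, the sketch returns {"network.unit.agent": "anonymous"}, which matches the -Dyacy.network.unit.agent=anonymous example in the commit message.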
Michael Peter Christen
198826c362 added network scanner process to discover all YaCy peers in the intranet
this will be used to wire YaCy peers in a kubernetes cluster
2021-01-23 15:14:49 +01:00
Michael Peter Christen
d9602e8325 Implemented a new syntax in the template engine to simplify json APIs
Added also an example for one of the existing APIs. The problem is the
comma separator between objects which must not be there for the last
entry in a sequence. The new syntax adds the separator symbol
automatically.
2021-01-18 00:01:08 +01:00
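The separator problem described in d9602e8325 can be illustrated without reproducing the template syntax itself, which the commit message does not show. The Python sketch below only demonstrates the general idea: emit a separator between entries of a sequence, but not after the last one. The function name is hypothetical.

```python
import json


def render_sequence(entries, separator=","):
    """Render entries as a JSON array, adding the separator automatically.

    The separator is written after every entry except the last one, so the
    result never ends with a dangling comma.
    """
    parts = []
    last = len(entries) - 1
    for i, entry in enumerate(entries):
        parts.append(json.dumps(entry, sort_keys=True))
        if i != last:
            parts.append(separator)
    return "[" + "".join(parts) + "]"
```

Because the last entry is handled by the renderer, a template author no longer needs a special case for the final element.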
Michael Peter Christen
5a7f12a9c1 allow network scans for non-standard http/https ports 2021-01-11 00:28:24 +01:00
Michael Peter Christen
022fb15670 fix for https://github.com/yacy/yacy_search_server/issues/385 2021-01-06 22:12:17 +01:00
Michael Peter Christen
17672fcbb4 adding hint how to shrink the disk size after an index deletion.
implements https://github.com/yacy/yacy_search_server/issues/360
2021-01-06 22:02:00 +01:00
Michael Peter Christen
907f121d0c do not overwrite PW with random PW 2020-12-29 20:18:25 +01:00
Michael Peter Christen
256fa3d985 new limitation documentation
just replaced two by four
2020-12-22 16:33:12 +01:00
Michael Peter Christen
7997836506 fixed lock image 2020-12-20 23:18:50 +01:00
Michael Peter Christen
d0abb0cedb enabling all crawl profiles in all network modes
also: increased default internet crawl speed to
4 urls/s/host
2020-12-19 01:00:51 +01:00
Michael Peter Christen
a9befbba5f Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2020-12-14 01:26:34 +01:00
Michael Peter Christen
fed8bd6325 automatically refresh css cache when switching skin
and setting of default skin to current skin in selector
2020-12-14 01:26:26 +01:00
Michael Peter Christen
9a5694261a design update
more space
2020-12-12 14:17:45 +01:00
Michael Peter Christen
4ec55289a8 using a lock symbol which also looks good in dark designs 2020-12-12 03:02:40 +01:00
Michael Peter Christen
43a9f4f574 updated solr 6.6.6 -> 7.7.3
dropped GSA support (GSA API is still in YaCy Grid)
The 6.6.6 solr index works without migration also with 7.7.3
2020-12-12 02:06:43 +01:00
Michael Peter Christen
c0d9a3e9a7 turned HostBrowser into a admin-only page, now called IndexBrowser
This was required because spiders and bots crawled through this page and
created load on the peer without use for the user or the YaCy network.
2020-12-11 00:50:52 +01:00
Michael Peter Christen
d359d521a1 fixed warc importer
The importer tried to import gzipped files as plain warc.
It will now check the file extension and automatically decompress
on-the-fly.
2020-12-10 11:19:25 +01:00
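The extension check described in d359d521a1 can be sketched as follows. This is a Python illustration using the standard gzip module, not the YaCy importer; the function name is hypothetical, and treating ".gz" as the only compressed extension is an assumption.

```python
import gzip


def open_warc(path):
    """Open a WARC file as a binary stream.

    Files whose name ends in ".gz" are decompressed on the fly;
    all other files are read as plain WARC.
    """
    if str(path).lower().endswith(".gz"):
        return gzip.open(path, "rb")
    return open(path, "rb")
```

Callers can then read both kinds of file through the same interface, for example with "with open_warc(path) as stream:".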
Michael Peter Christen
cef5fde343 adding message to UI to make port change transparent 2020-12-02 18:05:38 +01:00
Michael Peter Christen
22841ffbf1 creating a threaddump during every cleanup process
to be able to find out what a peer did (not) last time before a crash
2020-12-01 03:00:24 +01:00
Michael Peter Christen
d7b2d82faa showing MB instead of KB in PerformanceMemory 2020-11-22 23:02:49 +01:00