Michael Peter Christen
309adb814e
fixed jsonlist import from searchlab.eu using a direct URL
2022-10-25 00:51:53 +02:00
Michael Peter Christen
62d177bf59
stub for jsonlist index importer web page
2022-10-23 12:22:31 +02:00
Michael Peter Christen
efa0425f00
refactoring: moved jsonlist importer to importer class
2022-10-23 11:35:32 +02:00
Michael Peter Christen
49daa32a88
yacy can now read searchlab export dump files
using the surrogate input process:
- copy the searchlab export file to DATA/SURROGATE/in
- the file is processed automatically and then moved to
DATA/SURROGATE/OUT
2022-10-23 11:01:58 +02:00
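The surrogate steps above could be scripted roughly as follows. This is only a sketch: the class name `SurrogateDropSketch`, the method `drop` and the `yacyHome` parameter are hypothetical, and the directory name is taken literally from the commit message rather than verified against a YaCy installation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of YaCy: places an export file into the
// surrogate input folder named in the commit message above.
class SurrogateDropSketch {

    /** Copies exportFile into yacyHome/DATA/SURROGATE/in and returns the target path. */
    static Path drop(Path exportFile, Path yacyHome) throws IOException {
        // Directory layout as stated in the commit message (assumption: relative to the YaCy home).
        Path inDir = yacyHome.resolve("DATA").resolve("SURROGATE").resolve("in");
        Files.createDirectories(inDir);
        Path target = inDir.resolve(exportFile.getFileName().toString());
        // Overwrite an older copy with the same name, if any.
        return Files.copy(exportFile, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: SurrogateDropSketch <export-file> <yacy-home>");
            return;
        }
        System.out.println("copied to " + drop(Path.of(args[0]), Path.of(args[1])));
    }
}
```

According to the commit message, the running peer then picks the file up on its own; the sketch does not trigger or wait for that step.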
Michael Christen
99174282d8
try to shut down in a bit more ordered way
inspired by https://github.com/yacy/yacy_search_server/issues/518
2022-10-05 22:13:06 +02:00
Michael Peter Christen
482f507e65
upgraded solr from 8.8.1 to 8.9.0
should hopefully fix
https://github.com/yacy/yacy_search_server/issues/496
because it includes https://issues.apache.org/jira/browse/SOLR-13034
2022-10-05 17:24:07 +02:00
Michael Peter Christen
60c9986a0e
new release file names with date and git hash
...without reference to 9000ish SVN
2022-10-04 15:31:47 +02:00
Michael Peter Christen
9c1bc533fa
removed hazelcast because it is phoning home, see also:
https://github.com/yacy/yacy_search_server/issues/504
2022-09-28 17:30:37 +02:00
Michael Peter Christen
fc98ca7a9c
removed ContentControl servlet and functionality
This was not used at all (as far as I know) and was blocking a smooth
integration of ivy in the context of an existing JSON parser.
2022-09-28 17:25:04 +02:00
Michael Peter Christen
3d138d3fdd
catch error when initializing hazelcast
should fix https://github.com/yacy/yacy_search_server/issues/468
2022-06-20 17:27:56 +02:00
Burkhard
a6a9828181
Merge pull request #440 from lfuelling/master
Add setting for public facing port
2022-02-11 08:09:17 +01:00
Daleth Darko
3ced06c731
Various javadoc fixes
2022-01-26 11:22:43 +01:00
reger24
6a1e259fd0
Fix NPE in Switchboard.getURL https://github.com/yacy/yacy_search_server/issues/441
2022-01-26 06:07:38 +01:00
Lukas Fülling
e8a00007f6
add setting for public facing port
2022-01-11 17:10:48 +01:00
Michael Peter Christen
bd3f2483a1
replaced url and date retrieval by only url retrieval
This should prevent the search index from being used to check the
freshness of the index entry.
2021-12-20 16:23:05 +01:00
Michael Peter Christen
163ba26d90
replaced check for load time method
instead of loading the solr document, a dedicated index holding only the
last loading time was created. This prevents solr from having to fetch
from its index while the index is being created. Excessive re-loading of
documents while indexing has been shown to produce deadlocks, so this
should now be prevented.
2021-12-20 03:47:56 +01:00
Michael Peter Christen
be0aebad84
fixes https://github.com/yacy/yacy_search_server/issues/424
2021-10-04 14:38:49 +02:00
Michael Peter Christen
63ad8ce6b2
removed ymarks
had not been used for a long time
2021-09-16 22:23:51 +02:00
Michael Peter Christen
ef5a71a592
enhanced crawl start response time
for very very large crawl start lists
2021-09-16 21:01:01 +02:00
Michael Peter Christen
e9c5e78868
replaced new Number(Number) with Number.valueOf
to remove deprecation warnings for Java 9
2021-08-08 00:39:03 +02:00
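For illustration, the wrapper-class constructors such as `new Integer(int)` are deprecated since Java 9, and the static `valueOf` factories are the usual replacement. The sketch below is not taken from the YaCy sources; the class name `BoxingSketch` and its method are hypothetical.

```java
// Hypothetical example, not YaCy code: boxing without the deprecated constructors.
class BoxingSketch {

    /** Boxes a primitive via the factory method instead of the deprecated constructor. */
    static Integer boxed(int value) {
        // Before: return new Integer(value);   // deprecated since Java 9
        return Integer.valueOf(value);           // may return a cached instance
    }

    public static void main(String[] args) {
        Integer a = boxed(100);
        Integer b = boxed(100);
        // Values in the range -128..127 are guaranteed to come from the Integer cache.
        System.out.println(a.equals(b) && a == b);
    }
}
```

The same pattern applies to `Long.valueOf`, `Double.valueOf` and the other wrapper types.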
Michael Peter Christen
e81b770f79
enabled crawl starts with very large sets of start urls
i.e. 10MB large url list with approx 0.5 million start points
2021-06-30 10:45:58 +02:00
Michael Peter Christen
1cdb21592b
added hazelcast and some modifications to align legacy YaCy with YaCyGrid
2021-04-15 20:39:22 +02:00
Michael Peter Christen
8f876a8c72
added concurrency to enhance indexing speed during json surrogate import
2021-03-30 12:07:36 +02:00
Michael Peter Christen
f8cbaeef93
Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
2021-03-29 18:46:53 +02:00
Michael Peter Christen
a857e3d3d5
fix for json importer
2021-03-29 18:46:42 +02:00
sgaebel
c69c462a15
replaces an expensive getLoadTimeURL() by exists()
refactors urlExists to getHarvestProcess as that is what it does
2021-03-20 15:01:31 +01:00
sgaebel
26223dc25a
replaces getLoadTime() by exists() with a simpler query
since solr-8.8.1 getLoadTime() causes a high cpu usage
2021-03-20 01:06:02 +01:00
Michael Peter Christen
8b4394a6c5
fixes for solr 8.8.1 migration
- replace new guava 30 with older 25 because that is the correct
dependency for solr 8.8.1. The newer one did not actually work!
- index will be created in a DATA/INDEX/freeworld/SEGMENTS/solr_8_8_1
subfolder. The older solr_6_6 index is not touched but also not
migrated. The index starts with fresh (empty) content.
- Older indexes must be migrated by hand (export/import) so far until a
better solution is found.
- Large schema adaptations for lucene 8.8.1
2021-03-08 13:39:27 +01:00
Al Sutton
69014a701e
Update API Usage
2021-03-04 16:14:56 +00:00
Michael Peter Christen
13a2e6dc6e
Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
2021-01-25 11:49:32 +01:00
Michael Peter Christen
0ae8ccf657
Make it possible to set an empty password disabling the authentication
protocol completely
If you now set an empty password, the http server will not ask you to
authenticate. This is required for environments where we attach an
outside authentication service like keycloak or similar using
authentication in an ingress proxy.
This change is part of the approach to run YaCy inside of a kubernetes
cluster where we do not want individual authentication of peers and want
to apply an ingress authentication.
2021-01-25 11:49:21 +01:00
Michael Peter Christen
96592a10cf
added option to set yacy configuration values using environment
variables
To use that feature, set an environment variable with prefix "yacy." and
suffix identical to the yacy configuration attribute name.
Additionally we implemented a way to set a peer name using the setting
"network.unit.agent". This can therefore now be used to set a peer name
with the java call parameter
-Dyacy.network.unit.agent=anonymous
The purpose of this feature is the ability to set peer names in
mass-deployed kubernetes clusters to the same name, so that we do not
flood peer name statistics with auto-deployment-generated names.
2021-01-24 22:50:37 +01:00
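The prefix rule described in this commit can be sketched as below. This is a minimal illustration under stated assumptions, not the YaCy implementation: the class `ConfigOverrideSketch` and the method `applyOverrides` are hypothetical names, and the sketch treats both environment variables and `-D` system properties as plain key/value maps.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not YaCy code: overlays "yacy."-prefixed entries onto a config map.
class ConfigOverrideSketch {

    static final String PREFIX = "yacy.";

    /** Returns a copy of config in which every "yacy.<name>" entry of source overrides <name>. */
    static Map<String, String> applyOverrides(Map<String, String> config, Map<String, String> source) {
        Map<String, String> result = new HashMap<>(config);
        for (Map.Entry<String, String> entry : source.entrySet()) {
            String key = entry.getKey();
            if (key.startsWith(PREFIX) && key.length() > PREFIX.length()) {
                // The suffix after the prefix is taken as the configuration attribute name.
                result.put(key.substring(PREFIX.length()), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("network.unit.agent", "default-name"); // placeholder default, for illustration only

        // Environment variables first, then system properties set with -Dyacy.<name>=<value>.
        Map<String, String> merged = applyOverrides(config, System.getenv());
        Map<String, String> properties = new HashMap<>();
        for (String name : System.getProperties().stringPropertyNames()) {
            properties.put(name, System.getProperty(name));
        }
        merged = applyOverrides(merged, properties);
        System.out.println("network.unit.agent=" + merged.get("network.unit.agent"));
    }
}
```

Run with `java -Dyacy.network.unit.agent=anonymous ConfigOverrideSketch` to see the override take effect. The order of precedence between environment variables and system properties is a choice made for this sketch only.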
Michael Peter Christen
198826c362
added network scanner process to discover all YaCy peers in the intranet
this will be used to wire YaCy peers in a kubernetes cluster
2021-01-23 15:14:49 +01:00
Michael Peter Christen
907f121d0c
do not overwrite PW with random PW
2020-12-29 20:18:25 +01:00
Michael Peter Christen
3e6a1e0a49
fixed surrogate process counter
2020-12-28 18:26:22 +01:00
Michael Peter Christen
baad56d83d
beautified default peer names
2020-12-14 02:08:49 +01:00
Michael Peter Christen
43a9f4f574
updated solr 6.6.6 -> 7.7.3
dropped GSA support (GSA API is still in YaCy Grid)
The 6.6.6 solr index works without migration also with 7.7.3
2020-12-12 02:06:43 +01:00
Michael Peter Christen
c0d9a3e9a7
turned HostBrowser into a admin-only page, now called IndexBrowser
This was required because spiders and bots crawled through this page and
created load on the peer without use for the user or the YaCy network.
2020-12-11 00:50:52 +01:00
Michael Peter Christen
6271e9122c
javadoc fix
2020-12-09 02:22:47 +01:00
Michael Peter Christen
52228cb6be
added a gc to cleanup process (once every 10 minutes)
2020-12-02 00:13:00 +01:00
Michael Peter Christen
22841ffbf1
creating a threaddump during every cleanup process
to be able to find out what a peer did (not) last time before a crash
2020-12-01 03:00:24 +01:00
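A thread dump of the kind mentioned here can be produced with the standard library alone. The sketch below is an assumption about one possible approach, not the code from this commit; `ThreadDumpSketch` and `dump` are hypothetical names.

```java
import java.util.Map;

// Hypothetical sketch, not YaCy code: renders the stack traces of all live threads as text.
class ThreadDumpSketch {

    /** Builds a plain-text dump with one header line per thread followed by its stack frames. */
    static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            sb.append('"').append(thread.getName()).append("\" state=")
              .append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A cleanup job could write this string to a log file on every run.
        System.out.print(dump());
    }
}
```

Writing the result to a file on each cleanup run would leave a record of what the peer was doing shortly before a crash, which is the purpose the commit message describes.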
sgaebel
3431f91db9
removes unused 'unused' tokens
2020-08-04 20:09:34 +02:00
sgaebel
fc03c4b4fe
removes some warnings and unused objects
2020-08-03 20:44:31 +02:00
sgaebel
4a495df63a
removes some deprecation-warnings
2020-07-31 17:28:06 +02:00
sgaebel
dd9d4b1188
replace org.junit.Assert.assertThat by
org.hamcrest.MatcherAssert.assertThat from hamcrest 2.2 to avoid
deprecation-warning
2020-07-28 19:09:26 +02:00
Michael Peter Christen
e0ad8ca9da
replaced json library from JSON.org with libandroid-json-java
This fixes https://github.com/yacy/yacy_search_server/issues/347
2020-04-24 11:45:25 +02:00
Michael Christen
cfa27d2fd5
fixed links
2019-10-20 20:20:50 +02:00
luccioman
6b45cd5799
New optional crawl filter on the URL a doc must match to crawl its links
For finer control over which parsed documents can trigger an addition of
their links to the crawl stack, complementary to the existing crawl
depth parameter.
2019-05-01 08:54:19 +02:00
luccioman
a5771b1f14
Made SNI extension user configurable without the need for server restart
TLS Server Name Indication (SNI) extension activation can now be
configured with the new Settings_p.html?page=httpClient administration
page.
SNI extension is also now enabled by default, as in 2019 the
unrecognized_name(112) alert is more properly handled by major web
servers TLS implementations, following the RFC 6066 standard.
Related YaCy issues : #153 #189 and #272
JDK 1.7 bug :
https://bugs.java.com/bugdatabase/view_bug.do?bug_id=7127374
Apache httpd issue :
https://bz.apache.org/bugzilla/show_bug.cgi?id=56241
RFC 6066 : https://tools.ietf.org/html/rfc6066#section-3
2019-04-14 15:41:13 +02:00
luccioman
e90405b6f0
Support parsing audio URLs without file extension
Also added a JUnit test for the audio tag parser
2019-04-09 11:40:21 +02:00