Commit Graph

1497 Commits

Author SHA1 Message Date
Michael Peter Christen
910a496c9f replaced http links with https
2024-07-21 18:02:58 +02:00
Michael Peter Christen
687820788d this assert does not work because of the 9_0_0 solr version format.
A 9_0 is expected, but the assert does not work this way with this version.
2024-07-21 13:33:47 +02:00
Michael Peter Christen
f1c70dce33 Merge branch 'master' of github.com:yacy/yacy_search_server 2024-05-19 17:35:24 +02:00
Michael Peter Christen
8eb0d490aa migrated solr to 9.0
This is a major step because solr removed support for embedded solr
instances in 9.0, and we want to keep it because we want to ship
YaCy with an embedded solr. It was necessary to add parts of solr
code into YaCy to make this migration possible. Furthermore, with
Solr 9.1 they removed even more parts which are required for embedded
operation, therefore we cannot migrate further without big
changes.
If you are running a YaCy instance with Solr 8.x, the migration should
be done automatically. If not, you must first migrate to YaCy
version 1.93 with Solr 8.x to migrate your data to Solr 8.
2024-05-19 17:34:57 +02:00
Michael Peter Christen
b295e38969 fine-tuned the import process of jsonl files: it had been missing
what is needed to actually search the imported documents and browse the
index with the host browser
2024-05-10 12:13:44 +02:00
Michael Christen
d097a642c2
Merge pull request #615 from okybaca/logging2
Logging unclutter
2023-12-03 16:40:21 +01:00
Michael Christen
6d5e9ff53f
Merge pull request #616 from okybaca/logging3
changed the log entry REJECTED to CRAWLER * REJECTED, loglevel fine
2023-12-03 16:39:29 +01:00
pr0vieh
dfb2b79609 Add setting for DHT receive loadprereq instead of hardcoded load < 2.0 2023-12-03 01:27:36 +01:00
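A minimal sketch of what replacing a hardcoded load threshold with a setting could look like. The property key `dht.receive.loadprereq` and the class `DhtReceiveGate` are hypothetical names chosen for illustration, not YaCy's actual configuration key or code; only the old default of 2.0 comes from the commit message. The system load is read through the standard `OperatingSystemMXBean.getSystemLoadAverage()`, which returns a negative value where no load average is available.

```java
import java.lang.management.ManagementFactory;
import java.util.Properties;

/** Hypothetical gate deciding whether incoming DHT transfers are accepted. */
public class DhtReceiveGate {
    // Hypothetical property key; the real YaCy setting name may differ.
    static final String KEY = "dht.receive.loadprereq";
    // The previously hardcoded threshold, kept as the default.
    static final double DEFAULT_PREREQ = 2.0;

    /** Reads the configured threshold, falling back to the old hardcoded value. */
    static double loadPrereq(Properties config) {
        String value = config.getProperty(KEY);
        if (value == null) return DEFAULT_PREREQ;
        try {
            return Double.parseDouble(value.trim());
        } catch (NumberFormatException e) {
            return DEFAULT_PREREQ; // malformed setting: keep the old behavior
        }
    }

    /** Accepts if the load is below the threshold; a negative (unavailable) load is accepted. */
    static boolean acceptReceive(double systemLoad, double prereq) {
        return systemLoad < 0 || systemLoad < prereq;
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty(KEY, "4.0");
        double load = ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
        System.out.println("accept DHT receive: " + acceptReceive(load, loadPrereq(config)));
    }
}
```

Accepting transfers when the platform reports no load average is a design choice of this sketch; a stricter variant could reject instead.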
okybaca
5dee8dbcbd changed the log entry REJECTED to CRAWLER * REJECTED, loglevel fine 2023-12-02 12:24:36 +01:00
Michael Christen
4c603e23f0
Merge pull request #610 from okybaca/cr-text
UI: added a more descriptive message, CitationRank instead of cr
2023-11-27 12:17:05 +01:00
okybaca
553c859703 logging: moved some log-cluttering DHT messages to level 'fine' 2023-11-27 07:51:42 +01:00
sgaebel
d72cd7916c Merge branch 'master' of https://github.com/yacy/yacy_search_server 2023-11-14 20:43:42 +01:00
sgaebel
0663ae3c99 adds synchronized dumplog 2023-11-14 20:42:00 +01:00
okybaca
cba84632ee UI: added a more descriptive message, CitationRank instead of cr 2023-11-14 00:05:23 +01:00
Michael Peter Christen
3268a93019 added a 'minified' option to YaCy dumps 2023-11-13 10:27:50 +01:00
Michael Peter Christen
c20c4b8a21 modified export: added maximum number of docs per chunk
The export file can now be many files, called chunks.
By default still only one chunk is exported.
This function is required if the exported files are to be
imported into an elasticsearch/opensearch index. The bulk import function
of elasticsearch/opensearch is limited to 100MB. To make it possible to
import YaCy files, they must be split into chunks. Right now we
cannot estimate the chunk size in bytes, only as a number of documents.
Users must experiment to find the optimum maximum chunk size;
50000 docs per chunk is a good first attempt.
2023-11-12 22:11:55 +01:00
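The chunking described above splits the export by document count rather than by bytes. A minimal sketch of that idea, assuming documents are already serialized as a list; the class `ChunkedExport`, its methods and the `_0000.jsonl` file-name scheme are hypothetical and not taken from YaCy's exporter.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper splitting an export into chunks of at most maxDocs documents. */
public class ChunkedExport {

    /**
     * Splits docs into consecutive chunks of at most maxDocs entries each.
     * Passing Integer.MAX_VALUE yields a single chunk, like the default export.
     */
    static <T> List<List<T>> chunk(List<T> docs, int maxDocs) {
        if (maxDocs <= 0) throw new IllegalArgumentException("maxDocs must be positive");
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < docs.size(); start += maxDocs) {
            int end = (int) Math.min((long) start + maxDocs, docs.size());
            chunks.add(new ArrayList<>(docs.subList(start, end)));
        }
        return chunks;
    }

    /** Builds one file name per chunk; this naming scheme is an assumption. */
    static String chunkFileName(String baseName, int index) {
        return String.format("%s_%04d.jsonl", baseName, index);
    }

    public static void main(String[] args) {
        List<String> docs = List.of("d1", "d2", "d3", "d4", "d5");
        List<List<String>> chunks = chunk(docs, 2);
        for (int i = 0; i < chunks.size(); i++) {
            System.out.println(chunkFileName("export", i) + " -> " + chunks.get(i));
        }
    }
}
```

With a limit of 50000 docs per chunk, each resulting file can be checked against the 100MB bulk-import limit and the limit lowered if a chunk is still too large.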
Michael Peter Christen
24011dcbcc more file name extensions for json list surrogate files 2023-11-06 22:44:18 +01:00
Michael Peter Christen
7db0534d8a Added a zim parser to the surrogate import option.
You can now import zim files into YaCy by simply moving them
to the DATA/SURROGATE/IN folder. They will be fetched and after
parsing moved to DATA/SURROGATE/OUT.
There are exceptions where the parser is not able to identify the
original URL of the documents in the zim file. In that case the file
is simply ignored.
This commit also carries an important fix to the pdf parser and an
increase of the maximum parsing speed to 60000 PPM, which should make it
possible to index up to 1000 files per second.
2023-11-05 02:16:40 +01:00
Michael Peter Christen
4308aa5415 removed concept of empty passwords as "no passwords used",
because we now start YaCy with a default password (yacy).
This affects all functions that check the current state of
password protection and that included the empty-password case,
including the warnings to set a password in case none is set (which
can no longer happen).
2023-10-25 22:56:06 +02:00
Michael Peter Christen
4da320bebf added a warning message in ConfigBasic in case that the default password
was not changed.
2023-10-24 23:36:26 +02:00
Michael Peter Christen
ff8fe7b6a4 fix for ',' or '.' appearing within a word or number: the query
is no longer tokenized into parts around that character, which makes it
possible to search for numbers and version numbers.
2023-09-03 11:37:25 +02:00
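The tokenization rule described above can be sketched as follows: a ',' or '.' is kept when it sits between two letters or digits (as in `1.93` or `8.11.2`) and acts as a separator otherwise. This is a standalone illustration under that assumption; the class `QueryTokenizer` and its methods are hypothetical and do not reproduce YaCy's actual query parser.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical query tokenizer that keeps ',' and '.' inside words and numbers. */
public class QueryTokenizer {

    static List<String> tokenize(String query) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else if ((c == '.' || c == ',') && isInner(query, i)) {
                // Separator between two letters/digits, e.g. "1.93": keep it in the token.
                current.append(c);
            } else {
                // Any other character, or a leading/trailing '.'/',', ends the token.
                flush(current, tokens);
            }
        }
        flush(current, tokens);
        return tokens;
    }

    /** True if position i has a letter or digit on both sides. */
    private static boolean isInner(String s, int i) {
        return i > 0 && i < s.length() - 1
                && Character.isLetterOrDigit(s.charAt(i - 1))
                && Character.isLetterOrDigit(s.charAt(i + 1));
    }

    private static void flush(StringBuilder current, List<String> tokens) {
        if (current.length() > 0) {
            tokens.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenize("yacy 1.93 release, solr 8.11.2."));
    }
}
```

Here `release,` and the final `8.11.2.` lose their trailing punctuation, while the version numbers stay intact as single search terms.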
Michael Peter Christen
88cd17ea57 migrated solr from 8.9.0 to 8.11.2; also activated the migration script. A YaCy index with solr 8.9.0 will automatically be migrated to 8.11.2. This is a preparation step to migrate to 9.0.0 soon. 2023-09-01 18:24:52 +02:00
Michael Peter Christen
1c0f50985c fixed documentation and some details of handling of keywords 2023-04-04 12:41:12 +02:00
Michael Peter Christen
9fcd8f1bda added canonical filter
attention: this is on by default!
(it should do the right thing)
2023-01-16 14:50:30 +01:00
Michael Christen
4304e07e6f crawl profile adaptation to new tag valency attribute 2023-01-15 01:20:12 +01:00
Michael Peter Christen
309adb814e fixed jsonlist import from searchlab.eu using a direct URL 2022-10-25 00:51:53 +02:00
Michael Peter Christen
62d177bf59 stub for jsonlist index importer web page 2022-10-23 12:22:31 +02:00
Michael Peter Christen
efa0425f00 refactoring: moved jsonlist importer to importer class 2022-10-23 11:35:32 +02:00
Michael Peter Christen
49daa32a88 yacy can now read searchlab export dump files
using the surrogate input process:
- copy the searchlab export file to DATA/SURROGATE/in
- the file is processed automatically and then moved to
DATA/SURROGATE/OUT
2022-10-23 11:01:58 +02:00
Michael Christen
99174282d8 try to shut down in a bit more ordered way
inspired by https://github.com/yacy/yacy_search_server/issues/518
2022-10-05 22:13:06 +02:00
Michael Peter Christen
482f507e65 upgraded solr from 8.8.1 to 8.9.0
should hopefully fix
https://github.com/yacy/yacy_search_server/issues/496
because it includes https://issues.apache.org/jira/browse/SOLR-13034
2022-10-05 17:24:07 +02:00
Michael Peter Christen
60c9986a0e new release file names with date and git hash
...without reference to 9000ish SVN
2022-10-04 15:31:47 +02:00
Michael Peter Christen
9c1bc533fa removed hazelcast because it is phoning home, see also:
https://github.com/yacy/yacy_search_server/issues/504
2022-09-28 17:30:37 +02:00
Michael Peter Christen
fc98ca7a9c removed ContentControl servlet and functionality
This was not used at all (as far as I know) and was blocking a smooth
integration of ivy in the context of an existing JSON parser.
2022-09-28 17:25:04 +02:00
Michael Peter Christen
3d138d3fdd catch error when initializing hazelcast
should fix https://github.com/yacy/yacy_search_server/issues/468
2022-06-20 17:27:56 +02:00
Burkhard
a6a9828181
Merge pull request #440 from lfuelling/master
Add setting for public facing port
2022-02-11 08:09:17 +01:00
Daleth Darko
3ced06c731 Various javadoc fixes 2022-01-26 11:22:43 +01:00
reger24
6a1e259fd0 Fix NPE in Switchboard.getURL https://github.com/yacy/yacy_search_server/issues/441 2022-01-26 06:07:38 +01:00
Lukas Fülling
e8a00007f6 add setting for public facing port 2022-01-11 17:10:48 +01:00
Michael Peter Christen
bd3f2483a1 replaced url and date retrieval by only url retrieval
This should prevent the search index from being used to determine the
freshness of the index entry.
2021-12-20 16:23:05 +01:00
Michael Peter Christen
163ba26d90 replaced check for load time method
instead of loading the solr document, an index only for the last loading
time was created. This prevents solr from having to fetch from its index
while the index is being written. Excessive re-loading of documents while
indexing has been shown to produce deadlocks, so this should now be
prevented.
2021-12-20 03:47:56 +01:00
Michael Peter Christen
be0aebad84 fixes https://github.com/yacy/yacy_search_server/issues/424 2021-10-04 14:38:49 +02:00
Michael Peter Christen
63ad8ce6b2 removed ymarks
had not been used for a long time
2021-09-16 22:23:51 +02:00
Michael Peter Christen
ef5a71a592 enhanced crawl start response time
for very very large crawl start lists
2021-09-16 21:01:01 +02:00
Michael Peter Christen
e9c5e78868 replaced new Number(Number) with Number.valueOf
to remove deprecation warnings introduced in Java 9
2021-08-08 00:39:03 +02:00
Michael Peter Christen
e81b770f79 enabled crawl starts with very large sets of start urls
e.g. a 10MB url list with approx. 0.5 million start points
2021-06-30 10:45:58 +02:00
Michael Peter Christen
1cdb21592b added hazelcast and some modifications to align legacy YaCy with
YaCyGrid
2021-04-15 20:39:22 +02:00
Michael Peter Christen
8f876a8c72 added concurrency to enhance indexing speed during json surrogate import 2021-03-30 12:07:36 +02:00
Michael Peter Christen
f8cbaeef93 Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2021-03-29 18:46:53 +02:00
Michael Peter Christen
a857e3d3d5 fix for json importer 2021-03-29 18:46:42 +02:00