Michael Peter Christen
655d8db802
detailed directions in index export to explain how the export can be
imported again using elasticsearch/opensearch
2023-11-12 15:26:18 +01:00
Michael Peter Christen
24011dcbcc
more file name extensions for json list surrogate files
2023-11-06 22:44:18 +01:00
Michael Peter Christen
34a9fc1a07
bugfixes to zim reader:
2023-11-05 12:46:37 +01:00
Michael Peter Christen
7db0534d8a
Added a zim parser to the surrogate import option.
You can now import zim files into YaCy by simply moving them
to the DATA/SURROGATE/IN folder. They will be fetched and after
parsing moved to DATA/SURROGATE/OUT.
There are exceptions where the parser is not able to identify the
original URL of the documents in the zim file. In that case the file
is simply ignored.
This commit also carries an important fix to the pdf parser and an
increase of the maximum parsing speed to 60000 PPM (pages per minute),
which should make it possible to index up to 1000 files in one second.
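The import workflow described in this commit amounts to placing a file in a watched folder. A minimal Java sketch of that step follows. The folder names DATA/SURROGATE/IN come from the commit message; the class name, the method name and the `yacyHome` parameter are hypothetical and not part of YaCy.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper; not taken from the YaCy code base.
class SurrogateDrop {

    // Moves a zim file into <yacyHome>/DATA/SURROGATE/IN and returns its new location.
    static Path dropIntoSurrogateIn(Path yacyHome, Path zimFile) throws IOException {
        Path inDir = yacyHome.resolve("DATA").resolve("SURROGATE").resolve("IN");
        Files.createDirectories(inDir); // no-op if the folder already exists
        Path target = inDir.resolve(zimFile.getFileName());
        return Files.move(zimFile, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: SurrogateDrop <yacy-home> <file.zim>");
            return;
        }
        Path moved = dropIntoSurrogateIn(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("moved to " + moved);
    }
}
```

According to the commit message, YaCy then fetches the file and moves it to DATA/SURROGATE/OUT after parsing; the sketch covers only the hand-over into the IN folder.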
2023-11-05 02:16:40 +01:00
Michael Peter Christen
70e29937ef
added a check in zim importer which tests if import URLs actually exist
2023-11-04 19:07:50 +01:00
Michael Peter Christen
496f768c44
modified cache strategy for zim clusters
2023-11-03 18:20:10 +01:00
Michael Peter Christen
fdc6311dc7
added parsing rules for wikibooks and wikinews in zim reader
2023-11-02 00:27:24 +01:00
Michael Peter Christen
2ea54b3503
fixed blob iterator in zim cluster definition
2023-11-01 23:43:27 +01:00
Michael Peter Christen
54fa5d3c2e
added a cluster cache but it requires more testing
2023-11-01 19:52:44 +01:00
Michael Peter Christen
53b01dbf2e
Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
2023-11-01 18:57:04 +01:00
Michael Peter Christen
41856e9f34
added an optimized zim file entry iterator
2023-11-01 18:50:28 +01:00
Michael Peter Christen
1c0df28bfb
added a zim importer that can be used for surrogate imports.
Cannot be used yet because it requires some security additions
to verify that the given URLs actually work.
2023-11-01 18:48:40 +01:00
Michael Peter Christen
b9912ff50d
repaired dockerfiles for aarch64 and armv7
2023-10-29 22:09:24 +00:00
Michael Peter Christen
33b6878ded
Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
2023-10-29 14:58:47 +01:00
Michael Christen
68554cea07
Merge pull request #605 from okybaca/readme-docker-link
added a link to docker build guide
2023-10-29 14:56:26 +01:00
Michael Christen
06bfd5802f
Merge pull request #603 from okybaca/dark-green-css
fine tuned the dark-green color scheme
2023-10-29 14:55:58 +01:00
Michael Christen
43d5cd101e
Merge pull request #607 from okybaca/wikilinks
replaced all the links to legacy legacy wiki to legacy wiki
2023-10-29 14:55:26 +01:00
okybaca
4add1f6bc7
replaced all the links to legacy legacy wiki to legacy wiki
2023-10-29 13:12:24 +01:00
Michael Peter Christen
e2c86a8eba
added a ZIM cluster pointer cache
2023-10-29 12:49:08 +01:00
Michael Peter Christen
4a54b24703
fix for "negative seek offset" error during extension of heap files.
This would have always happened when a heap file exceeded 2GB.
Should fix https://github.com/yacy/yacy_search_server/issues/372
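A common way to end up with a negative seek offset beyond 2GB is computing the offset in 32-bit `int` arithmetic. The sketch below illustrates that mechanism under the assumption that something similar was at play; it is not the actual YaCy heap code, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; not taken from the YaCy code base.
class SeekOffsetDemo {

    // Offset computed in 32-bit arithmetic: wraps around once the
    // product exceeds Integer.MAX_VALUE (about 2 GB).
    static int offsetAsInt(int recordCount, int recordSize) {
        return recordCount * recordSize;
    }

    // Widening one operand to long before multiplying keeps the full result.
    static long offsetAsLong(int recordCount, int recordSize) {
        return (long) recordCount * recordSize;
    }

    public static void main(String[] args) throws IOException {
        int recordCount = 3_000_000;
        int recordSize = 1024; // together about 3 GB, beyond the int range

        Path tmp = Files.createTempFile("heap-demo", ".tmp");
        try (RandomAccessFile raf = new RandomAccessFile(tmp.toFile(), "rw")) {
            try {
                // The wrapped int is negative, so seek() rejects it.
                raf.seek(offsetAsInt(recordCount, recordSize));
            } catch (IOException e) {
                System.out.println("int offset rejected: " + e.getMessage());
            }
            // Seeking past the end of the file is allowed and does not grow it.
            raf.seek(offsetAsLong(recordCount, recordSize));
            System.out.println("long offset accepted: " + raf.getFilePointer());
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```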
2023-10-29 09:32:21 +01:00
okybaca
69db75ce45
added a link to docker build guide
2023-10-29 02:35:57 +01:00
Michael Peter Christen
9c8fb97985
introduced url list and title list caching and enhanced input stream
performance in ZIM reader
2023-10-29 00:43:12 +02:00
Michael Peter Christen
b0ae660790
added Zstandard compressed data decompression for ZIM files type 5
also: more generalization and performance enhancements
2023-10-28 12:24:29 +02:00
Michael Peter Christen
ad8ee3a0b6
fixed typo in class name
2023-10-28 08:57:42 +02:00
Michael Peter Christen
c4082c4ff2
refactoring of ZIM reader, simplification, removed unnecessary code
2023-10-28 08:56:58 +02:00
Michael Peter Christen
c2b6b6e7b9
Fixed a large number of problems in the ZIM reader.
This library was not prepared for large data because it was missing long
data types for pointers. I had to modify the code-base in a fundamental
way:
- Proof-Reading,
- unclustering,
- refactoring,
- naming aligned with https://wiki.openzim.org/wiki/ZIM_file_format ,
- change of Exception handling,
- extension to more attributes as defined in spec (bugfix for mime type
loading)
- bugfix to long parsing (prevented reading of large files)
The code is furthermore very inefficient and requires more attention.
However the format is very useful for YaCy as there are numerous data
sources for ZIM-Files.
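To illustrate why long data types matter for pointers in large files, here is a minimal sketch that reads an 8-byte pointer as a `long` and shows what is lost when it is narrowed to `int`. It assumes little-endian integer encoding, as the ZIM format specification describes; the class and method names are hypothetical and do not come from the reader code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class; not taken from the ZIM reader.
class ZimPointerDemo {

    // Reads an 8-byte little-endian pointer starting at the given index.
    static long readPointer(byte[] raw, int index) {
        return ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN).getLong(index);
    }

    // Encodes a pointer the same way, only to build test input for the demo.
    static byte[] encodePointer(long value) {
        return ByteBuffer.allocate(Long.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(value)
                .array();
    }

    public static void main(String[] args) {
        long position = 5_000_000_000L; // a file position beyond the int range
        byte[] raw = encodePointer(position);

        long asLong = readPointer(raw, 0);
        int asInt = (int) asLong; // narrowing drops the upper 32 bits

        System.out.println("read as long:     " + asLong);
        System.out.println("narrowed to int:  " + asInt);
        System.out.println("values identical: " + (asLong == asInt));
    }
}
```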
2023-10-27 15:49:23 +02:00
Michael Peter Christen
5ba5fb5d23
upgraded pdfbox to 3.0.0
2023-10-27 12:05:24 +02:00
Michael Peter Christen
c10944bd4a
updated bcmail-jdk15on 1.75 to bcmail-jdk18on 1.67
2023-10-27 11:08:19 +02:00
Michael Peter Christen
1fefae9baf
integrated the source code of an openzim file format reader. These are
the raw format reader files with no integration in YaCy yet, which will
maybe follow as a next step. The zim file format is documented in
https://openzim.org and the reader code was taken from the archived,
non-maintained repository at https://github.com/openzim/zimreader-java
2023-10-27 10:59:06 +02:00
okybaca
ec2d14e973
fine tuning the dark-green color scheme
2023-10-26 12:35:22 +02:00
Michael Peter Christen
4308aa5415
removed concept of empty passwords as "no passwords used",
because we now start YaCy with a default password (yacy).
This affects all functions that check the current state of
password protection and handled the empty-password situation,
including the warnings to set a password when none is set (which
can no longer be the case).
2023-10-25 22:56:06 +02:00
Michael Peter Christen
2c60ff14bb
fixed default pw comparison
2023-10-25 13:59:02 +02:00
Michael Peter Christen
4da320bebf
added a warning message in ConfigBasic in case that the default password
was not changed.
2023-10-24 23:36:26 +02:00
Michael Peter Christen
7830268be1
fix 756c817b5a
must be applied to all code where a transaction token is generated.
2023-10-21 13:00:49 +02:00
Michael Peter Christen
dc6f218520
set the default password for the admin account to "yacy"
2023-10-21 12:09:19 +02:00
Michael Peter Christen
756c817b5a
fix for https://github.com/yacy/yacy_search_server/issues/544
2023-10-21 11:45:26 +02:00
Michael Christen
bab1cfc7ea
added required build tools installation
2023-10-20 16:09:47 +02:00
Michael Peter Christen
03bf259601
fix for https://github.com/yacy/yacy_search_server/issues/363
We still need to set the load in the process because a demand for higher
crawl speed may require increasing the maximum load limit. However,
following the criticism in the bug report, we never reduce the load limit
again.
2023-10-16 18:26:47 +02:00
Michael Christen
5bc09af426
Merge pull request #600 from okybaca/scheduler-sort
UI: modified link to Process Scheduler in left menu
2023-10-16 13:00:24 +02:00
okybaca
4c1eb34e85
modified link to Process Scheduler in left menu
2023-10-10 08:30:04 +02:00
Michael Peter Christen
aeb4c7a660
removed warnings during normal build
2023-10-04 22:00:30 +02:00
Michael Peter Christen
095a444aa7
removed wiki links and added more shields badges
2023-09-30 18:16:38 +02:00
Michael Peter Christen
ca2a21008a
added screenshots
2023-09-30 13:07:18 +02:00
Michael Christen
961d3cc8af
Merge pull request #597 from joestr/issue/574-fix-mac-script
Fix macOS script
2023-09-28 21:10:49 +02:00
Michael Christen
a035b21f63
Merge pull request #598 from joestr/improvement/remove-travis-yml
Remove .travis.yml
2023-09-28 21:10:04 +02:00
Joel Strasser
b29c0ef133
remove .travis.yml since YaCy is not built on Travis CI anymore
2023-09-27 21:29:22 +02:00
Joel Strasser
09783ae89e
apply patches from @HenryLoenwind
2023-09-27 19:56:08 +02:00
Michael Peter Christen
94db89a757
small remaining changes in readme
2023-09-26 16:15:58 +02:00
Michael Peter Christen
0c4478cd71
migrated jetty to 9.4.52.v20230823
2023-09-26 16:15:42 +02:00
Michael Peter Christen
938724caa8
new development on-boarding process in eclipse with changes for ivy
2023-09-26 16:07:59 +02:00