Commit Graph

14486 Commits

Author SHA1 Message Date
Frank Tornack
6864486196
add YaCy vagrant
untested
2024-02-04 19:29:16 +01:00
Michael Christen
331e0a24fc
Merge pull request #621 from OFA54/patch-1
turkish translation tr.lng
2023-12-20 23:34:35 +01:00
Michael Christen
d825a85a01
Merge pull request #619 from pr0vieh/initrecrawl
bring defaults for recrawlindex to init config
2023-12-09 14:47:13 +01:00
pr0vieh
35620762ac bring defaults for recrawlindex to init config 2023-12-09 01:32:31 +01:00
Michael Christen
d097a642c2
Merge pull request #615 from okybaca/logging2
Logging unclutter
2023-12-03 16:40:21 +01:00
Michael Christen
6d5e9ff53f
Merge pull request #616 from okybaca/logging3
changed the log entry REJECTED to CRAWLER * REJECTED, loglevel fine
2023-12-03 16:39:29 +01:00
Michael Christen
d5d4e8fe3a
Merge pull request #617 from pr0vieh/master
Add setting for DHT receive loadprereq instead of hardcoded load < 2.0
2023-12-03 16:38:46 +01:00
pr0vieh
dfb2b79609 Add setting for DHT receive loadprereq instead of hardcoded load < 2.0 2023-12-03 01:27:36 +01:00
okybaca
5dee8dbcbd changed the log entry REJECTED to CRAWLER * REJECTED, loglevel fine 2023-12-02 12:24:36 +01:00
Michael Christen
4c603e23f0
Merge pull request #610 from okybaca/cr-text
UI: added a more descriptive message, CitationRank instead of cr
2023-11-27 12:17:05 +01:00
Michael Christen
040cd8be6d
Merge pull request #612 from okybaca/sitemap-fix
updated apache libs
2023-11-27 12:16:43 +01:00
Michael Christen
0233ecd481
Merge pull request #614 from okybaca/logging
added some logging prefixes to yacy.logging
2023-11-27 12:15:34 +01:00
okybaca
7831f294a9 changed regular peerping messages to level fine 2023-11-27 08:12:03 +01:00
okybaca
553c859703 logging: moved some log-cluttering DHT messages to level 'fine' 2023-11-27 07:51:42 +01:00
okybaca
1c5fca9a58 changed network operation log category from YACY to NETWORK 2023-11-26 12:24:09 +01:00
okybaca
2f44fc0257 added some logging prefixes to yacy.logging 2023-11-25 18:39:08 +01:00
OFA
89c2a92cfb
tr.lng 2023-11-18 01:03:28 +03:00
Michael Peter Christen
3d3bdb0f5f added zim importer rule for mdwiki 2023-11-16 23:11:57 +01:00
Michael Peter Christen
4a611ac6a3 another possible fix for
https://github.com/yacy/yacy_search_server/issues/500
2023-11-15 23:45:53 +01:00
okybaca
9c59c6814b updated apache libs 2023-11-15 10:22:00 +01:00
sgaebel
d72cd7916c Merge branch 'master' of https://github.com/yacy/yacy_search_server 2023-11-14 20:43:42 +01:00
sgaebel
0663ae3c99 adds synchronized dumplog 2023-11-14 20:42:00 +01:00
okybaca
cba84632ee UI: added a more descriptive message, CitationRank instead of cr 2023-11-14 00:05:23 +01:00
Michael Peter Christen
cff0991d85 test if this is helpful for https://github.com/yacy/yacy_search_server/issues/500 2023-11-13 16:41:19 +01:00
Michael Peter Christen
ceb07a5218 fixed problem with zim importer which crashed when non-valid urls appeared 2023-11-13 11:12:10 +01:00
Michael Peter Christen
656b3e3e77 updated guava to latest and added missing library for failureaccess 2023-11-13 10:59:49 +01:00
Michael Peter Christen
3268a93019 added a 'minified' option to YaCy dumps 2023-11-13 10:27:50 +01:00
Michael Peter Christen
c20c4b8a21 modified export: added maximum number of docs per chunk
The export can now be split into multiple files, called chunks.
By default, still only one chunk is exported.
This function is required when the exported files are to be
imported into an elasticsearch/opensearch index. The bulk import function
of elasticsearch/opensearch is limited to 100MB, so YaCy export files
must be split into chunks before they can be imported. Right now we
cannot estimate the chunk size in bytes, only as a number of documents.
The user must experiment to find the optimum maximum chunk size;
try 50000 docs per chunk as a first attempt.
2023-11-12 22:11:55 +01:00
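The chunking described in the commit above can be sketched in Java (YaCy's own language) roughly as follows. This is a minimal illustration under assumptions: the class `ExportChunker` and method `chunk` are hypothetical names, not YaCy's export code, and documents are modeled as an in-memory list, whereas a real exporter would stream each chunk to its own file.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of chunked export: split a list of documents into
 * consecutive chunks of at most maxDocsPerChunk documents each.
 * Names are illustrative only, not YaCy's actual export API.
 */
public class ExportChunker {

    public static <T> List<List<T>> chunk(List<T> docs, int maxDocsPerChunk) {
        if (maxDocsPerChunk <= 0) {
            throw new IllegalArgumentException("maxDocsPerChunk must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        int size = docs.size();
        int start = 0;
        while (start < size) {
            // Compute the end in long arithmetic so a huge limit cannot overflow.
            int end = (int) Math.min((long) start + maxDocsPerChunk, size);
            chunks.add(new ArrayList<>(docs.subList(start, end)));
            start = end;
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 120_000; i++) {
            docs.add(i); // stand-in for exported documents
        }
        // Assumption: a very large limit models the single-chunk default.
        System.out.println("default: " + chunk(docs, Integer.MAX_VALUE).size() + " chunk(s)");
        List<List<Integer>> chunks = chunk(docs, 50_000);
        for (int i = 0; i < chunks.size(); i++) {
            System.out.println("chunk " + i + ": " + chunks.get(i).size() + " docs");
        }
    }
}
```

With 120,000 documents and the suggested limit of 50000 docs per chunk, `main` reports chunks of 50000, 50000 and 20000 documents. Because the limit counts documents rather than bytes, the size of each chunk on disk still depends on the documents, which is why fitting under the 100MB bulk import limit has to be checked by experiment.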
Michael Peter Christen
655d8db802 detailed directions in index export to explain how the export can be
imported again using elasticsearch/opensearch
2023-11-12 15:26:18 +01:00
Michael Peter Christen
24011dcbcc more file name extensions for json list surrogate files 2023-11-06 22:44:18 +01:00
Michael Peter Christen
34a9fc1a07 bugfixes to zim reader: 2023-11-05 12:46:37 +01:00
Michael Peter Christen
7db0534d8a Added a zim parser to the surrogate import option.
You can now import zim files into YaCy by simply moving them
to the DATA/SURROGATE/IN folder. They will be fetched and, after
parsing, moved to DATA/SURROGATE/OUT.
In some cases the parser is not able to identify the
original URL of the documents in the zim file. In that case the file
is simply ignored.
This commit also carries an important fix to the PDF parser and an
increase of the maximum parsing speed to 60000 PPM, which should make it
possible to index up to 1000 files in one second.
2023-11-05 02:16:40 +01:00
Michael Peter Christen
70e29937ef added a check in zim importer which tests if import URLs actually exist 2023-11-04 19:07:50 +01:00
Michael Peter Christen
496f768c44 modified cache strategy for zim clusters 2023-11-03 18:20:10 +01:00
Michael Peter Christen
fdc6311dc7 added parsing rules for wikibooks and wikinews in zim reader 2023-11-02 00:27:24 +01:00
Michael Peter Christen
2ea54b3503 fixed blob iterator in zim cluster definition 2023-11-01 23:43:27 +01:00
Michael Peter Christen
54fa5d3c2e added a cluster cache but it requires more testing 2023-11-01 19:52:44 +01:00
Michael Peter Christen
53b01dbf2e Merge branch 'master' of https://github.com/yacy/yacy_search_server.git 2023-11-01 18:57:04 +01:00
Michael Peter Christen
41856e9f34 added an optimized zim file entry iterator 2023-11-01 18:50:28 +01:00
Michael Peter Christen
1c0df28bfb added a zim importer that can be used for surrogate imports.
Cannot be used yet because it requires some security additions
to verify that the given urls actually work.
2023-11-01 18:48:40 +01:00
Michael Peter Christen
b9912ff50d repaired dockerfiles for aarch64 and armv7 2023-10-29 22:09:24 +00:00
Michael Peter Christen
33b6878ded Merge branch 'master' of https://github.com/yacy/yacy_search_server.git 2023-10-29 14:58:47 +01:00
Michael Christen
68554cea07
Merge pull request #605 from okybaca/readme-docker-link
added a link to docker build guide
2023-10-29 14:56:26 +01:00
Michael Christen
06bfd5802f
Merge pull request #603 from okybaca/dark-green-css
fine tuned the dark-green color scheme
2023-10-29 14:55:58 +01:00
Michael Christen
43d5cd101e
Merge pull request #607 from okybaca/wikilinks
replaced all the links to legacy legacy wiki to legacy wiki
2023-10-29 14:55:26 +01:00
okybaca
4add1f6bc7 replaced all the links to legacy legacy wiki to legacy wiki 2023-10-29 13:12:24 +01:00
Michael Peter Christen
e2c86a8eba added a ZIM cluster pointer cache 2023-10-29 12:49:08 +01:00
Michael Peter Christen
4a54b24703 fix for "negative seek offset" error during extension of heap files.
This would always have happened when a heap file exceeded 2GB.
Should fix https://github.com/yacy/yacy_search_server/issues/372
2023-10-29 09:32:21 +01:00
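The commit message does not state the root cause, but one common way a seek offset turns negative once a file passes 2GB is narrowing a `long` file position to a Java `int`, whose maximum is 2147483647 (2^31 - 1). The hypothetical sketch below (class and method names are illustrative, not YaCy's heap implementation) shows that wrap-around, and that `RandomAccessFile.seek` rejects the resulting negative position with an `IOException` while accepting the same position kept as a `long`.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/**
 * Hypothetical illustration, not YaCy's heap code: a file position beyond
 * 2 GiB that is narrowed to an int wraps around to a negative number.
 */
public class SeekOffsetOverflow {

    /** The buggy pattern: casting a long file position to int. */
    public static int narrowedOffset(long position) {
        return (int) position;
    }

    /** Returns true if RandomAccessFile.seek rejects the given position. */
    public static boolean seekRejected(long position) throws IOException {
        File tmp = File.createTempFile("heap-demo", ".bin");
        try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw")) {
            try {
                raf.seek(position); // seeking past end of file is allowed
                return false;
            } catch (IOException e) {
                return true;        // negative positions throw IOException
            }
        } finally {
            tmp.delete();
        }
    }

    public static void main(String[] args) throws IOException {
        long twoGiB = 2L * 1024 * 1024 * 1024;   // 2147483648 bytes
        long position = twoGiB + 100;            // just past the 2 GiB mark
        int broken = narrowedOffset(position);   // wraps to -2147483548
        System.out.println("long position: " + position + ", rejected: " + seekRejected(position));
        System.out.println("int position:  " + broken + ", rejected: " + seekRejected(broken));
    }
}
```

Keeping every file position in a `long` from computation to `seek` avoids this class of bug, since a `long` offset only overflows far beyond any realistic file size.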
okybaca
69db75ce45 added a link to docker build guide 2023-10-29 02:35:57 +01:00
Michael Peter Christen
9c8fb97985 introduced url list and title list caching and enhanced input stream
performance in ZIM reader
2023-10-29 00:43:12 +02:00