Commit Graph

8973 Commits

Author SHA1 Message Date
Michael Peter Christen
b0ae660790 added Zstandard compressed data decompression for ZIM files type 5
also: more generalization and performance enhancements
2023-10-28 12:24:29 +02:00
Michael Peter Christen
ad8ee3a0b6 fixed typo in class name 2023-10-28 08:57:42 +02:00
Michael Peter Christen
c4082c4ff2 refactoring of ZIM reader, simplification, removed unnecessary code 2023-10-28 08:56:58 +02:00
Michael Peter Christen
c2b6b6e7b9 Fixed a large number of problems in the ZIM reader.
This library was not prepared for large data because it was missing long
data types for pointers. I had to modify the code-base in a fundamental
way:
- Proof-Reading,
- unclustering,
- refactoring,
- naming adaptation to https://wiki.openzim.org/wiki/ZIM_file_format,
- change of Exception handling,
- extension to more attributes as defined in spec (bugfix for mime type
loading)
- bugfix to long parsing (prevented reading of large files)
The code is furthermore very inefficient and requires more attention.
However the format is very useful for YaCy as there are numerous data
sources for ZIM-Files.
2023-10-27 15:49:23 +02:00
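The "missing long data types for pointers" problem named in c2b6b6e7b9 can be illustrated with a small sketch. This is not the YaCy or zimreader-java code: the class and method names are hypothetical, and the little-endian 64-bit layout is an assumption taken from the openZIM format description rather than from this commit.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: why file pointers must be read as long, not int.
// Assumes 8-byte little-endian pointers; names are illustrative only.
final class LongPointerSketch {

    // Reads an 8-byte little-endian value starting at `offset`.
    static long readLongLE(byte[] data, int offset) {
        return ByteBuffer.wrap(data, offset, Long.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getLong();
    }

    public static void main(String[] args) {
        long pointer = 5_000_000_000L; // a file offset beyond the 2 GiB int range
        byte[] raw = ByteBuffer.allocate(Long.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(pointer)
                .array();
        long asLong = readLongLE(raw, 0);
        int asInt = (int) asLong; // what an int-based reader would keep
        System.out.println("long pointer: " + asLong);
        System.out.println("truncated to int: " + asInt);
    }
}
```

An int-based reader silently loses the upper 32 bits, which is one way a reader can work on small files and fail on large ones.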
Michael Peter Christen
5ba5fb5d23 upgraded pdfbox to 3.0.0 2023-10-27 12:05:24 +02:00
Michael Peter Christen
1fefae9baf integrated the source code of an openzim file format reader. These are
the raw format reader files with no integration in YaCy yet, which will
maybe follow as a next step. The zim file format is documented in
https://openzim.org and the reader code was taken from the archived,
non-maintained repository at https://github.com/openzim/zimreader-java
2023-10-27 10:59:06 +02:00
Michael Peter Christen
4308aa5415 removed concept of empty passwords as "no passwords used",
because we now start YaCy with a default password (yacy).
This has an impact on all functions that check the current state of
password protection and that included the empty-password situation,
including the warnings to set a password in case that none is set (which
cannot be the case any more).
2023-10-25 22:56:06 +02:00
Michael Peter Christen
2c60ff14bb fixed default pw comparison 2023-10-25 13:59:02 +02:00
Michael Peter Christen
4da320bebf added a warning message in ConfigBasic in case that the default password
was not changed.
2023-10-24 23:36:26 +02:00
Michael Peter Christen
7830268be1 fix 756c817b5a
must be applied to all code where a transaction token is generated.
2023-10-21 13:00:49 +02:00
Michael Peter Christen
756c817b5a fix for https://github.com/yacy/yacy_search_server/issues/544 2023-10-21 11:45:26 +02:00
Michael Peter Christen
03bf259601 fix for https://github.com/yacy/yacy_search_server/issues/363
We still need to set the load in the process because a demand for higher
crawl speed may require to increase the maximum load limit. However,
following the criticism in the bug, we never reduce the load limit
again.
2023-10-16 18:26:47 +02:00
mchristen
8fc51f66c6 fixed a test class which prevented compilation on latest jvm 2023-09-26 15:39:34 +02:00
Joel Strasser
53bafa1544
consistent formatting in string concatenation 2023-09-25 23:31:55 +02:00
Joel Strasser
22c4188001
additionally match release stub for YaCy version 2023-09-25 22:41:04 +02:00
Michael Peter Christen
ff8fe7b6a4 fix for ',' or '.' appearing within a word or number. This will not
tokenize the query into parts around that character to make it possible
to search for numbers or version numbers.
2023-09-03 11:37:25 +02:00
Michael Peter Christen
0689f4f0ae Check if the character is a minus sign and is followed by a letter or a
digit. Treat it as part of the word/number.
2023-09-03 10:22:03 +02:00
Michael Peter Christen
5db97a8928 parser can now separate numbers from words also when they are not
separated by space, e.g. 4.7Ohm
2023-09-02 19:15:22 +02:00
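The tokenizer rules described in ff8fe7b6a4, 0689f4f0ae and 5db97a8928 can be sketched roughly as follows. This is a simplified illustration and not the YaCy tokenizer: the class and method names are hypothetical, and the handling of a minus sign inside a word (kept as part of the token) is an assumption about how the rules combine.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the three tokenizer rules; not taken from the YaCy source.
final class TokenizerSketch {

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean numeric = false; // true while `current` holds a number
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean hasNext = i + 1 < text.length();
            char next = hasNext ? text.charAt(i + 1) : ' ';
            if (Character.isLetterOrDigit(c)) {
                boolean digit = Character.isDigit(c);
                // A switch between digits and letters starts a new token ("4.7Ohm").
                if (current.length() > 0 && digit != numeric) flush(tokens, current);
                if (current.length() == 0) numeric = digit;
                current.append(c);
            } else if ((c == '.' || c == ',') && current.length() > 0 && hasNext
                    && Character.isLetterOrDigit(next)
                    && Character.isDigit(next) == numeric) {
                // '.' or ',' inside a word or number does not split the token.
                current.append(c);
            } else if (c == '-' && hasNext && Character.isLetterOrDigit(next)) {
                // A minus sign followed by a letter or digit belongs to the token.
                boolean nextDigit = Character.isDigit(next);
                if (current.length() > 0 && nextDigit != numeric) flush(tokens, current);
                if (current.length() == 0) numeric = nextDigit;
                current.append(c);
            } else {
                flush(tokens, current); // any other character is a separator
            }
        }
        flush(tokens, current);
        return tokens;
    }

    // Moves the pending token, if any, into the result list.
    private static void flush(List<String> tokens, StringBuilder current) {
        if (current.length() > 0) {
            tokens.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenize("4.7Ohm"));
        System.out.println(tokenize("version 1.2.3, done."));
        System.out.println(tokenize("-5 degrees"));
    }
}
```

In this sketch a trailing ',' or '.' still acts as a separator, so only punctuation that sits between two characters of the same kind is kept inside a token.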
Michael Peter Christen
e3797de7de enhanced the word tokenizer to recognize numbers in a proper way 2023-09-01 20:10:08 +02:00
Michael Peter Christen
88cd17ea57 migrated solr from 8.9.0 to 8.11.2; activated also migration script. A YaCy index with solr 8.9.0 will automatically be migrated to 8.11.2. This is a preparation step to migrate to 9.0.0 soon. 2023-09-01 18:24:52 +02:00
Michael Peter Christen
0089f234f4 added npe protection 2023-09-01 12:18:47 +02:00
Michael Peter Christen
8285fe715a tab to spaces for classes supporting the condenser.
This is a preparation step to make changes in condenser and parser more
visible; no functional changes so far.
2023-09-01 11:00:42 +02:00
Michael Peter Christen
195bd2e444 extended the maximum header size to 16k to prevent http error 431 2023-08-19 15:21:24 +02:00
Michael Peter Christen
92dad3ed49 removed 7Zip parser because the old library could not be replaced by a maven repository 2023-07-27 23:11:27 +02:00
Michael Peter Christen
5afcba162b updated libraries 2023-07-27 22:55:46 +02:00
Michael Christen
a348146d8f setting connect host to 0.0.0.0 2023-06-29 10:46:05 +02:00
Michael Peter Christen
1c0f50985c fixed documentation and some details of handling of keywords 2023-04-04 12:41:12 +02:00
Michael Christen
3472bcb4d3 patched a 'java.lang.NoSuchMethodError: com.twelvemonkeys.imageio.util.IIOUtil.lookupProviderByName' problem which occurred only on ARM 2023-03-05 01:17:28 +01:00
Michael Christen
f7b6e98ed7
Merge pull request #562 from thkoch2001/fix-warnings
Fix warnings
2023-03-05 00:56:04 +01:00
Michael Peter Christen
a157d01bb5 increased network image size limit for linuxtage poster 2023-02-24 17:50:29 +01:00
Thomas Koch
6bca836f49 fix 3 javac warnings: redundant cast
see GitHub issue #561 for context

    [javac] /home/thk/git/yacy_search_server/source/net/yacy/htroot/ConfigAccounts_p.java:85: warning: [cast] redundant cast to YaCyHttpServer
    [javac]                 final YaCyHttpServer jhttpserver = (YaCyHttpServer)sb.getHttpServer();
    [javac]                                                    ^
    [javac] /home/thk/git/yacy_search_server/source/net/yacy/htroot/ConfigUser_p.java:156: warning: [cast] redundant cast to YaCyHttpServer
    [javac]                 final YaCyHttpServer jhttpserver = (YaCyHttpServer) sb.getHttpServer();
    [javac]                                                    ^
    [javac] /home/thk/git/yacy_search_server/source/net/yacy/htroot/ConfigUser_p.java:167: warning: [cast] redundant cast to YaCyHttpServer
    [javac]             final YaCyHttpServer jhttpserver = (YaCyHttpServer) sb.getHttpServer();
2023-02-11 17:17:46 +02:00
Michael Christen
9012fe4519 extended error message 2023-01-23 09:08:25 +01:00
Michael Christen
74104ff2d3 fix to timeout 2023-01-20 20:22:14 +01:00
Michael Peter Christen
9fcd8f1bda added canonical filter
attention: this is on by default!
(it should do the right thing)
2023-01-16 14:50:30 +01:00
Michael Peter Christen
5a52b01c09 front-end integration of tag valency 2023-01-15 20:13:45 +01:00
Michael Peter Christen
7f728bb4b4 crawl profile storage extension for tag valency 2023-01-15 14:11:32 +01:00
Michael Christen
4304e07e6f crawl profile adaptation to new tag valency attribute 2023-01-15 01:20:12 +01:00
Michael Peter Christen
5acd98f4da introduction of tag-to-indexing relation TagValency 2023-01-13 17:20:18 +01:00
Michael Peter Christen
ab3ef87abf fixed exec start command where a path contains spaces 2022-12-05 17:30:11 +01:00
Michael Peter Christen
17eec667fb better release number representation 2022-12-05 14:46:58 +01:00
Michael Peter Christen
b1199e97f8 enabling new update location release.yacy.net
with new version numbers
2022-12-05 14:26:17 +01:00
Michael Peter Christen
66169d1aad default build properties to remove barrier developing in IDE
environments
2022-12-05 12:28:36 +01:00
Michael Peter Christen
309adb814e fixed jsonlist import from searchlab.eu using a direct URL 2022-10-25 00:51:53 +02:00
Michael Peter Christen
5ddc794bb9 code cleanup in http client 2022-10-24 23:34:39 +02:00
Michael Peter Christen
62d177bf59 stub for jsonlist index importer web page 2022-10-23 12:22:31 +02:00
Michael Peter Christen
efa0425f00 refactoring: moved jsonlist importer to importer class 2022-10-23 11:35:32 +02:00
Michael Peter Christen
49daa32a88 yacy can now read searchlab export dump files
using the surrogate input process:
- copy the searchlab export file to DATA/SURROGATE/in
- the file is processed automatically and then moved to
DATA/SURROGATE/OUT
2022-10-23 11:01:58 +02:00
Michael Peter Christen
6042dd99c6 reduced danger that Tray does not initialize 2022-10-06 00:01:42 +02:00
Michael Christen
61b27217b9 throttle number of DNS requests:
as soon as the number of requests is > 50, there is a forced delay
of (10 * (requests - 50)) milliseconds. That means that once the number
of DNS requests reaches 150, there is a one second delay to each request.

This should prevent a remote DNS from being flooded with requests and
possibly getting damaged.
This is also a fix/enhancement for
https://github.com/yacy/yacy_search_server/issues/513
2022-10-05 22:59:09 +02:00
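The delay rule from 61b27217b9 can be written down as a short sketch. Only the threshold of 50 requests and the factor of 10 milliseconds come from the commit message; the class, method and constant names are hypothetical and not taken from the YaCy source.

```java
// Hypothetical sketch of the delay rule from the commit message above;
// names are illustrative and not taken from the YaCy source.
final class DnsThrottleSketch {

    // Number of DNS requests above which throttling starts (from the message).
    static final int THRESHOLD = 50;

    // Returns the forced delay in milliseconds for a given number of DNS requests.
    static long delayMillis(int requests) {
        if (requests <= THRESHOLD) {
            return 0L; // no delay up to and including the threshold
        }
        return 10L * (requests - THRESHOLD);
    }

    public static void main(String[] args) {
        for (int requests : new int[] {10, 50, 51, 150}) {
            System.out.println(requests + " requests -> " + delayMillis(requests) + " ms");
        }
    }
}
```

With 150 requests the formula gives 10 * (150 - 50) = 1000 milliseconds, which matches the one second delay stated in the message.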
Michael Christen
99174282d8 try to shut down in a bit more ordered way
inspired by https://github.com/yacy/yacy_search_server/issues/518
2022-10-05 22:13:06 +02:00
Michael Peter Christen
482f507e65 upgraded solr from 8.8.1 to 8.9.0
should hopefully fix
https://github.com/yacy/yacy_search_server/issues/496
because it includes https://issues.apache.org/jira/browse/SOLR-13034
2022-10-05 17:24:07 +02:00
Michael Peter Christen
d49f937b98 added iso,apk,dmg to extension-deny list
see also https://github.com/yacy/yacy_search_server/issues/510
zip is not on the list because it can be parsed
2022-10-05 16:28:50 +02:00
Michael Peter Christen
761dbdf06d increases log history length to 10000
implements https://github.com/yacy/yacy_search_server/issues/512
2022-10-05 16:09:28 +02:00
Michael Peter Christen
0970a79bbf attempt to fix https://github.com/yacy/yacy_search_server/issues/517 2022-10-05 15:29:59 +02:00
Michael Peter Christen
1893661ee4 removed/suppressed more warnings 2022-10-05 14:38:59 +02:00
Michael Christen
51cf17d252 removed warnings 2022-10-04 22:28:15 +02:00
Michael Christen
867f96a32b removed warnings 2022-10-04 22:05:32 +02:00
Michael Christen
8a06beaf24 removed finalize() methods, deprecated 2022-10-04 20:12:47 +02:00
Michael Peter Christen
60c9986a0e new release file names with date and git hash
...without reference to 9000ish SVN
2022-10-04 15:31:47 +02:00
Michael Christen
8b37a5dc6f removed log4j properties because we don't have a log4j any more 2022-10-03 10:44:03 +02:00
Michael Christen
347b676b76 changed system to load build properties 2022-10-03 10:12:47 +02:00
Michael Christen
c36bdbf78d refactoring 2022-10-03 09:37:16 +02:00
Michael Peter Christen
1e1107c97c clean-up and new servlet method caching 2022-10-02 23:39:00 +02:00
Michael Peter Christen
adbda4c71b moved all remaining servlet classes to new location 2022-10-02 23:22:12 +02:00
Michael Peter Christen
33889b4501 moved more servlets to new location 2022-10-02 22:57:58 +02:00
Michael Peter Christen
6d388bb7bf refactoring - moved htroot/yacy classes 2022-10-02 22:26:53 +02:00
Michael Peter Christen
48fcf3b3b5 alternative servlet method, tested with wiki
may become the future method to store servlets
2022-09-30 18:29:01 +02:00
Michael Peter Christen
d23dea2642 refactoring 2022-09-30 17:42:21 +02:00
Michael Peter Christen
23f1dc3741 addressing/fixing some concurrency issues from
https://github.com/yacy/yacy_search_server/issues/505
2022-09-30 08:01:13 +02:00
Michael Peter Christen
9c1bc533fa removed hazelcast because it is phoning home, see also:
https://github.com/yacy/yacy_search_server/issues/504
2022-09-28 17:30:37 +02:00
Michael Peter Christen
fc98ca7a9c removed ContentControl servlet and functionality
This was not used at all (as far as I know) and was blocking a smooth
integration of ivy in the context of an existing JSON parser.
2022-09-28 17:25:04 +02:00
Thomas Koch
3116713672 rm buildDate from build.xml and its usages
The https://reproducible-builds.org project invests a lot of work
to make builds reproducible. This is a security property. It allows
to compare the build of binaries from different builder machines.
If they are identical, it means that either the builds have not
been manipulated or an attacker managed to attack all builder
machines in exactly the same way.

One problem that the reproducible-builds project often sees is
that projects include the build time in their binaries. This
makes builds unreproducible for apparently no reason. The build
date should not be of interest since binaries built on different
dates but from the same source code should not be different.

Thus I decided to remove the build date instead of re-implementing
the functionality without the GitRev task. Anyways the reported
date was not the build date but the date of the last git commit
which is even less informative. The git commit ID would have
information value but should only be relevant for "nightly builds".
2022-07-10 11:32:38 +00:00
Thomas Koch
572558244a rm unused build properties PKGMANAGER, RESTARTCMD, DESTDIR
PKGMANAGER is always false, thus the java code wrapped in
if statements for this property is dead code and can also
be removed.

The Debian packaging removed in c4659f0fb0
did set the PKGMANAGER property to true. When we do distro
packages again, we can revisit this commit and redo it with
property files instead.

RESTARTCMD is only used inside that dead code.

DESTDIR is never used even in the build.xml
2022-07-10 10:14:51 +00:00
Michael Peter Christen
3d138d3fdd catch error when initializing hazelcast
should fix https://github.com/yacy/yacy_search_server/issues/468
2022-06-20 17:27:56 +02:00
Burkhard
a6a9828181
Merge pull request #440 from lfuelling/master
Add setting for public facing port
2022-02-11 08:09:17 +01:00
reger24
141e86964e Fix compile deprecation warning
warning: [removal] AccessControlException in java.security has been deprecated and marked for removal
2022-02-11 00:27:55 +01:00
reger24
a7e93d9328 Add option to add host to default blacklist from search result
- added authorized icon/button to blacklist a host
- host is added to default blacklist
- inspired by https://github.com/yacy/yacy_search_server/issues/213#issuecomment-412485190
2022-02-09 19:42:04 +01:00
reger24
027e284ef9 Improve visibility of the current blacklist by using a different color in the header
in servlet Blacklist_p.html
bugfix for 18dddb74c9
2022-02-06 09:43:59 +01:00
reger24
18dddb74c9 Harmonize loading/reading blacklist
between init  and servlet to use the same procedures
-added BlacklistHelper.blacklistToSortedArray to simplify use in servlet
2022-02-06 00:10:55 +01:00
reger24
f28d705cd0 update IndexBrowser_p add to blacklist button
add feedback to user on success
2022-02-03 03:25:13 +01:00
Michael Peter Christen
52fe2ed8ba Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2022-02-01 04:21:55 +01:00
Michael Peter Christen
39e7bbac13 removed deprecation warning for new Double() 2022-02-01 04:20:55 +01:00
reger24
6a5f0b3684 Servlet IndexBrowser_p add button "Add to blacklist"
allows adding the displayed host to the default blacklist
2022-01-30 21:01:23 +01:00
Lukas Fülling
111cf48642 add missing prop 2022-01-29 19:28:57 +01:00
reger24
f33e0ed7fd revert commit 17fd1a4616
wrong file selected
2022-01-29 12:18:07 +01:00
unknown
17fd1a4616 delete .idea not needed in distribution
.idea is created locally by IntelliJ IDEA upon import as gradle project to store IDEA specific settings.
No need to include in distribution
2022-01-29 10:45:37 +01:00
Daleth Darko
3ced06c731 Various javadoc fixes 2022-01-26 11:22:43 +01:00
reger24
6a1e259fd0 Fix NPE in Switchboard . getURL https://github.com/yacy/yacy_search_server/issues/441 2022-01-26 06:07:38 +01:00
reger24
eae16287e9 Added epub (ebook) format to existing zipParser
*.epub files are zip files containing xhtml files with content and other artifact files,
which the zipParser can  already feed to index
- extension "epub"
- mime "epub+zip"
2022-01-24 13:51:27 +01:00
reger24
3e34f7c596 Import Ant build.xml into Gradle and use old compile of servlets in Gradle
to be able to use/reuse Ant targets where task has not been implemented in Gradle build.
- use the import to include the compile of htroot as first important task

  ! it is possible that the first build fails on the compile of GitRevTask.jar !
  ! solution/workaround -> use "ant all" once to compile GitRevTask.jar !

- adjusted build.xml a little
   - split compile-core into compile-core and compile-htroot to have a target for htroot comp. only
   - set build-path to reuse Gradles build directory
   - (fix javadoc failure)

- changed the filtered-copy of yacyBuildProperties.java to ! the build path :-(
  as current (copy,delete,exclude) is complicated and not migration worthy,
  used simple/straightforward approach (using a yacyBuildProperties.java.template file as copy source)
2022-01-18 20:00:55 +01:00
reger24
398b105781 Prevent YaCy from always starting with an exception message on non-Apple systems
Try to access com.apple.eio.FileManager only on non-Win systems
2022-01-18 13:02:12 +01:00
Lukas Fülling
e8a00007f6 add setting for public facing port 2022-01-11 17:10:48 +01:00
Michael Peter Christen
d7b17d8935 fixed missing thread name revert after balancer waiting 2021-12-22 01:46:18 +01:00
Michael Peter Christen
bd3f2483a1 replaced url and date retrieval by only url retrieval
This should prevent that the search index is used for freshness of the
index entry.
2021-12-20 16:23:05 +01:00
Michael Peter Christen
163ba26d90 replaced check for load time method
instead of loading the solr document, an index only for the last loading
time was created. This prevents that solr has to fetch from its index
while the index is created. Excessive re-loading of documents while
indexing has shown to produce deadlocks, so this should now be
prevented.
2021-12-20 03:47:56 +01:00
Michael Peter Christen
1ead7b85b5 remove compiler warning
"warning: [try] explicit call to close() on an auto-closeable resource"
2021-12-13 12:28:34 +01:00
Michael Peter Christen
59777010dc Merge branch 'master' of git@github.com:yacy/yacy_search_server.git 2021-11-18 00:49:56 +01:00
Michael Peter Christen
7898815c41 disabling concurrent logging
(maybe temporary)
2021-11-18 00:49:46 +01:00
sgaebel
4bf6954474 uses clientBuilder not HttpClients.custom() to have these inside the
Pool too
2021-10-31 23:06:33 +01:00
sgaebel
cdf901270c always use HTTPClient by 'try with resources' pattern to free up
resources
2021-10-31 23:06:23 +01:00
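The 'try with resources' pattern mentioned in cdf901270c looks roughly like this. The sketch uses a hypothetical stand-in class (DemoClient) instead of YaCy's HTTPClient, whose actual API is not shown in this log; it only demonstrates that close() runs automatically when the block is left.

```java
// Hypothetical sketch of the try-with-resources pattern; DemoClient is a
// stand-in and not YaCy's HTTPClient class.
final class TryWithResourcesSketch {

    static final class DemoClient implements AutoCloseable {
        private boolean closed = false;

        String fetch(String url) {
            if (closed) throw new IllegalStateException("client already closed");
            return "response for " + url;
        }

        @Override
        public void close() {
            closed = true; // a real client would release its connection here
        }

        boolean isClosed() {
            return closed;
        }
    }

    // Runs one request and returns the client so the caller can check that it was closed.
    static DemoClient fetchOnce(String url) {
        DemoClient used = null;
        try (DemoClient client = new DemoClient()) {
            used = client;
            client.fetch(url);
        } // close() is called here, also when fetch() throws
        return used;
    }

    public static void main(String[] args) {
        System.out.println("closed after use: " + fetchOnce("http://localhost/").isClosed());
    }
}
```

Because the resource is declared in the try header, no explicit close() call or finally block is needed to free it.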