Michael Peter Christen
b0ae660790
added Zstandard compressed data decompression for ZIM files type 5
also: more generalization and performance enhancements
2023-10-28 12:24:29 +02:00
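A ZIM cluster starts with an info byte that tells the reader how the cluster payload is compressed. The sketch below decodes that byte. It assumes the layout described in the openZIM wiki: the low four bits select the compression (5 is Zstandard, 4 is xz), and bit 0x10 marks an "extended" cluster with 8-byte blob offsets. The class name `ClusterInfo` is hypothetical and does not come from the YaCy code base. Actual decompression would also need a Zstandard library.

```java
/**
 * Hypothetical decoder for the info byte at the start of a ZIM cluster.
 * Assumed layout (openZIM wiki): low 4 bits = compression type,
 * bit 0x10 = "extended" cluster with 8-byte blob offsets.
 */
final class ClusterInfo {
    static final int COMPRESSION_NONE_LEGACY = 0; // no compression (obsolete value)
    static final int COMPRESSION_NONE = 1;        // no compression
    static final int COMPRESSION_XZ = 4;          // LZMA2 / xz
    static final int COMPRESSION_ZSTD = 5;        // Zstandard ("type 5" in the commit message)

    final int compression;
    final boolean extendedOffsets;

    ClusterInfo(int infoByte) {
        this.compression = infoByte & 0x0F;
        this.extendedOffsets = (infoByte & 0x10) != 0;
    }

    /** Size in bytes of each blob offset stored inside the decompressed cluster. */
    int offsetSize() {
        return extendedOffsets ? 8 : 4;
    }

    /** Name of the codec a reader would have to apply to the cluster payload. */
    String codecName() {
        switch (compression) {
            case COMPRESSION_NONE_LEGACY:
            case COMPRESSION_NONE:
                return "none";
            case COMPRESSION_XZ:
                return "xz";
            case COMPRESSION_ZSTD:
                return "zstd";
            default:
                return "unsupported(" + compression + ")";
        }
    }
}
```

For example, an info byte of 0x15 decodes to Zstandard with 8-byte offsets.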
Michael Peter Christen
ad8ee3a0b6
fixed typo in class name
2023-10-28 08:57:42 +02:00
Michael Peter Christen
c4082c4ff2
refactoring of ZIM reader, simplification, removed unnecessary code
2023-10-28 08:56:58 +02:00
Michael Peter Christen
c2b6b6e7b9
Fixed a large number of problems in the ZIM reader.
This library was not prepared for large data because it lacked long
data types for pointers. I had to modify the code base in a fundamental
way:
- proofreading
- unclustering
- refactoring
- naming adapted to https://wiki.openzim.org/wiki/ZIM_file_format
- change of exception handling
- extension to more attributes as defined in the spec (bugfix for MIME
type loading)
- bugfix to long parsing (prevented reading of large files)
Furthermore, the code is very inefficient and requires more attention.
However, the format is very useful for YaCy, as there are numerous data
sources for ZIM files.
2023-10-27 15:49:23 +02:00
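The "bugfix to long parsing" point is easiest to see in isolation. The sketch below reads ZIM pointers and assumes, as the openZIM specification states, that integers are stored little-endian. If a 64-bit file position is held in a Java `int`, any offset beyond 2 GiB is silently truncated. That is the kind of error that prevents large files from being read. `ZimPointers` is a hypothetical name, not the reader's actual API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helpers for reading little-endian ZIM integers from a byte array. */
final class ZimPointers {
    /** Reads an unsigned 32-bit little-endian value (e.g. a count) without sign problems. */
    static long readUInt32(byte[] data, int offset) {
        return ByteBuffer.wrap(data, offset, 4).order(ByteOrder.LITTLE_ENDIAN).getInt() & 0xFFFFFFFFL;
    }

    /** Reads a 64-bit little-endian file position; keep it in a long, never an int. */
    static long readPointer(byte[] data, int offset) {
        return ByteBuffer.wrap(data, offset, 8).order(ByteOrder.LITTLE_ENDIAN).getLong();
    }
}
```

A pointer of 5,000,000,000 bytes, for instance, survives `readPointer` intact but becomes a wrong value once cast to `int`.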
Michael Peter Christen
5ba5fb5d23
upgraded pdfbox to 3.0.0
2023-10-27 12:05:24 +02:00
Michael Peter Christen
1fefae9baf
integrated the source code of an openZIM file format reader. These are
the raw format reader files, not yet integrated into YaCy; integration
may follow as a next step. The ZIM file format is documented at
https://openzim.org and the reader code was taken from the archived,
unmaintained repository at https://github.com/openzim/zimreader-java
2023-10-27 10:59:06 +02:00
Michael Peter Christen
4308aa5415
removed concept of empty passwords as "no passwords used",
because we now start YaCy with a default password (yacy).
This affects all functions that checked the password-protection state
with the empty password in mind, including the warnings to set a
password when none is set (which can no longer happen).
2023-10-25 22:56:06 +02:00
Michael Peter Christen
2c60ff14bb
fixed default pw comparison
2023-10-25 13:59:02 +02:00
Michael Peter Christen
4da320bebf
added a warning message in ConfigBasic in case that the default password
was not changed.
2023-10-24 23:36:26 +02:00
Michael Peter Christen
7830268be1
fix 756c817b5a
must be applied to all code where a transaction token is generated.
2023-10-21 13:00:49 +02:00
Michael Peter Christen
756c817b5a
fix for https://github.com/yacy/yacy_search_server/issues/544
2023-10-21 11:45:26 +02:00
Michael Peter Christen
03bf259601
fix for https://github.com/yacy/yacy_search_server/issues/363
We still need to set the load in the process because a demand for higher
crawl speed may require increasing the maximum load limit. However,
following the criticism in the bug report, we never reduce the load limit
again.
2023-10-16 18:26:47 +02:00
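The policy in this message is to raise the maximum load limit when a higher crawl speed demands it, but never to lower it again. That amounts to a monotonic maximum. Below is a minimal sketch in which the hypothetical class `LoadLimit` stands in for wherever YaCy actually stores the limit.

```java
/** Hypothetical raise-only limit: demands can increase it, nothing ever decreases it. */
final class LoadLimit {
    private double maxLoad;

    LoadLimit(double initial) {
        this.maxLoad = initial;
    }

    /** Applies a demanded limit and returns the effective limit, which never goes down. */
    synchronized double request(double demanded) {
        maxLoad = Math.max(maxLoad, demanded);
        return maxLoad;
    }
}
```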
mchristen
8fc51f66c6
fixed a test class which prevented compilation on latest jvm
2023-09-26 15:39:34 +02:00
Joel Strasser
53bafa1544
consistent formatting in string concatenation
2023-09-25 23:31:55 +02:00
Joel Strasser
22c4188001
additionally match release stub for YaCy version
2023-09-25 22:41:04 +02:00
Michael Peter Christen
ff8fe7b6a4
fix for ',' or '.' appearing within a word or number. The query is no
longer tokenized into parts around that character, which makes it
possible to search for numbers or version numbers.
2023-09-03 11:37:25 +02:00
Michael Peter Christen
0689f4f0ae
Check if the character is a minus sign and is followed by a letter or a
digit. Treat it as part of the word/number.
2023-09-03 10:22:03 +02:00
Michael Peter Christen
5db97a8928
parser can now separate numbers from words even when they are not
separated by a space, e.g. 4.7Ohm
2023-09-02 19:15:22 +02:00
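The three tokenizer commits above can be illustrated together. Their rules are: keep ',' or '.' inside a word or number; treat a minus sign followed by a letter or digit as part of the token; and split a number from a word that follows it directly, as in 4.7Ohm. The sketch below is a hypothetical simplification written for this log, not YaCy's condenser or tokenizer code. In particular, it only splits at a number-to-letter transition.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simplified tokenizer illustrating the rules from the three commits above. */
final class QueryTokenizer {
    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        int n = s.length();
        for (int i = 0; i < n; i++) {
            char c = s.charAt(i);
            char next = i + 1 < n ? s.charAt(i + 1) : '\0';
            boolean keep;
            if (Character.isLetterOrDigit(c)) {
                // Split where a number runs directly into a word, e.g. "4.7Ohm" -> "4.7", "Ohm".
                if (Character.isLetter(c) && cur.length() > 0
                        && Character.isDigit(cur.charAt(cur.length() - 1))) {
                    tokens.add(cur.toString());
                    cur.setLength(0);
                }
                keep = true;
            } else if (c == '.' || c == ',') {
                // Keep '.' or ',' inside a word or number, e.g. "1.2.3" or "4,7".
                keep = cur.length() > 0 && Character.isLetterOrDigit(next);
            } else if (c == '-') {
                // A minus followed by a letter or digit belongs to the token, e.g. "-5" or "x-ray".
                keep = Character.isLetterOrDigit(next);
            } else {
                keep = false;
            }
            if (keep) {
                cur.append(c);
            } else if (cur.length() > 0) {
                tokens.add(cur.toString());
                cur.setLength(0);
            }
        }
        if (cur.length() > 0) {
            tokens.add(cur.toString());
        }
        return tokens;
    }
}
```

For instance, `tokenize("yacy 1.2.3 end.")` yields `[yacy, 1.2.3, end]`. The dots inside the version number are kept, while the trailing sentence period is dropped.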
Michael Peter Christen
e3797de7de
enhanced the word tokenizer to recognize numbers in a proper way
2023-09-01 20:10:08 +02:00
Michael Peter Christen
88cd17ea57
migrated solr from 8.9.0 to 8.11.2; activated also migration script. A YaCy index with solr 8.9.0 will automatically be migrated to 8.11.2. This is a preparation step to migrate to 9.0.0 soon.
2023-09-01 18:24:52 +02:00
Michael Peter Christen
0089f234f4
added npe protection
2023-09-01 12:18:47 +02:00
Michael Peter Christen
8285fe715a
tabs to spaces for classes supporting the condenser.
This is a preparation step to make changes in condenser and parser more
visible; no functional changes so far.
2023-09-01 11:00:42 +02:00
Michael Peter Christen
195bd2e444
extended the maximum header size to 16k to prevent http error 431
2023-08-19 15:21:24 +02:00
Michael Peter Christen
92dad3ed49
removed 7Zip parser because the old library could not be replaced by a maven repository
2023-07-27 23:11:27 +02:00
Michael Peter Christen
5afcba162b
updated libraries
2023-07-27 22:55:46 +02:00
Michael Christen
a348146d8f
setting connect host to 0.0.0.0
2023-06-29 10:46:05 +02:00
Michael Peter Christen
1c0f50985c
fixed documentation and some details of handling of keywords
2023-04-04 12:41:12 +02:00
Michael Christen
3472bcb4d3
patched a 'java.lang.NoSuchMethodError: com.twelvemonkeys.imageio.util.IIOUtil.lookupProviderByName' problem which occurred only on ARM
2023-03-05 01:17:28 +01:00
Michael Christen
f7b6e98ed7
Merge pull request #562 from thkoch2001/fix-warnings
Fix warnings
2023-03-05 00:56:04 +01:00
Michael Peter Christen
a157d01bb5
increased network image size limit for linuxtage poster
2023-02-24 17:50:29 +01:00
Thomas Koch
6bca836f49
fix 3 javac warnings: redundant cast
see GitHub issue #561 for context
[javac] /home/thk/git/yacy_search_server/source/net/yacy/htroot/ConfigAccounts_p.java:85: warning: [cast] redundant cast to YaCyHttpServer
[javac] final YaCyHttpServer jhttpserver = (YaCyHttpServer)sb.getHttpServer();
[javac] ^
[javac] /home/thk/git/yacy_search_server/source/net/yacy/htroot/ConfigUser_p.java:156: warning: [cast] redundant cast to YaCyHttpServer
[javac] final YaCyHttpServer jhttpserver = (YaCyHttpServer) sb.getHttpServer();
[javac] ^
[javac] /home/thk/git/yacy_search_server/source/net/yacy/htroot/ConfigUser_p.java:167: warning: [cast] redundant cast to YaCyHttpServer
[javac] final YaCyHttpServer jhttpserver = (YaCyHttpServer) sb.getHttpServer();
2023-02-11 17:17:46 +02:00
Michael Christen
9012fe4519
extended error message
2023-01-23 09:08:25 +01:00
Michael Christen
74104ff2d3
fix to timeout
2023-01-20 20:22:14 +01:00
Michael Peter Christen
9fcd8f1bda
added canonical filter
attention: this is on by default!
(it should do the right thing)
2023-01-16 14:50:30 +01:00
Michael Peter Christen
5a52b01c09
front-end integration of tag valency
2023-01-15 20:13:45 +01:00
Michael Peter Christen
7f728bb4b4
crawl profile storage extension for tag valency
2023-01-15 14:11:32 +01:00
Michael Christen
4304e07e6f
crawl profile adaptation to new tag valency attribute
2023-01-15 01:20:12 +01:00
Michael Peter Christen
5acd98f4da
introduction of tag-to-indexing relation TagValency
2023-01-13 17:20:18 +01:00
Michael Peter Christen
ab3ef87abf
fixed exec start command where a path contains spaces
2022-12-05 17:30:11 +01:00
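A common cause of this class of bug is that `Runtime.exec(String)` splits its command string with a plain `StringTokenizer`. A path such as `/opt/my apps/...` therefore turns into two arguments. The sketch below shows the problem and the usual remedy, which is to pass each argument separately to `ProcessBuilder`. The path and the `StartCommand` class are hypothetical. The commit message does not say which API YaCy's fix uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

/** Hypothetical illustration of whitespace splitting versus an explicit argument list. */
final class StartCommand {
    /** Mimics how Runtime.exec(String) tokenizes its argument: split on whitespace only. */
    static List<String> splitLikeRuntimeExec(String command) {
        List<String> parts = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(command);
        while (st.hasMoreTokens()) {
            parts.add(st.nextToken());
        }
        return parts;
    }

    /** Builds the argument vector explicitly, so a path with spaces stays one argument. */
    static ProcessBuilder builder(String executable, String... args) {
        List<String> cmd = new ArrayList<>();
        cmd.add(executable);
        cmd.addAll(Arrays.asList(args));
        return new ProcessBuilder(cmd);
    }
}
```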
Michael Peter Christen
17eec667fb
better release number representation
2022-12-05 14:46:58 +01:00
Michael Peter Christen
b1199e97f8
enabling new update location release.yacy.net
with new version numbers
2022-12-05 14:26:17 +01:00
Michael Peter Christen
66169d1aad
default build properties to remove a barrier when developing in IDE environments
2022-12-05 12:28:36 +01:00
Michael Peter Christen
309adb814e
fixed jsonlist import from searchlab.eu using a direct URL
2022-10-25 00:51:53 +02:00
Michael Peter Christen
5ddc794bb9
code cleanup in http client
2022-10-24 23:34:39 +02:00
Michael Peter Christen
62d177bf59
stub for jsonlist index importer web page
2022-10-23 12:22:31 +02:00
Michael Peter Christen
efa0425f00
refactoring: moved jsonlist importer to importer class
2022-10-23 11:35:32 +02:00
Michael Peter Christen
49daa32a88
yacy can now read searchlab export dump files
using the surrogate input process:
- copy the searchlab export file to DATA/SURROGATE/in
- the file is processed automatically and then moved to
DATA/SURROGATE/OUT
2022-10-23 11:01:58 +02:00
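The surrogate input process is a drop-folder workflow: a file placed in the input directory is imported and then moved out of the way. Below is a minimal sketch of that pattern using `java.nio.file`. `SurrogateFolder` and the `importer` callback are hypothetical stand-ins for YaCy's actual surrogate handling.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical drop-folder processor: import each file found in inDir, then move it to outDir. */
final class SurrogateFolder {
    static int processAll(Path inDir, Path outDir, Consumer<Path> importer) throws IOException {
        Files.createDirectories(outDir);
        List<Path> files;
        // Snapshot the listing first so moving files does not interfere with directory iteration.
        try (Stream<Path> listing = Files.list(inDir)) {
            files = listing.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path f : files) {
            importer.accept(f); // e.g. parse a searchlab export dump
            Files.move(f, outDir.resolve(f.getFileName()), StandardCopyOption.REPLACE_EXISTING);
        }
        return files.size();
    }
}
```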
Michael Peter Christen
6042dd99c6
reduced the risk that Tray does not initialize
2022-10-06 00:01:42 +02:00
Michael Christen
61b27217b9
throttle number of DNS requests:
as soon as the number of requests is > 50, there is a forced delay
of (10 * (requests - 50)) milliseconds. That means that once the number
of DNS requests reaches 150, there is a one-second delay for each request.
This is intended to prevent a remote DNS server from being flooded with
requests and possibly damaged.
This is also a fix/enhancement for
https://github.com/yacy/yacy_search_server/issues/513
2022-10-05 22:59:09 +02:00
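The throttling rule is a simple linear formula: no delay up to 50 requests, then 10 ms for every request above 50. So 150 requests give 10 * 100 = 1000 ms. The sketch below computes that delay and sleeps for it. The commit does not say exactly what counts as "the number of requests" (concurrent lookups, for example), so `DnsThrottle` and its parameter are hypothetical.

```java
/** Hypothetical sketch of the DNS throttling rule described in the commit message. */
final class DnsThrottle {
    static final int FREE_REQUESTS = 50;      // no delay up to this many requests
    static final long MILLIS_PER_EXTRA = 10L; // delay added per request above the threshold

    /** Forced delay in milliseconds: 10 * (requests - 50) once requests exceed 50, otherwise 0. */
    static long delayMillis(int requests) {
        return requests > FREE_REQUESTS ? MILLIS_PER_EXTRA * (requests - FREE_REQUESTS) : 0L;
    }

    /** Blocks the calling thread for the computed delay before a lookup is issued. */
    static void throttle(int requests) throws InterruptedException {
        long delay = delayMillis(requests);
        if (delay > 0) {
            Thread.sleep(delay);
        }
    }
}
```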
Michael Christen
99174282d8
try to shut down in a more orderly way
inspired by https://github.com/yacy/yacy_search_server/issues/518
2022-10-05 22:13:06 +02:00
Michael Peter Christen
482f507e65
upgraded solr from 8.8.1 to 8.9.0
should hopefully fix
https://github.com/yacy/yacy_search_server/issues/496
because it includes https://issues.apache.org/jira/browse/SOLR-13034
2022-10-05 17:24:07 +02:00
Michael Peter Christen
d49f937b98
added iso,apk,dmg to extension-deny list
see also https://github.com/yacy/yacy_search_server/issues/510
zip is not on the list because it can be parsed
2022-10-05 16:28:50 +02:00
Michael Peter Christen
761dbdf06d
increases log history length to 10000
implements https://github.com/yacy/yacy_search_server/issues/512
2022-10-05 16:09:28 +02:00
Michael Peter Christen
0970a79bbf
attempt to fix https://github.com/yacy/yacy_search_server/issues/517
2022-10-05 15:29:59 +02:00
Michael Peter Christen
1893661ee4
removed/suppressed more warnings
2022-10-05 14:38:59 +02:00
Michael Christen
51cf17d252
removed warnings
2022-10-04 22:28:15 +02:00
Michael Christen
867f96a32b
removed warnings
2022-10-04 22:05:32 +02:00
Michael Christen
8a06beaf24
removed deprecated finalize() methods
2022-10-04 20:12:47 +02:00
Michael Peter Christen
60c9986a0e
new release file names with date and git hash
...without reference to 9000ish SVN
2022-10-04 15:31:47 +02:00
Michael Christen
8b37a5dc6f
removed log4j properties because we don't have a log4j any more
2022-10-03 10:44:03 +02:00
Michael Christen
347b676b76
changed system to load build properties
2022-10-03 10:12:47 +02:00
Michael Christen
c36bdbf78d
refactoring
2022-10-03 09:37:16 +02:00
Michael Peter Christen
1e1107c97c
clean-up and new servlet method caching
2022-10-02 23:39:00 +02:00
Michael Peter Christen
adbda4c71b
moved all remaining servlet classes to new location
2022-10-02 23:22:12 +02:00
Michael Peter Christen
33889b4501
moved more servlets to new location
2022-10-02 22:57:58 +02:00
Michael Peter Christen
6d388bb7bf
refactoring - moved htroot/yacy classes
2022-10-02 22:26:53 +02:00
Michael Peter Christen
48fcf3b3b5
alternative servlet method, tested with wiki
may become the future method to store servlets
2022-09-30 18:29:01 +02:00
Michael Peter Christen
d23dea2642
refactoring
2022-09-30 17:42:21 +02:00
Michael Peter Christen
23f1dc3741
addressing/fixing some concurrency issues from
https://github.com/yacy/yacy_search_server/issues/505
2022-09-30 08:01:13 +02:00
Michael Peter Christen
9c1bc533fa
removed hazelcast because it is phoning home, see also:
https://github.com/yacy/yacy_search_server/issues/504
2022-09-28 17:30:37 +02:00
Michael Peter Christen
fc98ca7a9c
removed ContentControl servlet and functionality
This was not used at all (as far as I know) and was blocking a smooth
integration of ivy in the context of an existing JSON parser.
2022-09-28 17:25:04 +02:00
Thomas Koch
3116713672
rm buildDate from build.xml and its usages
The https://reproducible-builds.org project invests a lot of work
to make builds reproducible. This is a security property: it allows
comparing binaries built from the same source on different builder
machines. If they are identical, then either the builds have not
been manipulated or an attacker managed to attack all builder
machines in exactly the same way.
One problem that the reproducible-builds project often sees is
that projects include the build time in their binaries. This
makes builds unreproducible for no apparent reason. The build
date should not be of interest, since binaries built on different
dates but from the same source code should not differ.
Thus I decided to remove the build date instead of re-implementing
the functionality without the GitRev task. Anyway, the reported
date was not the build date but the date of the last git commit,
which is even less informative. The git commit ID would have
information value but should only be relevant for "nightly builds".
2022-07-10 11:32:38 +00:00
Thomas Koch
572558244a
rm unused build properties PKGMANAGER, RESTARTCMD, DESTDIR
PKGMANAGER is always false, thus the java code wrapped in
if statements for this property is dead code and can also
be removed.
The Debian packaging removed in c4659f0fb0
did set the PKGMANAGER property to true. When we build distro
packages again, we can revisit this commit and redo it with
property files instead.
RESTARTCMD is only used inside that dead code.
DESTDIR is never used, not even in build.xml.
2022-07-10 10:14:51 +00:00
Michael Peter Christen
3d138d3fdd
catch error when initializing hazelcast
should fix https://github.com/yacy/yacy_search_server/issues/468
2022-06-20 17:27:56 +02:00
Burkhard
a6a9828181
Merge pull request #440 from lfuelling/master
Add setting for public facing port
2022-02-11 08:09:17 +01:00
reger24
141e86964e
Fix compile deprecation warning
warning: [removal] AccessControlException in java.security has been deprecated and marked for removal
2022-02-11 00:27:55 +01:00
reger24
a7e93d9328
Add option to add host to default blacklist from search result
- added authorized icon/button to blacklist a host
- host is added to default blacklist
- inspired by https://github.com/yacy/yacy_search_server/issues/213#issuecomment-412485190
2022-02-09 19:42:04 +01:00
reger24
027e284ef9
Make the current blacklist more noticeable with a different color in the header
in servlet Blacklist_p.html
bugfix for 18dddb74c9
2022-02-06 09:43:59 +01:00
reger24
18dddb74c9
Harmonize loading/reading blacklist
between init and servlet to use the same procedures
- added BlacklistHelper.blacklistToSortedArray to simplify use in servlet
2022-02-06 00:10:55 +01:00
reger24
f28d705cd0
update IndexBrowser_p "add to blacklist" button
add feedback to user on success
2022-02-03 03:25:13 +01:00
Michael Peter Christen
52fe2ed8ba
Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
2022-02-01 04:21:55 +01:00
Michael Peter Christen
39e7bbac13
removed deprecation warning for new Double()
2022-02-01 04:20:55 +01:00
reger24
6a5f0b3684
Servlet IndexBrowser_p: add button "Add to blacklist"
allows adding the displayed host to the default blacklist
2022-01-30 21:01:23 +01:00
Lukas Fülling
111cf48642
add missing prop
2022-01-29 19:28:57 +01:00
reger24
f33e0ed7fd
revert commit 17fd1a4616
wrong file selected
2022-01-29 12:18:07 +01:00
unknown
17fd1a4616
delete .idea not needed in distribution
.idea is created locally by IntelliJ IDEA upon import as gradle project to store IDEA specific settings.
No need to include in distribution
2022-01-29 10:45:37 +01:00
Daleth Darko
3ced06c731
Various javadoc fixes
2022-01-26 11:22:43 +01:00
reger24
6a1e259fd0
Fix NPE in Switchboard.getURL https://github.com/yacy/yacy_search_server/issues/441
2022-01-26 06:07:38 +01:00
reger24
eae16287e9
Added epub (ebook) format to existing zipParser
*.epub files are zip files containing xhtml files with content and other artifact files,
which the zipParser can already feed to index
- extension "epub"
- mime "epub+zip"
2022-01-24 13:51:27 +01:00
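An EPUB file is a ZIP archive, so existing ZIP handling only needs to find the (X)HTML content documents inside it. Below is a minimal sketch using `java.util.zip`. `EpubEntries` is hypothetical and is not YaCy's zipParser.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: list the (X)HTML content entries of an EPUB (a ZIP archive). */
final class EpubEntries {
    static List<String> contentEntries(InputStream epub) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(epub)) {
            ZipEntry e;
            while ((e = zin.getNextEntry()) != null) {
                String lower = e.getName().toLowerCase(Locale.ROOT);
                // Keep content documents; skip directories, stylesheets, images, "mimetype", etc.
                if (!e.isDirectory()
                        && (lower.endsWith(".xhtml") || lower.endsWith(".html") || lower.endsWith(".htm"))) {
                    names.add(e.getName());
                }
            }
        }
        return names;
    }
}
```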
reger24
3e34f7c596
Import Ant build.xml into Gradle and use old compile of servlets in Gradle
to be able to use/reuse Ant targets where the task has not been implemented in the Gradle build.
- use the import to include the compilation of htroot as the first important task
! it is possible that the first build fails on compile of GitRevTask.jar !
! solution/workaround -> use "ant all" once to compile GitRevTask.jar !
- adjusted build.xml a little
- split compile-core into compile-core and compile-htroot to have a target for htroot compilation only
- set build-path to reuse Gradle's build directory
- (fix javadoc failure)
- changed the filtered-copy of yacyBuildProperties.java to ! the build path :-(
as the current (copy,delete,exclude) approach is complicated and not worth migrating,
used a simple/straightforward approach (using a yacyBuildProperties.java.template file as copy source)
2022-01-18 20:00:55 +01:00
reger24
398b105781
Prevent YaCy from always starting with an exception message on non-Apple systems
Attempt to access com.apple.eio.FileManager only on non-Windows systems
2022-01-18 13:02:12 +01:00
Lukas Fülling
e8a00007f6
add setting for public facing port
2022-01-11 17:10:48 +01:00
Michael Peter Christen
d7b17d8935
fixed missing thread name revert after balancer waiting
2021-12-22 01:46:18 +01:00
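One way to make sure a temporarily renamed thread always gets its old name back is a try/finally block, so that an exception or early return cannot skip the revert. Below is a hypothetical sketch. `ThreadNames.waitNamed` is illustrative only, and the `Thread.sleep` call stands in for the balancer's actual waiting.

```java
/** Hypothetical sketch: rename the current thread while it waits, and always restore the name. */
final class ThreadNames {
    static void waitNamed(String waitingName, long millis) throws InterruptedException {
        Thread t = Thread.currentThread();
        String previous = t.getName();
        t.setName(waitingName); // makes the waiting state visible in thread dumps
        try {
            Thread.sleep(millis); // placeholder for the balancer's wait
        } finally {
            t.setName(previous); // revert runs even if the wait is interrupted
        }
    }
}
```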
Michael Peter Christen
bd3f2483a1
replaced url and date retrieval by only url retrieval
This should prevent the search index from being used to determine the
freshness of the index entry.
2021-12-20 16:23:05 +01:00
Michael Peter Christen
163ba26d90
replaced check for load time method
instead of loading the solr document, an index only for the last loading
time was created. This prevents solr from having to fetch from its index
while the index is created. Excessive re-loading of documents while
indexing has been shown to produce deadlocks, so this should now be
prevented.
2021-12-20 03:47:56 +01:00
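The idea of keeping a separate index only for the last loading time, instead of loading the full Solr document, can be sketched as a concurrent map from URL to timestamp. `LoadTimeIndex` is hypothetical. The commit does not describe the actual data structure.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory index of last load times, consulted instead of fetching a full document. */
final class LoadTimeIndex {
    private final ConcurrentHashMap<String, Long> lastLoad = new ConcurrentHashMap<>();

    /** Records a load, keeping the most recent timestamp if the URL was seen before. */
    void recordLoad(String url, long epochMillis) {
        lastLoad.merge(url, epochMillis, Long::max);
    }

    /** True if the URL was never loaded or its last load is older than maxAgeMillis. */
    boolean needsReload(String url, long nowMillis, long maxAgeMillis) {
        Long t = lastLoad.get(url);
        return t == null || nowMillis - t > maxAgeMillis;
    }
}
```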
Michael Peter Christen
1ead7b85b5
remove compiler warning
"warning: [try] explicit call to close() on an auto-closeable resource"
2021-12-13 12:28:34 +01:00
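javac's `-Xlint:try` check reports this warning when code inside a try-with-resources block also calls `close()` on the resource itself. The fix is to drop the explicit call and let the statement close the resource. Below is a minimal sketch using `BufferedReader`. The `FirstLine` class is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

/** Hypothetical example of the warning-free form. */
final class FirstLine {
    // Warning-producing variant, shown only as a comment:
    //   try (BufferedReader r = ...) { String s = r.readLine(); r.close(); return s; }
    static String read(String text) throws IOException {
        // try-with-resources closes the reader; no explicit close() inside the block.
        try (BufferedReader r = new BufferedReader(new StringReader(text))) {
            return r.readLine();
        }
    }
}
```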
Michael Peter Christen
59777010dc
Merge branch 'master' of git@github.com:yacy/yacy_search_server.git
2021-11-18 00:49:56 +01:00
Michael Peter Christen
7898815c41
disabling concurrent logging
(maybe temporary)
2021-11-18 00:49:46 +01:00
sgaebel
4bf6954474
uses clientBuilder not HttpClients.custom() to have these inside the
Pool too
2021-10-31 23:06:33 +01:00
sgaebel
cdf901270c
always use HTTPClient with the 'try-with-resources' pattern to free up
resources
2021-10-31 23:06:23 +01:00