okybaca
cba84632ee
UI: added a more descriptive message, CitationRank instead of cr
2023-11-14 00:05:23 +01:00
Michael Peter Christen
24011dcbcc
more file name extensions for json list surrogate files
2023-11-06 22:44:18 +01:00
Michael Peter Christen
34a9fc1a07
bugfixes to zim reader:
2023-11-05 12:46:37 +01:00
Michael Peter Christen
7db0534d8a
Added a zim parser to the surrogate import option.
...
You can now import zim files into YaCy by simply moving them
to the DATA/SURROGATE/IN folder. They will be fetched and after
parsing moved to DATA/SURROGATE/OUT.
There are exceptions where the parser is not able to identify the
original URL of the documents in the zim file. In that case the file
is simply ignored.
This commit also carries an important fix to the pdf parser and an
increase of the maximum parsing speed to 60000 PPM (pages per minute),
which should make it possible to index up to 1000 files per second.
2023-11-05 02:16:40 +01:00
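The import workflow described in commit 7db0534d8a (move a zim file into DATA/SURROGATE/IN and let YaCy pick it up) could be scripted roughly as follows. This is only a sketch: the class and method names are hypothetical, the YaCy home directory is passed in by the caller, and nothing here is part of YaCy's own API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of YaCy: drops a file into the surrogate inbox.
public class SurrogateDropSketch {

    // Computes where the file would land inside <yacyHome>/DATA/SURROGATE/IN.
    public static Path targetFor(Path yacyHome, Path zimFile) {
        return yacyHome.resolve("DATA").resolve("SURROGATE").resolve("IN")
                .resolve(zimFile.getFileName());
    }

    // Moves the file into the inbox, creating the folder if it is missing.
    // Assumption: YaCy then fetches the file from that folder on its own.
    public static Path dropIntoSurrogateIn(Path yacyHome, Path zimFile) throws IOException {
        Path target = targetFor(yacyHome, zimFile);
        Files.createDirectories(target.getParent());
        return Files.move(zimFile, target);
    }
}
```

According to the commit message, a successfully parsed file should afterwards show up in DATA/SURROGATE/OUT.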
Michael Peter Christen
70e29937ef
added a check in zim importer which tests if import URLs actually exist
2023-11-04 19:07:50 +01:00
Michael Peter Christen
fdc6311dc7
added parsing rules for wikibooks and wikinews in zim reader
2023-11-02 00:27:24 +01:00
Michael Peter Christen
53b01dbf2e
Merge branch 'master' of https://github.com/yacy/yacy_search_server.git
2023-11-01 18:57:04 +01:00
Michael Peter Christen
1c0df28bfb
added a zim importer that can be used for surrogate imports.
...
Cannot be used yet because it requires some security additions
to verify that the given URLs actually work.
2023-11-01 18:48:40 +01:00
okybaca
4add1f6bc7
replaced all the links to legacy legacy wiki to legacy wiki
2023-10-29 13:12:24 +01:00
Michael Peter Christen
4a54b24703
fix for "negative seek offset" error during extension of heap files.
...
This would always have happened when a heap file exceeded 2GB.
Should fix https://github.com/yacy/yacy_search_server/issues/372
2023-10-29 09:32:21 +01:00
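The commit message for 4a54b24703 does not say what caused the negative offset. A common cause of this symptom at the 2GB boundary is 32-bit integer overflow in the offset computation, and the sketch below illustrates that failure mode only. It is an assumption, not YaCy's actual heap file code, and all names are hypothetical.

```java
// Hypothetical illustration, not YaCy code: why an offset can turn negative past 2GB.
public class SeekOffsetSketch {

    // The multiplication happens in 32-bit int arithmetic and wraps around
    // once the product exceeds Integer.MAX_VALUE (about 2GB).
    public static long offsetWithIntMath(int recordCount, int recordSize) {
        return recordCount * recordSize;
    }

    // Widening one operand to long first keeps the full 64-bit result.
    public static long offsetWithLongMath(int recordCount, int recordSize) {
        return (long) recordCount * recordSize;
    }

    public static void main(String[] args) {
        // 3,000,000 records of 1024 bytes are roughly 2.86 GiB.
        System.out.println(offsetWithIntMath(3_000_000, 1024));
        System.out.println(offsetWithLongMath(3_000_000, 1024));
    }
}
```

Passing the wrapped value to `RandomAccessFile.seek` would raise an `IOException`, since that method rejects negative positions.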
Michael Peter Christen
5ba5fb5d23
upgraded pdfbox to 3.0.0
2023-10-27 12:05:24 +02:00
Michael Peter Christen
4308aa5415
removed concept of empty passwords as "no passwords used",
...
because we now start YaCy with a default password (yacy).
This affects all functions that check the current state of
password protection and that handled the empty-password situation,
including the warnings to set a password in case none is set (which
can no longer be the case).
2023-10-25 22:56:06 +02:00
Michael Peter Christen
2c60ff14bb
fixed default pw comparison
2023-10-25 13:59:02 +02:00
Michael Peter Christen
4da320bebf
added a warning message in ConfigBasic in case that the default password
...
was not changed.
2023-10-24 23:36:26 +02:00
Michael Peter Christen
7830268be1
fix 756c817b5a
...
must be applied to all code where a transaction token is generated.
2023-10-21 13:00:49 +02:00
Michael Peter Christen
756c817b5a
fix for https://github.com/yacy/yacy_search_server/issues/544
2023-10-21 11:45:26 +02:00
Michael Peter Christen
03bf259601
fix for https://github.com/yacy/yacy_search_server/issues/363
...
We still need to set the load in the process because a demand for higher
crawl speed may require increasing the maximum load limit. However,
following the criticism in the bug report, we never reduce the load limit
again.
2023-10-16 18:26:47 +02:00
mchristen
8fc51f66c6
fixed a test class which prevented compilation on latest jvm
2023-09-26 15:39:34 +02:00
Joel Strasser
53bafa1544
consistent formatting in string concatenation
2023-09-25 23:31:55 +02:00
Joel Strasser
22c4188001
additionally match release stub for YaCy version
2023-09-25 22:41:04 +02:00
Michael Peter Christen
ff8fe7b6a4
fix for ',' or '.' appearing within a word or number. This will not
...
tokenize the query into parts around that character to make it possible
to search for numbers or version numbers.
2023-09-03 11:37:25 +02:00
Michael Peter Christen
0689f4f0ae
Check if the character is a minus sign and is followed by a letter or a
...
digit. Treat it as part of the word/number.
2023-09-03 10:22:03 +02:00
Michael Peter Christen
5db97a8928
parser can now separate numbers from words also when they are not
...
separated by a space, e.g. 4.7Ohm
2023-09-02 19:15:22 +02:00
Michael Peter Christen
e3797de7de
enhanced the word tokenizer to recognize numbers in a proper way
2023-09-01 20:10:08 +02:00
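The four tokenizer commits above (ff8fe7b6a4, 0689f4f0ae, 5db97a8928, e3797de7de) describe three rules: a ',' or '.' inside a word or number does not split it, a minus sign followed by a letter or digit belongs to the token, and a number is separated from a directly attached word. The sketch below is one possible reading of those rules. The class name and all implementation details are assumptions; this is not the code from YaCy's condenser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the rules named in the commit messages, not YaCy code.
public class TokenizerSketch {

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean prevIsWordChar = i > 0 && Character.isLetterOrDigit(text.charAt(i - 1));
            boolean nextIsWordChar = i + 1 < text.length()
                    && Character.isLetterOrDigit(text.charAt(i + 1));
            if (Character.isLetterOrDigit(c)) {
                // A letter directly after a number starts a new token ("4.7Ohm").
                if (Character.isLetter(c) && isNumeric(current)) flush(current, tokens);
                current.append(c);
            } else if ((c == '.' || c == ',') && prevIsWordChar && nextIsWordChar) {
                current.append(c); // '.' or ',' inside a word or number is kept
            } else if (c == '-' && nextIsWordChar) {
                current.append(c); // minus followed by a letter or digit is kept
            } else {
                flush(current, tokens); // any other character ends the token
            }
        }
        flush(current, tokens);
        return tokens;
    }

    // True if the buffer holds at least one digit and otherwise only '.', ',' or '-'.
    private static boolean isNumeric(CharSequence s) {
        boolean sawDigit = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isDigit(c)) sawDigit = true;
            else if (c != '.' && c != ',' && c != '-') return false;
        }
        return sawDigit;
    }

    // Emits the buffered token, if any, and clears the buffer.
    private static void flush(StringBuilder current, List<String> tokens) {
        if (current.length() > 0) {
            tokens.add(current.toString());
            current.setLength(0);
        }
    }
}
```

With these rules, a query such as `version 1.2.3` stays two tokens, and `4.7Ohm` is split into `4.7` and `Ohm`.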
Michael Peter Christen
88cd17ea57
migrated solr from 8.9.0 to 8.11.2; also activated the migration script. A YaCy index with solr 8.9.0 will automatically be migrated to 8.11.2. This is a preparation step to migrate to 9.0.0 soon.
2023-09-01 18:24:52 +02:00
Michael Peter Christen
0089f234f4
added npe protection
2023-09-01 12:18:47 +02:00
Michael Peter Christen
8285fe715a
tab to spaces for classes supporting the condenser.
...
This is a preparation step to make changes in condenser and parser more
visible; no functional changes so far.
2023-09-01 11:00:42 +02:00
Michael Peter Christen
195bd2e444
extended the maximum header size to 16k to prevent http error 431
2023-08-19 15:21:24 +02:00
Michael Peter Christen
92dad3ed49
removed 7Zip parser because the old library could not be replaced by a maven repository
2023-07-27 23:11:27 +02:00
Michael Peter Christen
5afcba162b
updated libraries
2023-07-27 22:55:46 +02:00
Michael Christen
a348146d8f
setting connect host to 0.0.0.0
2023-06-29 10:46:05 +02:00
Michael Peter Christen
1c0f50985c
fixed documentation and some details of handling of keywords
2023-04-04 12:41:12 +02:00
Michael Christen
3472bcb4d3
patched a 'java.lang.NoSuchMethodError: com.twelvemonkeys.imageio.util.IIOUtil.lookupProviderByName' problem which occurred only on ARM
2023-03-05 01:17:28 +01:00
Michael Christen
f7b6e98ed7
Merge pull request #562 from thkoch2001/fix-warnings
...
Fix warnings
2023-03-05 00:56:04 +01:00
Michael Peter Christen
a157d01bb5
increased network image size limit for linuxtage poster
2023-02-24 17:50:29 +01:00
Thomas Koch
6bca836f49
fix 3 javac warnings: redundant cast
...
see GitHub issue #561 for context
[javac] /home/thk/git/yacy_search_server/source/net/yacy/htroot/ConfigAccounts_p.java:85: warning: [cast] redundant cast to YaCyHttpServer
[javac] final YaCyHttpServer jhttpserver = (YaCyHttpServer)sb.getHttpServer();
[javac] ^
[javac] /home/thk/git/yacy_search_server/source/net/yacy/htroot/ConfigUser_p.java:156: warning: [cast] redundant cast to YaCyHttpServer
[javac] final YaCyHttpServer jhttpserver = (YaCyHttpServer) sb.getHttpServer();
[javac] ^
[javac] /home/thk/git/yacy_search_server/source/net/yacy/htroot/ConfigUser_p.java:167: warning: [cast] redundant cast to YaCyHttpServer
[javac] final YaCyHttpServer jhttpserver = (YaCyHttpServer) sb.getHttpServer();
2023-02-11 17:17:46 +02:00
Michael Christen
9012fe4519
extended error message
2023-01-23 09:08:25 +01:00
Michael Christen
74104ff2d3
fix to timeout
2023-01-20 20:22:14 +01:00
Michael Peter Christen
9fcd8f1bda
added canonical filter
...
attention: this is on by default!
(it should do the right thing)
2023-01-16 14:50:30 +01:00
Michael Peter Christen
5a52b01c09
front-end integration of tag valency
2023-01-15 20:13:45 +01:00
Michael Peter Christen
7f728bb4b4
crawl profile storage extension for tag valency
2023-01-15 14:11:32 +01:00
Michael Christen
4304e07e6f
crawl profile adaptation to new tag valency attribute
2023-01-15 01:20:12 +01:00
Michael Peter Christen
5acd98f4da
introduction of tag-to-indexing relation TagValency
2023-01-13 17:20:18 +01:00
Michael Peter Christen
ab3ef87abf
fixed exec start command where a path contains spaces
2022-12-05 17:30:11 +01:00
Michael Peter Christen
17eec667fb
better release number representation
2022-12-05 14:46:58 +01:00
Michael Peter Christen
b1199e97f8
enabling new update location release.yacy.net
...
with new version numbers
2022-12-05 14:26:17 +01:00
Michael Peter Christen
66169d1aad
default build properties to remove barrier developing in IDE
...
environments
2022-12-05 12:28:36 +01:00
Michael Peter Christen
309adb814e
fixed jsonlist import from searchlab.eu using a direct URL
2022-10-25 00:51:53 +02:00
Michael Peter Christen
5ddc794bb9
code cleanup in http client
2022-10-24 23:34:39 +02:00
Michael Peter Christen
62d177bf59
stub for jsonlist index importer web page
2022-10-23 12:22:31 +02:00