Commit Graph

244 Commits

Author SHA1 Message Date
Michael Peter Christen
910a496c9f replaced http links with https
Some checks failed
CI Script to build on self-hosted server / build (push) Has been cancelled
2024-07-21 18:02:58 +02:00
Michael Peter Christen
4a54b24703 fix for "negative seek offset" error during extension of heap files.
This would have always happend when a heap file exceeds 2GB.
should fix https://github.com/yacy/yacy_search_server/issues/372
2023-10-29 09:32:21 +01:00
Michael Christen
8a06beaf24 removed finalize() methods, deprecated 2022-10-04 20:12:47 +02:00
Daleth Darko
3ced06c731 Various javadoc fixes 2022-01-26 11:22:43 +01:00
Michael Peter Christen
63ad8ce6b2 removed ymarks
had not been used since a long time
2021-09-16 22:23:51 +02:00
Michael Peter Christen
e9c5e78868 replaced new Number(Number) with Number.instanceOf
to remove deprecation warnings for Java 9
2021-08-08 00:39:03 +02:00
luccioman
2bdd71de60 Added server side columns sorting on the Process Scheduler table
For easier usage of large tables in the Table_API_p.html page.
2018-07-04 10:28:32 +02:00
luccioman
bb51555830 Removed remaining unsafe accesses to SimpleDateFormat instances.
SimpleDateFormat must not be used by concurrent threads without
synchronization for parsing or formating dates as it is not thread-safe
(internally holds a calendar instance that is not synchronized).

Prefer now DateTimeFormatter when possible as it is thread-safe without
concurrent access performance bottleneck (does not internally use
synchronization locks).
2018-07-02 10:00:40 +02:00
luccioman
e97580dfc7 Fixed unsafe conccurent access to generic SimpleDateFormat instances
SimpleDateFormat must not be used by concurrent threads without
synchronization for parsing or formating dates as it is not thread-safe
(internally holds a calendar instance that is not synchronized).

Prefer now DateTimeFormatter when possible as it is thread-safe without
concurrent access performance bottleneck (does not internally use
synchronization locks).
2018-06-28 14:59:23 +02:00
luccioman
fa4399d5d2 Small perf improvement : initialize threads names early when possible
Initializing Thread names using the Thread constructor parameter is
faster as it already sets a thread name even if no customized one is
given, while an additional call to the Thread.setName() function
internally do synchronized access, eventually runs access check on the
security manager and performs a native call.

Profiling a running YaCy server revealed that the total processing time
spent on Thread.setName() for a typical p2p search was in the range of
seconds.
2018-05-23 14:45:35 +02:00
luccioman
6425963cee Fixed internal tables exact value match iterator 2018-01-10 18:38:42 +01:00
luccioman
5fdd5d16b1 Use volatile to ensure concurrent threads use up to date property value 2017-06-15 09:48:22 +02:00
luccioman
28b451a0b3 Made Cache compression level and lock timeout user configurable 2017-06-14 19:02:08 +02:00
luccioman
a7394b479b Limit the synchronization blocking time on some Cache operations.
Using a Reentrant lock instead of the intrinsic synchronization lock
permits limiting the blocking time to acquire a lock.

Useful on a very busy Cache concurrently accessed by many threads : when
the time to acquire a lock is too high, getting/storing content on the
cache becomes inefficient, and it is then better to fall back to loading
remote resources.

Illustrated by the CacheTest stress test and some traces reported in
mantis 751 ( http://mantis.tokeek.de/view.php?id=751 )
2017-06-14 09:13:50 +02:00
luccioman
8399275142 Properly close file output streams even on exceptions scenarios. 2017-06-08 07:19:16 +02:00
luccioman
a04feac064 Ensure file input streams proper closing in both success and failures
Also add when possible a warning level log message on input stream
closing error instead of failing silently. This could help understanding
some IO exceptions such as "too many files open".
2017-06-03 04:00:46 +02:00
luccioman
eec5779889 Added a name prefix to pooled threads for easier monitoring.
Using JVM monitoring tools, it is then easier to identify tasks running
inside thread pool with a custom prefix rather than the generic one :
"pool-".
2016-11-23 11:21:14 +01:00
reger
49eae79c01 fix Tables.hasIndex check for tablename = key
apply same functionality to hasHeap (to not create new table on call hasHeap)
2016-11-09 02:33:42 +01:00
Michael Peter Christen
103a8348b3 fix for NPE and small performance enhancement 2016-08-10 06:48:08 +02:00
Michael Peter Christen
34de1e8cbc gzip compression will perform more efficient and with better compression
level
2015-06-01 01:24:33 +02:00
Michael Peter Christen
a1a8edfc0a wrap HeaReader close() in a catch Throwable block to prevent that an
excpetion during close blocks the whole shotdown process
2015-05-30 17:54:02 +02:00
Michael Peter Christen
fed26f33a8 enhanced timezone managament for indexed data:
to support the new time parser and search functions in YaCy a high
precision detection of date and time on the day is necessary. That
requires that the time zone of the document content and the time zone of
the user, doing a search, is detected. The time zone of the search
request is done automatically using the browsers time zone offset which
is delivered to the search request automatically and invisible to the
user. The time zone for the content of web pages cannot be detected
automatically and must be an attribute of crawl starts. The advanced
crawl start now provides an input field to set the time zone in minutes
as an offset number. All parsers must get a time zone offset passed, so
this required the change of the parser java api. A lot of other changes
had been made which corrects the wrong handling of dates in YaCy which
was to add a correction based on the time zone of the server. Now no
correction is added and all dates in YaCy are UTC/GMT time zone, a
normalized time zone for all peers.
2015-04-15 13:17:23 +02:00
Michael Peter Christen
321840fde3 Replaced all fixed thread pools with cached thread pools. The cached
thread pools will flush their cached (dead) threads after 60 seconds.
This will cause that YaCy now runs constantly withl about 50 threads,
about 100 at peak times. Previously, about 400 threads had been cached
and kept in a hibernation state, which caused that the numproc counter
in /proc/user_beancounters (exists only in VM-hosted linux) was as high
as the cached number of threads. This caused that VM supervisors
terminated whole VM sessions if a limit was reached. Many VM providers
have limits of numproc=96 which made it virtually impossible to run YaCy
on such machines. With this change, it will be possible to run many YaCy
instances even on VM hosts.
2014-12-02 16:26:07 +01:00
Michael Peter Christen
1db476c67e fix for bad table iteration 2014-11-10 18:52:01 +01:00
Michael Peter Christen
bc275dca07 added network history graph image /NetworkHistory.png which can show
many different statistics about the history of the peer.
2014-10-10 14:06:47 +02:00
orbiter
3ac31614a3 added option to reverse-sort YaCy tables (internal API change only) 2014-09-18 11:11:09 +02:00
reger
ea6c9e9b07 reduce mem buffer overhead for gap files during r/w
(they are typically small compared to idx allowing to use smaller buffersize -> set to 16k records)
2014-08-18 00:03:24 +02:00
Michael Peter Christen
501d55cd35 removed superfluous assert 2014-06-19 12:10:12 +02:00
orbiter
97983ba89f fixed generics warnings for generic array instantiation that appeared
after migration to Java 7
2014-05-20 21:50:16 +02:00
orbiter
a3542f29b4 npe fix 2014-04-25 09:26:20 +02:00
orbiter
c48d2a2a02 npe fix 2014-04-25 09:23:10 +02:00
Michael Peter Christen
5b83887da8 npe fix 2014-04-02 02:34:55 +02:00
Michael Peter Christen
8b44fcf0f4 added missing @Override annotation 2014-03-28 13:48:37 +01:00
Michael Peter Christen
fdaeac374a - enhanced postprocessing speed and memory footprint (by using HashMaps
instead of TreeMaps)
- enhanced memory footprint of database indexes (by introduction of
optimize calls)
- optimize calls shrink the amount of used memory for index sets if they
are not changed afterwards any more
2014-02-28 14:01:09 +01:00
Michael Peter Christen
25a6c05008 experimental removal of synchronization. This should work for all cases
where the size() and isEmpty() method is used only for statistics, which
happens at many locations in YaCy. If these methods are used for
structual reasons (like accessing the last element in an array) then it
may fail or cause other problems. As far as visible, this is not the
case.
2014-01-19 14:47:11 +01:00
Michael Peter Christen
5695280edd removed superfluous synchronization 2014-01-19 14:44:58 +01:00
Michael Peter Christen
c3dcbdc8d5 try to recover from an OOM during citation index reading and fail-over
to second solr core in case of unrecoverable OOM.
2013-11-28 01:10:25 +01:00
Michael Peter Christen
7b69c438f7 more methods for the table class 2013-10-15 16:46:59 +02:00
Michael Peter Christen
5e31bad711 - the webgraph shall store all links which appear on a web page and not
all unique links! This made it necessary, that a large portion of the
parser and link processing classes must be adopted to carry a different
type of link collection which carry a property attribute which are
attached to web anchors.
- introduction of a new URL class, AnchorURL
- the other url classes, DigestURI and MultiProtocolURI had been renamed
and refactored to fit into a new document package schema, document.id
- cleanup of net.yacy.cora.document package and refactoring
2013-09-15 00:30:23 +02:00
Michael Peter Christen
47b1c81d08 - refactoring
- generalized writing of url attributes to solr documents
- added more url attributes to error documents
2013-08-20 15:46:04 +02:00
Roland Haeder
13433d41a1 Log this exception better
Conflicts:
	source/net/yacy/kelondro/blob/Tables.java
2013-07-27 09:54:51 +02:00
Michael Peter Christen
aeac2fb763 replaced more containsKey() -> get() usages by a simple get(), followed
by a test for NULL. This should increase the application speed and
reduces the lookup time for the affected methods by 50%
2013-07-23 12:16:51 +02:00
Roland Haeder
841a28ae76 Added 'final' for all exception blocks as this helps the Java compiler
to optimize memory usage

Conflicts:
	source/net/yacy/search/Switchboard.java
2013-07-17 18:31:30 +02:00
Michael Peter Christen
5878c1d599 - refactoring of log to ConcurrentLog:
jdk-based logger tend to block
at java.util.logging.Logger.log(Logger.java:476) in concurrent
environments. This makes logging a main performance issue. To overcome
this problem, this is a add-on to jdk logging to put log entries on a
concurrent message queue and log the messages one by one using a
separate process.
- FTPClient uses the concurrent logging instead of the log4j logger
2013-07-09 14:28:25 +02:00
Michael Peter Christen
14186e815e npe fix 2013-06-13 22:42:21 +02:00
Michael Peter Christen
e20450e798 patch in HTCache and CitationIndex loading in case that a file is
broken: do not crash; instead ignore the file and delete it.
2013-06-07 12:52:03 +02:00
orbiter
a1c989002b fix for http://forum.yacy-websuche.de/viewtopic.php?f=5&t=4652
generate dht data even if dht receive and dht transmission is switched
off
2013-05-08 16:48:45 +02:00
orbiter
e1bfe9d07a - reduction of the concurrently running processes to make YaCy more
adjusted to smaller and 1-core devices.
- the workflow processor now starts no process at all. these are started
as soon as parser/condenser/indexing queues are filled.
- better abstraction
2013-04-25 11:33:17 +02:00
Michael Peter Christen
089dee1770 - generalized SchemaConfiguration into super-class Configuration and
adopted other classes which used the configuration-only access for that
class
- removed many warnings
- adjusted logging
2013-02-25 00:09:41 +01:00
orbiter
712cc37c40 if maxFileSize < 0 then the file size limit is without limit. 2012-12-10 21:17:45 +01:00